I'm trying to write a function that takes a string and compresses its repeating blocks. My current implementation wraps every block in a count, even text that is not repeated, so a single character like 'a' becomes '1(a)' and the result ends up longer than the input.
Here is the code:
```python
import re

def _format_so(bchars, brep):
    # Wrap a block as 'count(chars)'; empty text produces nothing.
    return '%i(%s)' % (brep, bchars) if bchars else ''

def char_rep(txt, _format=_format_so):
    output, lastend = [], 0
    # At each match position, 'chars' is the shortest string that is immediately
    # followed by at least one copy of itself; 'repeat' is the whole run.
    for match in re.finditer(r"""(?ms)(?P<repeat>(?P<chars>.+?)(?:(?P=chars))+)""", txt):
        beginpos, endpos = match.span()
        repeat, chars = match.group('repeat'), match.group('chars')
        # Text between two repeated runs is emitted with a count of 1.
        if lastend < beginpos:
            output.append(_format(txt[lastend:beginpos], 1))
        output.append(_format(chars, repeat.count(chars)))
        lastend = endpos
    # Whatever follows the last run is also emitted with a count of 1.
    output = ''.join(output) + _format(txt[lastend:], 1)
    return output

givenList = ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']
newList = []
for txt in givenList:
    output_so = char_rep(txt, _format=_format_so)
    newList.append(output_so)
print(newList)
```
Output: `['1(d)2(wda)1(wd)', '9(a)', '3(abc)1(a)']`
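To see where the `1(...)` blocks come from, it can help to print what the pattern itself matches. The sketch below reuses the same regular expression; `show_matches` is a hypothetical helper for inspection only, not part of the code above.

```python
import re

# Same pattern as in char_rep: 'chars' is the shortest string at the match
# position that is immediately followed by at least one copy of itself.
PATTERN = re.compile(r"(?ms)(?P<repeat>(?P<chars>.+?)(?:(?P=chars))+)")


def show_matches(txt):
    """Return (span, chars, repeat) for every repeated run the pattern finds."""
    return [(m.span(), m.group('chars'), m.group('repeat'))
            for m in PATTERN.finditer(txt)]


for txt in ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']:
    print(txt, show_matches(txt))
```

For `'dwdawdawd'` the only match is `'wdawda'` at span (1, 7), and for `'abcabcabca'` it is `'abcabcabc'` at (0, 9). Everything outside those spans (the leading `'d'` and trailing `'wd'`, or the final `'a'`) reaches `_format_so` with `brep == 1`, which is what produces `'1(d)'`, `'1(wd)'` and `'1(a)'`.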
I want the output to be as short as possible, so text that occurs only once should not be wrapped in '1(...)'. For the example above, the result should be ['d2(wda)wd', '9(a)', '3(abc)a'].
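One minimal sketch, assuming the regex scan itself is kept as-is: let the formatter return the text unchanged when its count is 1. `_format_short` below is a hypothetical replacement for `_format_so`; `char_rep` is the function from the question with only its default formatter changed.

```python
import re


def _format_short(bchars, brep):
    # Hypothetical replacement for _format_so: text that occurs only once
    # is emitted as-is instead of being wrapped as '1(...)'.
    if not bchars:
        return ''
    return bchars if brep == 1 else '%i(%s)' % (brep, bchars)


def char_rep(txt, _format=_format_short):
    # Same scan as in the question; only the default formatter differs.
    output, lastend = [], 0
    for match in re.finditer(r"""(?ms)(?P<repeat>(?P<chars>.+?)(?:(?P=chars))+)""", txt):
        beginpos, endpos = match.span()
        repeat, chars = match.group('repeat'), match.group('chars')
        if lastend < beginpos:
            output.append(_format(txt[lastend:beginpos], 1))
        output.append(_format(chars, repeat.count(chars)))
        lastend = endpos
    return ''.join(output) + _format(txt[lastend:], 1)


given_list = ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']
print([char_rep(txt) for txt in given_list])
# ['d2(wda)wd', '9(a)', '3(abc)a']
```

A bare literal next to an encoded block stays unambiguous only if the input itself contains no digits or parentheses; otherwise a literal like `'2('` could be misread as a count.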
What do you suggest as the best approach for solving this problem?
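Dropping the `1(...)` wrapper does not by itself make the result as short as possible, because the regex scan commits to the leftmost match it finds. For `'aabaabaab'` it first matches `'aa'` and then `'baabaa'`, so it never notices that the whole string is `'aab'` three times, i.e. `'3(aab)'`. If "shortest" is meant literally, one option is dynamic programming over prefixes: the best encoding of `s[:j]` is the shortest of `best[i]` plus the best single-block encoding of `s[i:j]`, taken over all split points `i`. This sketch assumes flat (non-nested) `count(chars)` blocks, input without digits or parentheses, and that a block is encoded only when that is strictly shorter than writing it out; `encode_block` and `shortest_encoding` are hypothetical names.

```python
def encode_block(t):
    """Shortest single-block form of t: either t itself or 'count(unit)'."""
    best = t
    n = len(t)
    # Try every unit size that divides len(t) and tiles t exactly.
    for size in range(1, n // 2 + 1):
        if n % size == 0 and t[:size] * (n // size) == t:
            candidate = f"{n // size}({t[:size]})"
            # Encode only when it is strictly shorter than the raw text.
            if len(candidate) < len(best):
                best = candidate
    return best


def shortest_encoding(s):
    """Shortest flat encoding of s, built by dynamic programming over prefixes."""
    # best[j] is the shortest encoding found for the prefix s[:j].
    best = [""] * (len(s) + 1)
    for j in range(1, len(s) + 1):
        # The last block is s[i:j]; everything before it is already solved.
        # min() keeps the first of equally short candidates, and scanning i
        # from right to left makes ties favor a short last block.
        best[j] = min(
            (best[i] + encode_block(s[i:j]) for i in reversed(range(j))),
            key=len,
        )
    return best[-1]


given_list = ['dwdawdawd', 'aaaaaaaaa', 'abcabcabca']
print([shortest_encoding(txt) for txt in given_list])
# ['dwdawdawd', '9(a)', '3(abc)a']
```

Under this strict criterion `'dwdawdawd'` comes back unchanged, since `'d2(wda)wd'` is also 9 characters. The right-to-left scan only decides between equally short results, for example `'3(abc)a'` versus `'a3(bca)'`. The double loop over split points plus the period check makes this roughly cubic in the input length, which is fine for short strings but not for long ones.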