This is inspired by a problem from here. This is the approximate form of the problem:
Given a string like "aaaa777cbb" (10 symbols long), run length encode it in-place to a string like "a473c1b2" (8 symbols long). You are guaranteed that the input will always be longer than the output.
The precise form of the problem is:
- You are given an ordered list $L$ of symbols from a set $S$. Any symbol from $S$ may appear in the list.
- $S$ contains all the positive integers up to and including $|L|$ (the length of $L$) and also some other symbols.
- Rules for manipulating the input in place:
- You can replace one symbol in the list with another
- You can trim the list to a length of your choice by removing symbols from the end
- You cannot insert symbols
- You must overwrite the list of symbols with its run-length-encoding representation and trim it so that it contains only the run-length-encoding representation.
- The run-length-encoding representation replaces each series of 1 or more of the same symbol in the input with that symbol followed by the symbol representing the number of occurrences of the previous symbol.
- For example: $[a, a, a, a, a, a, a, a, a, a, 7]$ becomes $[a, 10, 7, 1]$ meaning "$a$ ten times followed by $7$ one time"
- Note that the length of the output list is always even
- You are guaranteed that the length of the input list is always larger than the length of the output list
- You must do this with $O(1)$ additional working memory
- Each "word" of working memory contains $\log_2 |S|$ bits (put another way, words may be constructed which store constant amounts of information, the position of any element in the input, or any symbol from the input)
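To pin down the expected output, here is a minimal reference sketch of the encoding itself. It deliberately ignores the space constraint and builds a new list, so it is not a solution to the problem; the function name `rle_reference` is my own, and counts are represented as Python integers standing in for the integer symbols of $S$.

```python
def rle_reference(symbols):
    """Return the run-length encoding of `symbols` as a NEW list.

    Uses O(n) extra space, so it only specifies the expected output;
    it does not satisfy the in-place, O(1)-memory requirement.
    """
    out = []
    i = 0
    while i < len(symbols):
        # Advance j to the end of the run that starts at i.
        j = i
        while j < len(symbols) and symbols[j] == symbols[i]:
            j += 1
        # Emit the symbol followed by its run length.
        out.extend([symbols[i], j - i])
        i = j
    return out


# The example from the list above: ten a's followed by a single 7.
print(rle_reference(["a"] * 10 + [7]))
```

Any in-place algorithm would have to leave the input list holding exactly what this function returns.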
Intuitively I don't think this is possible. The solutions provided on the original site seem to break on strings like "abccccc" (length 7) where the output should be "a1b1c5" (length 6), since they start by overwriting "b" with the "1" from "a1" before they have even checked which symbol is in the 2nd position.
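The failure can be checked mechanically. The sketch below is my own model of a left-to-right scheme with one read position and one write position; it is an assumption about how such solutions behave, not a copy of any of them. It does not modify the input. It only reports the first index at which writing a (symbol, count) pair would land on a position that has not been read yet (or past the end of the list). The name `naive_clobber_position` is hypothetical.

```python
def naive_clobber_position(symbols):
    """Model a left-to-right in-place encoder without mutating the input.

    Returns the first index where the pair for a run would have to be
    written at or beyond the read position (i.e. over unread input, or
    past the end of the list), or None if that never happens.
    """
    n = len(symbols)
    read = 0   # next unread input position
    write = 0  # where the next (symbol, count) pair would be stored
    while read < n:
        sym = symbols[read]
        # Consume the whole run of `sym`.
        while read < n and symbols[read] == sym:
            read += 1
        # The pair occupies positions write and write + 1; both must lie
        # strictly before `read` to avoid destroying unread input.
        if write + 1 >= read:
            return write + 1
        write += 2
    return None


print(naive_clobber_position(list("abccccc")))     # index of the clobbered "b"
print(naive_clobber_position(list("aaaa777cbb")))  # no conflict for this input
```

Under this model "abccccc" fails at index 1, while "aaaa777cbb" goes through, which matches the observation that only some inputs satisfying the length guarantee are safe for the naive approach.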
I have thought about starting by finding the compressible runs of symbols (2 or more of the same symbol), but I don't know how to tell which symbols have already been processed and which are from the original input without using some sort of memory that would grow with the size of the input (like a bitmap of processed areas), which would violate the $O(1)$ space requirement.
I consider acceptable answers to be proofs that this problem either is or is not solvable in $O(1)$ space.
"output string's length is always smaller than input string" (from the hyperlink) seems to allow two interpretations: 1) for every problem instance, the output is shorter than the input; 2) at every output (of two symbols), the encoding of the prefix to the part under consideration will be short enough to allow two symbols (implying input symbols will be duplicated to the end of input or a run longer than two). – greybeard Sep 23 '20 at 07:19