In case the title was not clear: I'm looking at methods that take two binary strings as input and output a single binary string from which both original strings can be recovered. I want to know how efficiently this can be done. Is there a best way, and what is the minimum overhead?
I'm also interested in any method that differs significantly from or improves upon any of the following.
Given strings $A$, $B$ with lengths $|A|$, $|B|$:
Approach 1: Encode $0$ as $00$ and $1$ as $11$, and let $01$ be a 'separator' symbol. This produces a string of length $2(|A|+|B|+1)$. To recover the original strings, read the output in aligned two-character blocks, find the separator block, discard it, and undouble the blocks on either side to read off $A$ and $B$.
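A minimal sketch of Approach 1, treating bit strings as Python `str` values of `'0'`/`'1'`. The function names `double`, `encode1` and `decode1` are hypothetical, chosen for illustration. Note that decoding must scan aligned blocks: for example, $0011$ contains $01$ at an odd offset, which is not a separator.

```python
def double(s: str) -> str:
    """Write each bit twice: 0 -> 00, 1 -> 11."""
    return "".join(c * 2 for c in s)


def encode1(a: str, b: str) -> str:
    """Approach 1: doubled A, then the separator block '01', then doubled B."""
    return double(a) + "01" + double(b)


def decode1(s: str) -> tuple[str, str]:
    """Split s into aligned two-bit blocks; the first '01' block is the separator."""
    blocks = [s[i:i + 2] for i in range(0, len(s), 2)]
    sep = blocks.index("01")  # raises ValueError if s is not a valid encoding
    a = "".join(blk[0] for blk in blocks[:sep])
    b = "".join(blk[0] for blk in blocks[sep + 1:])
    return a, b
```

For example, `encode1("10", "011")` gives `"110001001111"`, which has length $2(2+3+1)=12$.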
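Approach 2: Somewhat more elegant, but it produces a string of the same length: encode as $A0BA1B$. To recover $A$ and $B$, split the string into its two equal halves ($A0B$ and $A1B$) and compare them character by character. They first differ at position $|A|+1$, which tells you $|A|$; what comes before that position is $A$ and what comes after it is $B$.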
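A minimal sketch of Approach 2 under the same conventions (bit strings as `str`; the names `encode2`/`decode2` are hypothetical). The two halves are identical except at index $|A|$ (0-based), so the first mismatch locates the boundary.

```python
def encode2(a: str, b: str) -> str:
    """Approach 2: output A0B followed by A1B."""
    return a + "0" + b + a + "1" + b


def decode2(s: str) -> tuple[str, str]:
    """Halve s; the halves A0B and A1B first differ at index |A|."""
    half = len(s) // 2
    left, right = s[:half], s[half:]
    k = next(i for i in range(half) if left[i] != right[i])
    return left[:k], left[k + 1:]  # skip the differing 0/1 marker
```

For example, `encode2("10", "011")` gives `"100011101011"`, and the halves `"100011"` and `"101011"` first differ at index 2.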
Approach 3: This is an improvement on approach 1. Encode $A$ with the doubling scheme ($0\to00$, $1\to11$), then write the complement of the first character of $B$, then write $B$ itself. This has length $2|A|+|B|+1$. To recover $A$ and $B$, break the string into blocks of two and look for the first block that isn't $00$ or $11$. Everything before that block is the doubled version of $A$. The first character of the anomalous block is discarded, and the rest of the string (starting with the block's second character) is $B$.
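A minimal sketch of Approach 3 (function names hypothetical). The question does not say what to do when $B$ is empty; this sketch assumes that nothing is appended after the doubled $A$, so that "no anomalous block found" decodes as $B$ being empty.

```python
def double(s: str) -> str:
    """Write each bit twice: 0 -> 00, 1 -> 11."""
    return "".join(c * 2 for c in s)


def encode3(a: str, b: str) -> str:
    """Approach 3: doubled A, then the complement of B's first bit, then B.

    Assumption (not in the question): if B is empty, nothing is appended.
    """
    if not b:
        return double(a)
    flip = "1" if b[0] == "0" else "0"
    return double(a) + flip + b


def decode3(s: str) -> tuple[str, str]:
    """Consume 00/11 blocks as A; at the first anomalous block, the rest is B."""
    a = []
    i = 0
    while i + 1 < len(s) and s[i] == s[i + 1]:
        a.append(s[i])
        i += 2
    # Either every block was consumed (B is empty), or s[i] is the flip bit.
    b = s[i + 1:] if i < len(s) else ""
    return "".join(a), b
```

For example, `encode3("10", "011")` gives `"11001011"` (length $2\cdot2+3+1=8$): the blocks `11`, `00` spell $A$, the block `10` is anomalous, and `011` after its first character is $B$.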
Approach 4: Getting cleverer now. This uses a similar idea to approach 3, but instead of separating $A$ from $B$, you separate $|A|$ (written as a binary number) from the concatenation $AB$. This has length $|A|+|B|+2\log_2 |A| + O(1)$.
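A minimal sketch of Approach 4, assuming $|A|$ is written in plain binary (`format(n, "b")`, so $|A|=0$ becomes `"0"`) and framed exactly as in approach 3, with $AB$ playing the role of $B$. The names `split_doubled`, `encode4` and `decode4` are hypothetical.

```python
def double(s: str) -> str:
    """Write each bit twice: 0 -> 00, 1 -> 11."""
    return "".join(c * 2 for c in s)


def split_doubled(s: str) -> tuple[str, str]:
    """Undo approach 3's framing: return (undoubled prefix, remainder)."""
    prefix = []
    i = 0
    while i + 1 < len(s) and s[i] == s[i + 1]:
        prefix.append(s[i])
        i += 2
    rest = s[i + 1:] if i < len(s) else ""  # skip the flip bit, if present
    return "".join(prefix), rest


def encode4(a: str, b: str) -> str:
    """Approach 4: doubled binary |A|, complement of AB's first bit, then AB."""
    n_bits = format(len(a), "b")
    ab = a + b
    if not ab:  # assumption: nothing follows when AB is empty
        return double(n_bits)
    flip = "1" if ab[0] == "0" else "0"
    return double(n_bits) + flip + ab


def decode4(s: str) -> tuple[str, str]:
    """Read |A| from the doubled prefix, then cut AB at that length."""
    n_bits, ab = split_doubled(s)
    k = int(n_bits, 2)
    return ab[:k], ab[k:]
```

With this particular framing, the exact length for $|A|\ge1$ works out to $|A|+|B|+2\lfloor\log_2|A|\rfloor+3$, since $|A|$ takes $\lfloor\log_2|A|\rfloor+1$ bits before doubling. For example, `encode4("10", "011")` gives `"1100010011"`.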
Approach 5: This is the most complicated, but potentially the best so far (asymptotically) for maximally compressed strings. Find the longest run of consecutive identical characters in $A$ and $B$. Say it is a run of the symbol $a$ with length $n$. Then output the string $A(1-a)a^{n+2}(1-a)B$ (where a power denotes repeated concatenation). This has length $|A|+|B|+n+4$, but I haven't managed to make a probabilistic argument about the expected size of $n$. My best guess is that it is logarithmic in $|A|+|B|$. To recover $A$ and $B$, find the longest run of 0s or 1s in the output, then discard that run together with the single character on each side of it. $A$ is on the left, $B$ is on the right. The inserted run is always the unique longest one: runs inside $A$ or $B$ have length at most $n$, and a flanking $(1-a)$ can extend a run at the boundary to at most $n+1$.
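A minimal sketch of Approach 5 using the standard library's `itertools.groupby` to find runs. The names `longest_run`, `encode5` and `decode5` are hypothetical, and the sketch assumes that when both strings are empty ($n=0$) the marker symbol defaults to $a=0$.

```python
from itertools import groupby


def longest_run(s: str) -> tuple[str, int, int]:
    """Return (symbol, length, start index) of the first longest run in s.

    For an empty string this returns ('0', 0, 0) (assumed default symbol).
    """
    best = ("0", 0, 0)
    pos = 0
    for sym, grp in groupby(s):
        length = sum(1 for _ in grp)
        if length > best[1]:
            best = (sym, length, pos)
        pos += length
    return best


def encode5(a: str, b: str) -> str:
    """Approach 5: A, (1-a), a^(n+2), (1-a), B for the longest run a^n."""
    sym_a, n_a, _ = longest_run(a)
    sym_b, n_b, _ = longest_run(b)
    sym, n = (sym_a, n_a) if n_a >= n_b else (sym_b, n_b)
    other = "1" if sym == "0" else "0"
    return a + other + sym * (n + 2) + other + b


def decode5(s: str) -> tuple[str, str]:
    """Drop the unique longest run plus one character on each side of it."""
    _, length, start = longest_run(s)
    return s[:start - 1], s[start + length + 1:]
```

For example, with $A=10$ and $B=011$ the longest run is $11$ ($n=2$), and `encode5("10", "011")` gives `"10011110011"`, of length $5+2+4=11$.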
Where I refer to probabilistic arguments above, I am assuming that maximally compressed strings are distributed essentially like random strings (lacking patterns), but I'm not as knowledgeable in this area as I could be. In general I'm interested in both the worst case and the average case.
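Under the assumption that compressed strings behave like uniformly random bit strings, the expected $n$ in approach 5 can at least be estimated empirically. This is a small Monte Carlo sketch (function names hypothetical, no results claimed) that averages the longest run over random strings of length $m$; comparing the estimates against $\log_2 m$ for a few values of $m$ is one way to sanity-check the logarithmic guess.

```python
import random
from itertools import groupby


def max_run(s: str) -> int:
    """Length of the longest run of identical characters (0 for empty s)."""
    return max((sum(1 for _ in grp) for _, grp in groupby(s)), default=0)


def mean_max_run(m: int, trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean longest run in uniform random m-bit strings."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = 0
    for _ in range(trials):
        s = "".join(rng.choice("01") for _ in range(m))
        total += max_run(s)
    return total / trials
```

Usage: `for m in (64, 256, 1024): print(m, mean_max_run(m), math.log2(m))` (after `import math`). This only covers the average case under the random-string assumption; the worst case is unaffected, since $n$ can be as large as $|A|+|B|$.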
Your answer is clever! I'm leaving this question open for now, but if there's no activity for a while I'll accept.
– G. H. Faust Aug 05 '14 at 08:58