
Source alphabet: $\{a, b, c, d, e, f\}$

Code alphabet: $\{0, 1\}$

  • $a\colon 0101$
  • $b\colon 1001$
  • $c\colon 10$
  • $d\colon 000$
  • $e\colon 11$
  • $f\colon 100$

I thought that for a code to be uniquely decodable, it had to be prefix-free. But in this code the codeword $c$ is a prefix of the codeword $f$, for example, so it is not prefix-free. However, my textbook tells me that its reverse is prefix-free (I don't understand this), and that it is therefore uniquely decodable. Can someone explain what this means, or why the code is uniquely decodable? I know it satisfies Kraft's inequality, but that is only a necessary condition, not a sufficient one.

Yuval Filmus
2000mroliver
  • Prefix-free implies uniquely decodable, but it is not an "if and only if" statement. See, for example, here. – dkaeae Mar 03 '19 at 13:46
  • Okay, I see, but my textbook says this: "Code A is uniquely decodable since its reverse is prefix-free, so uniquely decodable." Do you understand what they mean by its reverse? – 2000mroliver Mar 03 '19 at 13:47
  • Probably simply the code obtained by reversing all codewords. – dkaeae Mar 03 '19 at 13:47
  • And why does that imply unique decodability? I don't get it. – 2000mroliver Mar 03 '19 at 13:53
  • You can decode it by running the normal decoding algorithm, but going backwards through the string. – RemcoGerlich Mar 04 '19 at 09:04
  • c may be a prefix of b and f, but the suffixes that are left over don't exist in the code. When you reverse the code, suffixes become prefixes, and then it becomes prefix-free. – Barmar Mar 04 '19 at 17:12

3 Answers


Your code has the property that if you reverse all codewords, then you get a prefix code. This implies that your code is uniquely decodable.

Indeed, consider any code $C = x_1,\ldots,x_n$ whose reverse $C^R := x_1^R,\ldots,x_n^R$ is uniquely decodable. I claim that $C$ is also uniquely decodable. This is because $$ w = x_{i_1} \ldots x_{i_m} \text{ if and only if } w^R = x_{i_m}^R \ldots x_{i_1}^R. $$ In words, decompositions of $w$ into codewords of $C$ are in one-to-one correspondence with decompositions of $w^R$ into codewords of $C^R$. Since the latter are unique, so are the former.
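To see the one-to-one correspondence concretely, here is a small Python sketch that enumerates every decomposition of a word $w$ over $C$ and of $w^R$ over $C^R$, using the code from the question. The helper `decompositions` and the sample word are illustrative choices, not part of the answer.

```python
def decompositions(w, code):
    """Return every way of writing w as a concatenation of codewords from `code`."""
    if not w:
        return [[]]  # the empty word has exactly one (empty) decomposition
    result = []
    for word in code:
        if w.startswith(word):
            for rest in decompositions(w[len(word):], code):
                result.append([word] + rest)
    return result

# The code from the question (a, b, c, d, e, f) and its reverse.
C = ["0101", "1001", "10", "000", "11", "100"]
C_R = [x[::-1] for x in C]

w = "1001010101"
print(decompositions(w, C))          # [['10', '0101', '0101']]
print(decompositions(w[::-1], C_R))  # [['1010', '1010', '01']]
```

Reading the second list backwards and reversing each entry gives back the first, exactly as the displayed equivalence says. The enumeration can take exponential time in general, so it is only a sanity check on short words, not a test of unique decodability.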

Since prefix codes are uniquely decodable, it follows that the reverse of a prefix code is also uniquely decodable. This is the case in your example.
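A minimal Python sketch of that check for the question's code; `is_prefix_free` and `reverse_code` are hypothetical helpers introduced here. It relies on the fact that, after sorting lexicographically, a codeword that is a prefix of some other codeword is also a prefix of its immediate successor.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of another (duplicates also count as a violation)."""
    words = sorted(code)
    # In sorted order, a word that is a prefix of some later word
    # is necessarily a prefix of the very next word.
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

def reverse_code(code):
    """Reverse every codeword, keeping the symbol assignment."""
    return {sym: word[::-1] for sym, word in code.items()}

CODE = {"a": "0101", "b": "1001", "c": "10", "d": "000", "e": "11", "f": "100"}

print(is_prefix_free(CODE.values()))                # False: "10" is a prefix of "100"
print(is_prefix_free(reverse_code(CODE).values()))  # True
```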

The McMillan inequality states that if $C$ is uniquely decodable then $$ \sum_{i=1}^n 2^{-|x_i|} \leq 1. $$ In other words, a uniquely decodable code satisfies Kraft's inequality. Conversely, whenever lengths $|x_1|,\ldots,|x_n|$ satisfy Kraft's inequality, there is a prefix code with exactly these codeword lengths. Therefore, if all you're interested in is minimizing the expected codeword length, there is no reason to look beyond prefix codes.
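For the question's code, the Kraft sum is $2 \cdot 2^{-4} + 2 \cdot 2^{-3} + 2 \cdot 2^{-2} = 7/8$. The Python sketch below computes this sum exactly and builds a prefix code with the same lengths using the standard canonical construction (codewords handed out in order of increasing length). The function names are illustrative, not from the answer.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of the sum of 2 ** -l over the codeword lengths."""
    return sum((Fraction(1, 2 ** l) for l in lengths), Fraction(0))

def canonical_prefix_code(lengths):
    """Build a binary prefix code with the given codeword lengths.

    Codewords are assigned shortest first; each one is obtained by adding 1
    to the previous codeword and appending zeros to reach the new length.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate Kraft's inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    result = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value <<= lengths[i] - prev_len  # append zeros up to the new length
        result[i] = format(value, "b").zfill(lengths[i])
        value += 1
        prev_len = lengths[i]
    return result

lengths = [4, 4, 2, 3, 2, 3]  # |a|, |b|, |c|, |d|, |e|, |f| from the question
print(kraft_sum(lengths))              # 7/8
print(canonical_prefix_code(lengths))  # ['1100', '1101', '00', '100', '01', '101']
```

Since the lengths are preserved, this prefix code has the same expected length as the original code under any source distribution.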

Sam Roweis gives in his slides a nice example of a uniquely decodable code which is neither a prefix code nor the reverse of a prefix code: $$ 0,01,110. $$ In order to show that this code is uniquely decodable, it suffices to show how to decode the first codeword of a word. If the word starts with a $1$, then the first codeword is $110$. If it is of the form $01^*$, then it must be either $0$ or $01$. Otherwise, there must be a prefix of the form $01^*0$. We now distinguish several cases:

$$ \begin{array}{c|cccc} \text{prefix} & 00 & 010 & 0110 & 01110 \\\hline \text{codeword} & 0 & 01 & 0 & 01 \end{array} $$ Longer runs of $1$ cannot be decoded at all.
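The case analysis translates directly into a decoder. Here is a Python sketch (function names are illustrative) that determines the first codeword with exactly the rules above, strips it, and repeats on the remainder; words that cannot be decoded raise `ValueError`.

```python
def first_codeword(w):
    """First codeword of a nonempty word w under the code {0, 01, 110}."""
    if not w:
        raise ValueError("empty word")
    if w.startswith("1"):
        if w.startswith("110"):
            return "110"
        raise ValueError(f"cannot decode {w!r}")
    # w starts with 0: count the run of 1s that immediately follows it.
    k = 0
    while k + 1 < len(w) and w[k + 1] == "1":
        k += 1
    if k + 1 == len(w):
        # w is exactly 0 1^k: only "0" and "01" are decodable.
        choices = {0: "0", 1: "01"}
    else:
        # w has a prefix 0 1^k 0: the table above.
        choices = {0: "0", 1: "01", 2: "0", 3: "01"}
    if k not in choices:
        raise ValueError(f"cannot decode {w!r}")
    return choices[k]

def decode(w):
    """Split w into codewords by repeatedly peeling off the first one."""
    out = []
    while w:
        cw = first_codeword(w)
        out.append(cw)
        w = w[len(cw):]
    return out

print(decode("0110"))   # ['0', '110']
print(decode("01110"))  # ['01', '110']
```

In a decodable word, the run of $1$s after a $0$ is at most three long, so each step needs at most five bits of lookahead. This contrasts with the question's code, where the first codeword can depend on arbitrarily distant bits.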

Yuval Filmus
  • It seems that in the OP's example we cannot decode the first codeword after a fixed number of digits; there are infinitely many cases: 1001010101010101… can be either fcccccc… or caaa…, and we might need to wait until the end of the input to decide. – Bergi Mar 03 '19 at 21:58
  • This also happens for $1,10,00$. – Yuval Filmus Mar 03 '19 at 22:00
  • @Bergi It is always decodable for any finite number of digits. There is always only one way to decode the encoding without any remainders. Any other attempt will end up with spare 1s or 0s. This is because the code is uniquely decodable if we read it tail first. In theory, if something is uniquely decodable in one direction, it makes no sense that there can be more than one solution in the other direction. – slebetman Mar 04 '19 at 00:22
  • @slebetman I was referring to a finite prefix (with possible remainders). Yes, if we take the whole input it always is decodable. – Bergi Mar 04 '19 at 11:40

If I give you any message that you are supposed to decode, then you can do the following: Reverse the message, starting with the last bit instead of the first bit. Reverse the code words. Decode the message. Reverse the decoded string.

You can do that because reversing the six codewords gives a prefix-free code: 1010, 1001, 01, 000, 11, 001.
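The four steps above, sketched in Python for the code from the question. `encode`, `decode_prefix_free` and `decode_by_reversal` are hypothetical names; the greedy left-to-right decoder is only correct because the code it is given (here, the reversed code) is prefix-free.

```python
CODE = {"a": "0101", "b": "1001", "c": "10", "d": "000", "e": "11", "f": "100"}

def encode(symbols, code):
    """Concatenate the codewords of the given source symbols."""
    return "".join(code[s] for s in symbols)

def decode_prefix_free(bits, code):
    """Greedy left-to-right decoding; valid only when `code` is prefix-free."""
    lookup = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # a complete codeword: no longer one can match
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("message ends in the middle of a codeword")
    return out

def decode_by_reversal(bits, code):
    """Reverse the message and the codewords, decode, then reverse the result."""
    reversed_code = {sym: word[::-1] for sym, word in code.items()}
    symbols = decode_prefix_free(bits[::-1], reversed_code)
    return "".join(reversed(symbols))

print(decode_by_reversal(encode("fccc", CODE), CODE))  # fccc
print(decode_by_reversal(encode("caaa", CODE), CODE))  # caaa
```

For example, `encode("fccc", CODE)` is `100101010`, which is also how the encoding of `caaa` (`10010101010101`) begins, so a left-to-right reader cannot commit to the first symbol after nine bits, whereas reading from the end identifies the last symbol after two.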

gnasher729

If prefix-free means what I think, the reverse of ‘a’ starts with 1, or 10, or 101, none of which is itself a whole valid codeword.

Therefore, if a message ends with 0101, it can only be an ‘a’ and you can apply similar logic to the preceding bit(s).

However, what if there is no end to start from? Well, if the first bit is 1, you know it isn’t ‘a’ or ‘d’. The second bit will eliminate ‘e’ or {‘b’, ‘c’, ‘f’}. The third bit might bring it down to one choice, but if not, it is unique by the fourth bit.

As soon as you get to a unique sequence, you restart the algorithm on the next bit.

WGroleau