I am trying to implement Golomb coding, but I don't understand how it is tuned to produce an optimal code.
It is said that
Golomb coding uses a tunable parameter M to divide an input value into two parts: q, the result of a division by M, and r, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding.
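To make sure I am reading that definition correctly, here is a minimal Python sketch of the encoder as I understand it. The function names golomb_encode and truncated_binary are my own, and I am assuming the unary convention of q ones followed by a terminating zero (the opposite convention also exists).

```python
def truncated_binary(r: int, m: int) -> str:
    """Truncated binary code of a remainder r in the range [0, m)."""
    b = (m - 1).bit_length()   # ceil(log2(m)); 0 when m == 1
    cutoff = (1 << b) - m      # how many remainders get the short b-1 bit code
    if r < cutoff:
        value, width = r, b - 1
    else:
        value, width = r + cutoff, b
    # A width of 0 (only possible for m == 1) means no remainder bits at all.
    return format(value, "b").zfill(width) if width > 0 else ""


def golomb_encode(n: int, m: int) -> str:
    """Golomb code of a nonnegative integer n with parameter m, as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    # Assumed convention: quotient as q ones plus a terminating zero.
    return "1" * q + "0" + truncated_binary(r, m)
```

For instance, with M = 4 the value 9 splits into q = 2 and r = 1, which this sketch encodes as 110 followed by 01.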
I don't understand how I should choose the parameter M. I can't see how the explanation in Wikipedia relates to actual data. I believe it should be related to statistical moments of the data; is that true?
For example, if I have this example set:
{3,4,4,4,3,1,2,2,3,1,2,1,4,1,2,2,2,2,1,1,2,2,1}
I believe M should be very small for this kind of data. I bet it's either 1 or 2. Its mean is ~2.2 and its standard deviation is ~1.1. My intuition tells me to choose 2.
Another dataset here:
{2,7,11,19,6,2,6,13,11,1,5,2,19,7,6,9,6,7,2,4,5,12,3}
This time the mean is ~7.2 and standard deviation is ~5.0.
Is 7 the right value in this case? And if I get a value like 7, should I prefer a Rice code instead (that is, use 8, since it is a power of 2)?
I understand that division is easier with Rice coding, but are there any benefits to NOT using it? As far as I can tell, 3 bits will be used for the remainder in either case, so how could a pure Golomb code do any better?
One more nuance: Golomb code is defined for nonnegative integers. If I have strictly positive integers instead, should I encode x-1 rather than x? That would make a big difference for the first of the datasets above.
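One way I could check these guesses empirically is to count the total number of encoded bits for each candidate M, with and without the x-1 shift. Below is a rough Python sketch of that experiment. The names golomb_length and total_bits and the shift parameter are my own, and I am assuming a unary quotient of q + 1 bits and a truncated binary remainder. It only measures these two small samples; it does not say anything about the underlying distribution.

```python
DATA1 = [3, 4, 4, 4, 3, 1, 2, 2, 3, 1, 2, 1, 4, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1]
DATA2 = [2, 7, 11, 19, 6, 2, 6, 13, 11, 1, 5, 2, 19, 7, 6, 9, 6, 7, 2, 4, 5, 12, 3]


def golomb_length(n: int, m: int) -> int:
    """Number of bits in the Golomb code of n >= 0 with parameter m >= 1."""
    q, r = divmod(n, m)
    b = (m - 1).bit_length()   # ceil(log2(m)); 0 when m == 1
    cutoff = (1 << b) - m      # remainders below this need only b - 1 bits
    remainder_bits = b - 1 if r < cutoff else b
    return q + 1 + remainder_bits  # unary quotient (q ones + terminator) + remainder


def total_bits(data, m: int, shift: int = 0) -> int:
    """Total code length of the data set after subtracting `shift` from each value."""
    return sum(golomb_length(x - shift, m) for x in data)


# Print the totals for a range of candidate parameters, with and without x-1.
for name, data in (("first", DATA1), ("second", DATA2)):
    for m in range(1, 11):
        print(name, "M =", m,
              "bits as-is:", total_bits(data, m),
              "bits with x-1:", total_bits(data, m, shift=1))
```

Comparing the rows for M = 7 and M = 8 on the second data set, and the two columns on the first, should show directly how much the choice of M and the x-1 shift matter for these samples.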