Let $X_k$ be a set containing $k$ characters: $X_k=\{x_1, x_2, ..., x_k\}$.
$S_n$ is the set of all strings of length $n$ over $X_k$: $S_n=\{y_1y_2...y_n \mid y_i \in X_k, i=1,2,...,n\}$.
$S$ is the union of all $S_n$ (all $k$-encoded strings): $$S=\bigcup_{i=1}^{\infty} S_i$$
Question: Is it possible to construct a bijection between $S$ and the reals in $[0,1]$, and if so, how? (The interval could also be $(0,1)$, $[0,1)$ or $(0,1]$ if that makes the answer more concise.)
I have found some similar but different questions:
- Bijection between the reals and the set of permutations of the natural numbers. However, the sequences in that question are infinite, while my strings are finite, of every possible length.
- A bijection between the reals and infinite binary strings. The strings in that question are again infinite, and they are restricted to binary.
This question originates from my attempt to map all DNA sequences (4-encoded) to $[0,1]$. I then wondered whether it could be generalized to any $k$-encoded string.
The basic idea was to read every string as the fractional part of a number written in base $k$. For example, suppose $X_4=\{0,1,2,3\}$; then the string "23102" is mapped to 0.23102 (base 4), and the string "321" is mapped to 0.321 (base 4). However, the strings "23102", "231020" and "2310200" are all mapped to the same number, 0.23102 (base 4), so the map is not injective.
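To make the collision concrete, here is a minimal Python sketch of the mapping described above. The function name `to_base_k_fraction` is my own, and I assume the characters of $X_k$ are the digits `0` to `k-1` (so $k \le 10$); exact rationals are used to avoid floating-point noise.

```python
from fractions import Fraction


def to_base_k_fraction(s: str, k: int) -> Fraction:
    """Read the digit string s as the base-k expansion 0.s (hypothetical helper).

    Assumes the alphabet is the digits 0..k-1, so k must be at most 10.
    """
    value = Fraction(0)
    for position, ch in enumerate(s, start=1):
        digit = int(ch)
        if not 0 <= digit < k:
            raise ValueError(f"{ch!r} is not a base-{k} digit")
        # The digit in position i contributes digit / k**i.
        value += Fraction(digit, k ** position)
    return value


a = to_base_k_fraction("23102", 4)
b = to_base_k_fraction("231020", 4)
c = to_base_k_fraction("2310200", 4)
print(a, a == b == c)  # 361/512 True
```

The three distinct strings produce the same rational number, which is exactly the trailing-zero problem.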
For binary strings this collision can be avoided by appending one extra 1 to the string. For example, consider the strings "01" and "010". Appending the extra 1 turns them into "011" and "0101", which are then mapped to 0.011 and 0.0101 (both base 2), respectively. This works because every terminating base-2 expansion, written without trailing zeros, necessarily ends in 1 (and non-terminating expansions do not have the problem above). But this trick doesn't work when the strings are not binary ($k\ne2$).
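The binary trick can be sketched in Python as follows. The name `encode_binary_with_sentinel` is my own, and the brute-force check only covers strings up to length 6; it illustrates the idea rather than proving it.

```python
from fractions import Fraction
from itertools import product


def encode_binary_with_sentinel(s: str) -> Fraction:
    """Append an extra '1' to the binary string s and read it as 0.s1 in base 2."""
    if set(s) - {"0", "1"}:
        raise ValueError("expected a string over the alphabet {0, 1}")
    padded = s + "1"
    # int(padded, 2) / 2**len(padded) is the exact value of 0.padded in base 2.
    return Fraction(int(padded, 2), 2 ** len(padded))


print(encode_binary_with_sentinel("01"))   # 3/8, i.e. 0.011 in base 2
print(encode_binary_with_sentinel("010"))  # 5/16, i.e. 0.0101 in base 2

# Brute-force check: all binary strings of length 1..6 get distinct images.
strings = ["".join(bits) for n in range(1, 7) for bits in product("01", repeat=n)]
images = {encode_binary_with_sentinel(s) for s in strings}
print(len(strings) == len(images))  # True
```

Without the appended 1, "01" and "010" would both land on 1/4; with it they are separated.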