A bijection between $[0,1]$ and all $k$-encoded strings

Question

Let $X_k$ be a set containing $k$ characters: $X_k=\{x_1, x_2, ..., x_k\}$.

$S_n$ is the set of all strings of length $n$ over $X_k$: $S_n=\{y_1y_2\cdots y_n \mid y_i \in X_k,\ i=1,2,\dots,n\}$.

$S$ is the union of all $S_n$ (all $k$-encoded strings): $$S=\bigcup_{i=1}^{\infty} S_i$$

Question: Is it possible to write down a bijection between $S$ and the reals in $[0,1]$, and if so, how? (The interval could also be $(0,1)$, $[0,1)$ or $(0,1]$ if that makes the answer more concise.)

I have found some similar but different questions:

Bijection between the reals and the set of permutations of the natural numbers. However, the sequence described in this question is infinite, while my strings have all possible lengths.
A bijection between the reals and infinite binary strings. The strings in this question are still infinite. Besides, they're binary.

This question originates from my attempt to map all DNA sequences (4-encoded) to $[0,1]$. I then wondered whether it could be extended to $k$-encoded strings for any $k$.

The basic idea was to read every string as the digits of a base-$k$ fraction. For example, suppose $X_4=\{0,1,2,3\}$; then the string "23102" is mapped to 0.23102 (base 4), and the string "321" is mapped to 0.321 (base 4). However, the strings "23102", "231020" and "2310200" are all mapped to the same number, 0.23102 (base 4), because trailing zeros do not change the value, so the map is not injective.
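The collision described above can be checked with exact rational arithmetic. This is only a minimal sketch: the helper name `base_k_value` is made up for illustration, and it assumes the alphabet is the digit characters `0`..`k-1` with $k \le 10$.

```python
from fractions import Fraction

def base_k_value(s: str, k: int) -> Fraction:
    """Read the digit string s as the base-k fraction 0.s, exactly."""
    # Digit at position i (0-based) contributes digit / k^(i+1).
    return sum(
        (Fraction(int(ch), k ** (i + 1)) for i, ch in enumerate(s)),
        Fraction(0),
    )

# Trailing zeros add nothing, so all three strings land on the same number.
for s in ["23102", "231020", "2310200"]:
    print(s, "->", base_k_value(s, 4))
```

All three strings give the same fraction, which is why the plain base-$k$ reading cannot be injective on strings of varying length.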

This situation can be circumvented when the strings are binary by appending one extra 1 to the string. For example, consider the strings "01" and "010". Appending the extra 1 turns them into "011" and "0101", which are then mapped to 0.011 and 0.0101 (both base 2), respectively. This works because every terminating base-2 fraction in $(0,1)$, written without trailing zeros, necessarily ends in 1 (and non-terminating expansions don't have the problem above). But this trick doesn't work when the string is not binary ($k\ne2$).
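For the binary case, the append-a-1 trick can be checked by brute force on short strings. The sketch below uses a hypothetical helper `encode_binary` and an arbitrary length bound `n = 4`. It confirms that distinct strings get distinct images, and that the images are exactly the dyadic rationals in $(0,1)$ with denominator at most $2^{n+1}$, apart from $1/2$ (which would correspond to the empty string, not in $S$).

```python
from fractions import Fraction
from itertools import product

def encode_binary(s: str) -> Fraction:
    """Append a 1 to the binary string s and read the result as 0.s1 in base 2."""
    padded = s + "1"
    return sum(
        (Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(padded)),
        Fraction(0),
    )

n = 4  # arbitrary bound on string length, chosen only for the check
strings = ["".join(bits) for m in range(1, n + 1) for bits in product("01", repeat=m)]
images = {encode_binary(s) for s in strings}

# Dyadic rationals j / 2^(n+1) in (0, 1), with 1/2 left out.
expected = {Fraction(j, 2 ** (n + 1)) for j in range(1, 2 ** (n + 1))} - {Fraction(1, 2)}

print(len(strings) == len(images))  # injective on these strings
print(images == expected)           # hits every such dyadic rational except 1/2
```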

Please don't use $\aleph_k$ for a set of size $k$. Just use $S_k$ or some other indexed set, as $\aleph_k$ is already a name for a specific (very infinite) set in set theory. — Henno Brandsma, Feb 19 '16 at 10:42

score 2 · Answer 1 · answered Feb 19 '16 at 10:30


There cannot be any bijection, because these sets have different cardinalities. The set $S$ consists of finite strings of arbitrary length over a finite alphabet: each $S_n$ is finite (it has $k^n$ elements), and a countable union of finite sets is countable, so $|S| = |\mathbb{N}|$. On the other hand, $[0,1]$ has the same cardinality as $\mathbb{R}$, which is not countable.
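To make the countability of $S$ concrete, here is one possible explicit bijection between $S$ and the positive integers, sketched in Python. It lists strings by length and then alphabetically (bijective base-$k$ numeration). The function names `rank` and `unrank` and the DNA alphabet `"ACGT"` are illustrative choices, not part of the question.

```python
def rank(s: str, alphabet: str) -> int:
    """Position (1, 2, 3, ...) of the non-empty string s in length-then-alphabetical order."""
    k = len(alphabet)
    value = 0
    for ch in s:
        # Bijective base-k: digits run from 1 to k instead of 0 to k-1.
        value = value * k + alphabet.index(ch) + 1
    return value

def unrank(n: int, alphabet: str) -> str:
    """Inverse of rank: the string at position n >= 1."""
    k = len(alphabet)
    chars = []
    while n > 0:
        n, r = divmod(n - 1, k)
        chars.append(alphabet[r])
    return "".join(reversed(chars))

dna = "ACGT"
print([unrank(n, dna) for n in range(1, 7)])
print(all(rank(unrank(n, dna), dna) == n for n in range(1, 1000)))
```

Since every finite string receives its own positive integer and every positive integer is used, $S$ is in bijection with $\mathbb{N}$ and so cannot be in bijection with $[0,1]$.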

I hope this helps $\ddot\smile$

answered Feb 19 '16 at 10:30

dtldarek


But $S$ contains $S_{\infty}$. Is $S_{\infty}$ also countable? – Wei Feng Feb 20 '16 at 11:35
Assuming that $S_\infty$ means the set of infinite strings, you neither defined it nor included it in $S$. You defined $S_n$ only for $n \in \mathbb{N}$ and set $S$ to be the union of them. Given your current definition, $S$ contains arbitrarily long strings, but it does not contain infinite strings. – dtldarek Feb 20 '16 at 11:55
So how can I redefine $S$ to include $S_{\infty}$? And does the bijection exist after adding $S_{\infty}$? – Wei Feng Feb 20 '16 at 12:37
Define $S_\infty$ to be the set of infinite strings on $X_k$ (or something similar like $X_k^{\mathbb{N}}$) and set $S = S_\infty \cup \bigcup_{i=1}^{\infty} S_i$. Then, for any $k \geq 2$, the bijection does exist. However, writing it down won't be trivial; in particular, the base-encoding you proposed has at least one inherent problem: $0.1\overline{9} = 0.2$, so you would collapse the strings "2" and "1999...", i.e. the function wouldn't be a bijection anymore. If you really need a bijection, you can find some ideas here. – dtldarek Feb 20 '16 at 12:56
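The collision between a terminating expansion and one ending in a repeated largest digit, raised in the last comment, can be verified exactly by summing the geometric tail in closed form. This is a sketch only: `value_with_repeating_tail` is a hypothetical helper, and it assumes digit characters `0`..`k-1` with $k \le 10$.

```python
from fractions import Fraction

def value_with_repeating_tail(prefix: str, tail_digit: int, k: int) -> Fraction:
    """Exact value of 0.<prefix><tail_digit><tail_digit>... in base k."""
    m = len(prefix)
    head = sum(
        (Fraction(int(ch), k ** (i + 1)) for i, ch in enumerate(prefix)),
        Fraction(0),
    )
    # Geometric series: sum over j > m of d / k^j = d / (k^m * (k - 1)).
    tail = Fraction(tail_digit, k ** m * (k - 1))
    return head + tail

# Base 10: the infinite string "1999..." and the string "2" give the same real number.
print(value_with_repeating_tail("1", 9, 10) == value_with_repeating_tail("2", 0, 10))
# The same happens in base 4 with "1333..." and "2".
print(value_with_repeating_tail("1", 3, 4) == value_with_repeating_tail("2", 0, 4))
```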
