Questions tagged [encoding-scheme]

140 questions
45
votes
9 answers

Is UTF-8 the final character encoding for all future time?

It seems to me that Unicode is the "final" character encoding. I cannot imagine anything else replacing it at this point. I'm frankly confused about why UTF-16 and UTF-32 etc. exist at all, not to mention all the non-Unicode character encodings…
Timone
22
votes
3 answers

Why is this code uniquely decodable?

Source alphabet: $\{a, b, c, d, e, f\}$. Code alphabet: $\{0, 1\}$. Codewords: $a\colon 0101$, $b\colon 1001$, $c\colon 10$, $d\colon 000$, $e\colon 11$, $f\colon 100$. I thought that for a code to be uniquely decodable, it had to be prefix-free. But in this code,…
2000mroliver
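The code in the excerpt above is not prefix-free (10 is a prefix of 100 and of 1001), so unique decodability has to be checked another way. One standard check is the Sardinas-Patterson test, which repeatedly computes "dangling suffixes" and fails only if one of them is itself a codeword. The following is a minimal Python sketch; the function names are illustrative and do not come from the question.

```python
def dangling_suffixes(prefixes, words):
    """Non-empty w such that p + w == word for some p in prefixes, word in words."""
    return {
        word[len(p):]
        for p in prefixes
        for word in words
        if len(word) > len(p) and word.startswith(p)
    }


def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test for a finite code given as a list of strings."""
    code = set(codewords)
    if len(code) != len(codewords) or "" in code:
        return False  # repeated or empty codewords are never uniquely decodable

    # First round: suffixes left over when one codeword is a prefix of another.
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is a codeword: two parsings exist
        seen |= current
        # Next round: compare the new suffixes against the codewords, both ways.
        nxt = dangling_suffixes(code, current) | dangling_suffixes(current, code)
        current = nxt - seen  # only unseen suffixes can produce anything new
    return True


# The code from the question excerpt.
question_code = ["0101", "1001", "10", "000", "11", "100"]
print(is_uniquely_decodable(question_code))
```

For this code the dangling suffixes stabilise at a set that contains no codeword, so the test reports the code as uniquely decodable even though it is not prefix-free.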
6
votes
2 answers

Can nested structures be encoded more "readably" with a single delimiter?

Imagine you have two systems of delimiting. One uses paired delimiters, [ and ]: [abc]. The other uses a single interstitial delimiter, /: a/b/c. It's easy to see how to encode structure in the first case, as they nest…
3
votes
3 answers

Reconstructing files from binary

The popular view is that any kind of information is just a collection of bits, that is, zeroes and ones placed in a particular order. This led me to the following thought. Suppose that I have some kind of file, such as a text document, a PDF or anything…
Kurt
2
votes
1 answer

Encoding two 6-bit non-negative integers compactly

Is it possible to encode two 6-bit-wide non-negative integers a and b, guaranteed to be in [0, 63], in fewer than 12 bits such that a and b are both recoverable? We could obviously concatenate the binary representations (a << 6 | b). The scheme would…
user110001
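The concatenation mentioned in the excerpt above, a << 6 | b, can be sketched in Python together with a simple counting check. The sketch assumes only what the visible part of the question states (both values lie in [0, 63]); the helper names `pack` and `unpack` are hypothetical. Under that assumption there are 64 × 64 = 4096 = 2^12 distinct pairs, so a fixed-length code that keeps every pair distinguishable cannot use fewer than 12 bits.

```python
def pack(a: int, b: int) -> int:
    """Concatenate two 6-bit values into one 12-bit word: a in the high bits."""
    if not (0 <= a <= 63 and 0 <= b <= 63):
        raise ValueError("a and b must be in [0, 63]")
    return (a << 6) | b


def unpack(word: int) -> tuple[int, int]:
    """Recover (a, b) from a 12-bit word produced by pack."""
    return word >> 6, word & 0x3F


# Every pair maps to a different word, and all 2**12 words are used,
# so there is no slack left to squeeze out with a fixed-length code.
words = {pack(a, b) for a in range(64) for b in range(64)}
print(len(words), 2 ** 12)
```

If the truncated part of the question adds constraints (for example, an ordering between a and b), fewer pairs are possible and the bound changes accordingly.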
1
vote
2 answers

Understanding the binary and hexadecimal representations of UTF-8

Here is a code point: U+091D. The character it represents is झ. In UTF-8 the character requires three bytes, in hex: e0 a4 9d. But it looks like the number 091D requires only two bytes. So why do we need three bytes to encode the symbol? Probably due to the…
Maksim Dmitriev
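The three bytes in the excerpt above can be reproduced by hand. UTF-8 encodes code points from U+0800 to U+FFFF with the bit pattern 1110xxxx 10xxxxxx 10xxxxxx, so 8 of the 24 bits are fixed marker bits and only 16 carry the code point. The Python sketch below applies that pattern to U+091D and compares the result with the built-in encoder; the function name `utf8_three_byte` is illustrative.

```python
def utf8_three_byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF using the 3-byte UTF-8 pattern."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point is outside the 3-byte range or is a surrogate")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits of the code point
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


cp = 0x091D
manual = utf8_three_byte(cp)
builtin = chr(cp).encode("utf-8")
print(manual.hex(" "), builtin.hex(" "))
```

Splitting 0x091D = 0000 100100 011101 into groups of 4, 6 and 6 bits and adding the marker bits gives 11100000 10100100 10011101, which is e0 a4 9d.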
1
vote
0 answers

Base-N encoding with smallest output

I have a set of bytes (UTF-8) and need to encode them into the smallest output possible using a Base-N encoding scheme. Is it simply that the higher N is (i.e. something like Base85 encoding), the smaller the output will be, or is there…
SamG101
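As a rough guide to the question above: each character of a Base-N alphabet carries log2(N) bits, so n input bytes need about 8n / log2(N) output characters. A larger N therefore gives shorter output, with diminishing returns. The Python sketch below compares that estimate with the encoders in the standard `base64` module; the helper `ideal_length` and the 60-byte sample input are assumptions made for illustration.

```python
import base64
import math


def ideal_length(n_bytes: int, base: int) -> int:
    """Characters needed if each one carries exactly log2(base) bits."""
    return math.ceil(n_bytes * 8 / math.log2(base))


data = bytes(range(60))  # arbitrary sample input, 60 bytes

actual = {
    16: len(base64.b16encode(data)),
    32: len(base64.b32encode(data)),
    64: len(base64.b64encode(data)),
    85: len(base64.b85encode(data)),
}

for base, length in actual.items():
    print(base, length, ideal_length(len(data), base))
```

The sample length is chosen so that none of the four encoders needs padding; for other lengths the real encoders can come out a few characters longer than the estimate.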
0
votes
0 answers

Algorithm for encoding segmentations in RGBA data

I'm trying to figure out a good way of encoding object segmentation data into a single RGBA image. I'd like to not only distinguish between classes, but also individuals within classes, and be able to have multiple classes and objects per pixel. For…
waspinator
0
votes
0 answers

What is bi-binary code?

This is used in the data bits transmitted by GLONASS satellites. I am sure it is different from other codes such as 'binary code', which is what the search engine turns up.
Aadishri
0
votes
1 answer

Probability of certain symbols in a mapping of symbol and code word

Let's say you have this mapping of symbols and codewords: $$ \begin{array}{cc} \hline \text { Symbol } & \text { Codeword } \\ \hline \text { A } & 101 \\ \text { B } & 100 \\ \text { C } & 01 \\ \text { D } & 00 \\ \text { E } & 110 \\ \text { F }…
Rico1990
0
votes
1 answer

Are all prefix codes uniquely decodable?

I can't think of any counterexample, but I can't find any such statement on the internet or in my textbook either. I know that for each uniquely decodable code, there exists a prefix code with the same average length.
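A short way to see why prefix codes decode uniquely: reading left to right, the first codeword that matches is the only codeword that can start the string, because no other codeword begins with it or is a prefix of it. Applying the same argument to the remainder gives exactly one parsing. The Python sketch below implements that greedy decoder; the three-symbol code is a hypothetical example, not taken from any of the questions.

```python
def encode(message, code):
    """Concatenate the codewords for each symbol of the message."""
    return "".join(code[symbol] for symbol in message)


def decode_prefix_code(bits, code):
    """Greedy left-to-right decoding; valid only when the code is prefix-free."""
    inverse = {word: symbol for symbol, word in code.items()}
    decoded = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            # Prefix-freeness guarantees no longer codeword starts with buffer,
            # so emitting the symbol now is the only possible choice.
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(decoded)


# Hypothetical prefix code chosen for illustration.
example_code = {"a": "0", "b": "10", "c": "11"}
bits = encode("abcab", example_code)
print(bits, decode_prefix_code(bits, example_code))
```

The converse does not hold: a code can be uniquely decodable without being prefix-free, in which case this greedy decoder is not guaranteed to work.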