Unicode Explanation Required

Question

Can someone explain what this means?

Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal).

http://en.wikipedia.org/wiki/Unicode

score 9 · Answer 1 · answered Sep 26 '11 at 15:22

The value of 0x10FFFF is 1,114,111 in decimal. Counting both endpoints, 0x000000 through 0x10FFFF gives 1,114,112 numerical values. In Unicode, every character is represented by a number in the range 0 through 1,114,111. Software then interprets this numerical value to render the character on the screen.
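To see the arithmetic for yourself, here is a small Python 3 sketch. The constant names MAX_CODE_POINT and TOTAL_CODE_POINTS are made up for illustration; ord(), chr(), sys.maxunicode and unicodedata.name() are standard Python.

```python
import sys
import unicodedata

MAX_CODE_POINT = 0x10FFFF               # highest Unicode code point
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1  # +1 because code point 0 counts too

print(MAX_CODE_POINT)     # 1114111
print(TOTAL_CODE_POINTS)  # 1114112

# ord() maps a character to its code point; chr() goes the other way.
print(ord("A"))                          # 65
print(unicodedata.name(chr(0x20AC)))     # EURO SIGN

# Python's own upper limit matches the Unicode one (Python 3.3+).
print(sys.maxunicode == MAX_CODE_POINT)  # True

# Anything past the end of the codespace is rejected.
try:
    chr(TOTAL_CODE_POINTS)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
print(out_of_range_rejected)             # True
```

The "+ 1" is the whole explanation of why the highest value is 1,114,111 while the count is 1,114,112.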

score 7 · Accepted Answer · answered Sep 26 '11 at 15:18

A code point or code position is any of the numerical values that make up the codespace.

As per the Wikipedia entry on Code point:

The notion of a code point is used for abstraction, to distinguish both:

the number from an encoding as a sequence of bits, and

the abstract character from a particular graphical representation (glyph).

This is because one may wish to make these distinctions:

encode a particular code space in different ways, or

display a character via different glyphs.

So Unicode defines space for over a million distinct code points (abstract characters, not glyphs), numbered from 0 to 10FFFF in hexadecimal. You may be familiar with Extended ASCII, which comprises 256 code points in the range 0 to FF in hexadecimal.
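As a rough illustration of "encode a particular code space in different ways", the Python 3 sketch below takes one code point and encodes it twice. It assumes Latin-1 (ISO 8859-1) as the stand-in for "Extended ASCII", which is only one of several 8-bit character sets that go by that name; the variable names are just for illustration.

```python
ch = "\u00e9"                        # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)                 # the abstract number, independent of bytes

latin1_bytes = ch.encode("latin-1")  # one byte per code point, 0x00 to 0xFF only
utf8_bytes = ch.encode("utf-8")      # variable length, covers the whole codespace

print(hex(code_point))               # 0xe9
print(latin1_bytes.hex())            # e9
print(utf8_bytes.hex())              # c3a9

# Different byte sequences, same code point once decoded.
same_character = latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8")
print(same_character)                # True
```

The code point stays 0xE9 in both cases; only the byte-level representation changes.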

score 1 · Answer 3 · answered Sep 26 '11 at 15:47


The codespace is an abstract layer between the glyphs (user interface level) and the byte encoding. Every character defined in Unicode has its own code point, and there are 1,114,112 possible code points (not all of them are used).

For example, the euro sign is always code point U+20AC. Depending on the selected encoding, it may be encoded as +IKw- (UTF-7), E2 82 AC (UTF-8), 20 AC (UTF-16, big-endian) or 00 00 20 AC (UTF-32, big-endian), where all byte values are given in hexadecimal.
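The values above can be reproduced with a short Python 3 sketch. The helper hex_bytes and the variable names are made up for illustration; the codec names ("utf-7", "utf-8", "utf-16-be", "utf-32-be") are Python's standard ones. The big-endian variants are used on purpose, since plain "utf-16" and "utf-32" in Python prepend a byte order mark.

```python
import unicodedata

euro = "\u20ac"  # the euro sign, written by its code point


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# The code point and its official name (the glyph itself depends on the font).
label = f"U+{ord(euro):04X} {unicodedata.name(euro)}"
print(label)  # U+20AC EURO SIGN

# UTF-7 produces ASCII text rather than arbitrary bytes.
utf7_text = euro.encode("utf-7").decode("ascii")
print("utf-7:", utf7_text)  # utf-7: +IKw-

# The same code point in three byte encodings.
encoded = {
    name: hex_bytes(euro.encode(name))
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}
for name, value in encoded.items():
    print(f"{name}: {value}")
# utf-8: E2 82 AC
# utf-16-be: 20 AC
# utf-32-be: 00 00 20 AC
```

So there are three levels: the glyph you see on screen, the code point U+20AC, and whichever byte sequence the chosen encoding produces.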

answered Sep 26 '11 at 15:47 by Jaap

it would be nice (+1) if you could include an example to show how a character looks at the interface level along with its unicode equivalent along with how it would look in its byte-encoding – Imran Omar Bukhsh Sep 27 '11 at 08:19
