Can someone explain what this means?
Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal).
The value of 0x10FFFF is 1,114,111. Counting from 0x000000 to 0x10FFFF inclusive, you have 1,114,112 numerical values. In Unicode, every character is represented by a number in the range 0 through 1,114,111. Software then interprets this numerical value to render the character on the screen.
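To check the arithmetic yourself, here is a small Python sketch. The names `LAST_CODE_POINT`, `codespace_size` and `is_in_codespace` are made up for illustration; `sys.maxunicode` and `chr` are standard Python.

```python
import sys

FIRST_CODE_POINT = 0x0
LAST_CODE_POINT = 0x10FFFF

# The range is inclusive at both ends, hence the + 1.
codespace_size = LAST_CODE_POINT - FIRST_CODE_POINT + 1

print(LAST_CODE_POINT)                    # 1114111
print(codespace_size)                     # 1114112
print(sys.maxunicode == LAST_CODE_POINT)  # True on Python 3.3+


def is_in_codespace(value: int) -> bool:
    """Return True if chr() accepts the value as a code point."""
    try:
        chr(value)
    except (ValueError, OverflowError):
        return False
    return True


print(is_in_codespace(0x10FFFF))  # True
print(is_in_codespace(0x110000))  # False, one past the end of the codespace
```

Note that `chr` only checks that the number lies inside the codespace, not that a character has actually been assigned to it.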
A code point or code position is any of the numerical values that make up the codespace.
As per the Wikipedia entry on Code point:
The notion of a code point is used for abstraction, to distinguish both:
- the number from an encoding as a sequence of bits, and
- the abstract character from a particular graphical representation (glyph).
This is because one may wish to make these distinctions:
- encode a particular code space in different ways, or
- display a character via different glyphs.
So Unicode defines space for over a million distinct code points (characters, not glyphs, since one character can be displayed via different glyphs), which are numbered in hex from 0 to 10FFFF. You may be familiar with Extended ASCII, which comprises 256 code points in the range 0 to FF (hex).
The codespace is an abstract layer between the glyphs (user interface level) and the byte encoding. Every character defined in Unicode has its own code point, and there are 1,114,112 possible code points (not all are used).
For example, the euro sign is always code point U+20AC. Depending on the selected encoding, it may be encoded as +IKw- (UTF-7), E2 82 AC hex (UTF-8), 20 AC hex (UTF-16), or 00 00 20 AC hex (UTF-32). The UTF-16 and UTF-32 bytes are shown in big-endian order.
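If you want to see this for yourself, the following Python sketch encodes the euro sign with the standard library codecs. The variable names and the `hex_bytes` helper are made up for illustration, and the `-be` codec variants are chosen on the assumption that the byte sequences above are big-endian with no byte order mark.

```python
euro = "\u20ac"         # the euro sign, written by its code point
code_point = ord(euro)  # the abstract number, independent of any encoding


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Same code point, four different byte-level encodings.
encodings = ["utf-7", "utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: euro.encode(name) for name in encodings}

print(f"U+{code_point:04X}")  # U+20AC
for name, data in encoded.items():
    if name == "utf-7":
        # UTF-7 output is plain ASCII text, so show it as text.
        print(name, data.decode("ascii"))
    else:
        print(name, hex_bytes(data))

# Decoding with the matching codec always gives back the same character.
round_trips = all(data.decode(name) == euro for name, data in encoded.items())
print(round_trips)  # True
```

The code point never changes; only the bytes used to store or transmit it do.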