1

Can someone explain what this means?

Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal).

http://en.wikipedia.org/wiki/Unicode

Jon Purdy
  • 20,547

3 Answers

9

The value of 0x10FFFF is 1,114,111 in decimal. Counting 0 as well, the range 0x000000 through 0x10FFFF contains 1,114,112 numerical values. In Unicode, every character is assigned a number in the range 0 through 1,114,111 (its code point). Software then interprets this number to render the character on the screen.
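The arithmetic is easy to check yourself. Here is a minimal Python sketch; the names `FIRST`, `LAST`, `codespace_size`, and `in_codespace` are made up for illustration, and it relies only on the built-in `chr()` and `ord()`:

```python
# Bounds of the Unicode codespace, as quoted in the question.
FIRST = 0x0
LAST = 0x10FFFF

# Both endpoints count, hence the +1.
codespace_size = LAST - FIRST + 1

print(LAST)            # 1114111
print(codespace_size)  # 1114112

# chr() turns a code point into a one-character string; ord() goes back.
euro = chr(0x20AC)
print(f"U+{ord(euro):04X}")  # U+20AC


def in_codespace(n: int) -> bool:
    """Return True if n is a valid code point number (hypothetical helper)."""
    try:
        chr(n)
        return True
    except ValueError:
        # Python's chr() rejects anything outside 0..0x10FFFF.
        return False


print(in_codespace(LAST))      # True
print(in_codespace(LAST + 1))  # False
```

Note that `in_codespace` only checks that the number lies inside the codespace, not that a character has actually been assigned to it.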

Thomas Owens
  • 82,739
7

A code point or code position is any of the numerical values that make up the codespace.

As per the Wikipedia entry on Code point:

The notion of a code point is used for abstraction, to distinguish both:

  • the number from an encoding as a sequence of bits, and
  • the abstract character from a particular graphical representation (glyph).

This is because one may wish to make these distinctions:

  • encode a particular code space in different ways, or
  • display a character via different glyphs.

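So Unicode defines space for over a million distinct code points, numbered in hex from 0 to 10FFFF. Note that these are numbers for abstract characters, not glyphs: the same code point can be drawn with many different glyphs. You may be familiar with Extended ASCII, which comprises 256 code points in the range 0 to FF (hex).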

ghoppe
  • 386
  • 1
  • 4
1

The codespace is an abstract layer between the glyphs (user-interface level) and the byte encoding. Every character defined in Unicode has its own code point, and there are 1,114,112 possible code points (not all of them are in use).

For example, the euro sign is always code point U+20AC. Depending on the selected encoding, it may be encoded as +IKw- (UTF-7), E2 82 AC (UTF-8), 20 AC (UTF-16, big-endian), or 00 00 20 AC (UTF-32, big-endian), with the byte values given in hex.
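If you want to verify these byte sequences, a short Python sketch along the following lines should do it. It assumes Python 3.8 or later (for `bytes.hex()` with a separator) and uses the standard codec names `utf-7`, `utf-8`, `utf-16-be`, and `utf-32-be`; the big-endian variants are chosen so that no byte order mark is prepended. The variable names are just for illustration.

```python
# One code point, several byte encodings.
euro = "\u20ac"  # the euro sign, code point U+20AC

codecs_to_try = ["utf-7", "utf-8", "utf-16-be", "utf-32-be"]

# Map each codec name to the bytes it produces for the euro sign.
encoded = {name: euro.encode(name) for name in codecs_to_try}

for name, data in encoded.items():
    # Show the raw bytes as space-separated uppercase hex.
    print(f"{name:10} {data.hex(' ').upper()}")

# UTF-7 output is plain ASCII, so it can also be shown as text.
print(encoded["utf-7"].decode("ascii"))  # +IKw-

# Decoding any of them gives back the same single code point.
for name, data in encoded.items():
    assert data.decode(name) == euro
```

The code point stays U+20AC throughout; only the bytes used to store or transmit it change with the encoding.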

Jaap
  • 2,305
  • It would be nice (+1) if you could include an example showing how a character looks at the interface level, alongside its Unicode code point and its byte encoding. – Imran Omar Bukhsh Sep 27 '11 at 08:19