== Code points ==
The Unicode standard defines a range of integers from 0x0 to 0x10FFFF as the 'Unicode codespace', and defines a code point as a value within this codespace.
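As a minimal sketch of what "a value within this codespace" means in practice, the check is just an integer range test. The names `MAX_CODE_POINT` and `is_code_point` are made up for this illustration; `ord` and `chr` are Python's built-in conversions between one-character strings and code point integers.

```python
# Upper bound of the Unicode codespace, as stated above.
MAX_CODE_POINT = 0x10FFFF


def is_code_point(value: int) -> bool:
    """Return True if value lies inside the Unicode codespace."""
    return 0 <= value <= MAX_CODE_POINT


def format_code_point(value: int) -> str:
    """Render a code point in the conventional U+XXXX notation."""
    if not is_code_point(value):
        raise ValueError(f"{value:#x} is outside the Unicode codespace")
    return f"U+{value:04X}"


if __name__ == "__main__":
    print(is_code_point(0x41))               # inside the codespace
    print(is_code_point(0x110000))           # one past the upper bound
    print(format_code_point(ord("\u00e9")))  # code point of a one-character string
```

The codespace therefore holds 0x110000 (1,114,112) code points in total, whether or not anything is assigned to them.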

The primary purpose of code points is to address encoded characters, but they are used for more than that. There are seven categories of code points:

* Graphic: Assigned to visible characters
* Format: Assigned to invisible formatting characters
* Control: Assigned to characters used in Unicode and non-Unicode protocols and standards
* Private-use: Assigned, but interpreted by private agreement outside the Unicode standard
* Surrogate: Reserved for use by UTF-16, must not be encoded as characters on their own
* Noncharacter: Reserved for application internal use, not used for open interchange
* Reserved: Not yet assigned, may be used in future Unicode versions
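The seven categories above can be approximated from a code point's General_Category property. The following is a rough sketch, not a definitive implementation: `code_point_type` and `is_noncharacter` are hypothetical helper names, and the mapping assumes the usual correspondence (Cc is Control; Cf, Zl, and Zp are Format; Co is Private-use; Cs is Surrogate; Cn is either Noncharacter or Reserved). The result for Reserved code points depends on the Unicode version bundled with your Python's `unicodedata` module.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF, plus the last
    two code points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_type(cp: int) -> str:
    """Classify a code point into one of the seven categories."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    gc = unicodedata.category(chr(cp))  # e.g. "Lu", "Cf", "Cn"
    if gc == "Cc":
        return "Control"
    if gc in ("Cf", "Zl", "Zp"):
        return "Format"
    if gc == "Co":
        return "Private-use"
    if gc == "Cs":
        return "Surrogate"
    if gc == "Cn":
        # Unassigned: either permanently a noncharacter, or reserved
        # for a future version of the standard.
        return "Noncharacter" if is_noncharacter(cp) else "Reserved"
    return "Graphic"  # letters, marks, numbers, punctuation, symbols, spaces


if __name__ == "__main__":
    for cp in (0x41, 0x200D, 0x0A, 0xE000, 0xD800, 0xFFFE, 0x50000):
        print(f"U+{cp:04X}: {code_point_type(cp)}")
```

Note that a code point reported as Reserved today may become Graphic or Format once a newer Unicode version assigns it, which is exactly why the classification can't be treated as permanent.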

Each code point belongs to exactly one of these categories. I bring this system up because there are two major implications that stem from it:

The first is that it's not always possible to interpret a code point as an encoded character: it may be from a future version of Unicode, it may be private-use and not known to you, or it may not be a character at all and instead be used for application-specific processing.

The second is that exchanging code points must be done mindfully: surrogate code points cannot be exchanged using official Unicode encodings, noncharacters are not intended to be interchanged openly, and private-use characters require an external agreement outside the standard.
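To make the difference concrete, here is a small sketch using Python's built-in UTF-8 codec. The helper name `utf8_or_none` is invented for this example. The point is that the encoding itself refuses surrogates, while noncharacters and private-use code points encode without complaint, so keeping those out of open interchange is up to you.

```python
from typing import Optional


def utf8_or_none(cp: int) -> Optional[bytes]:
    """Encode a single code point as UTF-8, or return None if the
    codec refuses it (which is what happens for surrogates)."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Surrogate: rejected by the encoding itself.
    print(utf8_or_none(0xD800))
    # Noncharacter: encodes fine, but not meant for open interchange.
    print(utf8_or_none(0xFFFE))
    # Private-use: encodes fine, but its meaning needs an outside agreement.
    print(utf8_or_none(0xE000))
```

In other words, a successful encode only tells you the bytes are well-formed, not that the receiving side will know what to do with them.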

Unicode also defines a sequence of one or more code points as a 'coded character sequence', or just 'character sequence' for short. Despite the name, it may include any code point in the codespace, including noncharacters or reserved code points. It is strictly a sequence of code points.
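As a quick illustration, Python's `str` type behaves like a sequence of code points, so it can stand in for a character sequence in this sense. The variable names below are just for this sketch, and U+50000 is assumed to still be unassigned in your Unicode version.

```python
# A sequence mixing a letter, a combining mark, a noncharacter,
# and a code point assumed to be reserved (unassigned).
sequence = [0x0065, 0x0301, 0xFFFF, 0x50000]

# Build a string from the code points; no check is made that each
# one is an assigned character.
text = "".join(chr(cp) for cp in sequence)

# The string's length and contents are counted in code points.
round_trip = [ord(ch) for ch in text]

if __name__ == "__main__":
    print(len(text))
    print([f"U+{cp:04X}" for cp in round_trip])
```

Nothing here treats the first two code points as a single accented letter; that grouping only appears once a higher-level algorithm is applied to the sequence.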

The best way to think about code points, and sequences of them, is as opaque building blocks used in Unicode-aware algorithms. Just as encoded characters don't map to the human concept of a character, code points don't map to the machine concept of encoded characters or anything higher level.