Unicode guide
== Code points ==

The Unicode standard defines a range of integers from 0x0 to 0x10FFFF as the 'Unicode codespace', and defines a code point as a value within this codespace. The primary purpose of code points is to address encoded characters, but the codespace covers more than that. There are seven categories of code points:

* Graphic: Assigned to visible characters
* Format: Assigned to invisible formatting characters
* Control: Assigned to characters used in Unicode and non-Unicode protocols and standards
* Private-use: Assigned for interpretation used outside the Unicode standard
* Surrogate: Reserved for use in UTF-16, must not be encoded as characters themselves
* Noncharacter: Reserved for application internal use, not used for open interchange
* Reserved: Not assigned yet, used in future Unicode versions

Each code point belongs to exactly one of these categories. I bring this system up because there are two major implications that stem from it:

The first is that it's not always possible to interpret a code point as an encoded character: It may be from a future version of Unicode, it may be private use and not known to you, or it may not even be a character at all and instead be used for application specific processing.

The second is that exchanging code points must be done mindfully: Surrogate code points cannot be exchanged using official Unicode encodings, noncharacters are not intended to be interchanged openly, and private use characters require an external agreement outside the standard.

Unicode also defines a sequence of one or more code points as a 'coded character sequence', or just 'character sequence' for short. Despite this name it may include any valid code point, including noncharacters or reserved code points. It is strictly a sequence of code points.

The best way to think about code points, and sequences of them, is as opaque building blocks used in Unicode-aware algorithms.
Much like encoded characters don't map to the human concept of character, code points don't map to the machine concept of encoded characters or anything higher level.
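
As a rough illustration of the seven categories, here is a minimal Python sketch that sorts a code point into one of them. The function names <code>code_point_type</code> and <code>encodable_as_utf8</code> are made up for this example. It assumes the usual mapping from General Category values to these categories (Cs, Co, Cc, then Cf/Zl/Zp for format, and Cn split into noncharacters and reserved), and its answers for unassigned code points depend on the Unicode version bundled with your Python.

```python
import unicodedata

# The 32 noncharacters in the BMP block U+FDD0..U+FDEF.
NONCHAR_BLOCK = range(0xFDD0, 0xFDF0)


def code_point_type(cp: int) -> str:
    """Return which of the seven categories a code point falls into.

    Hypothetical helper for illustration only. Results for unassigned
    code points reflect the Unicode data shipped with this Python build.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    gc = unicodedata.category(chr(cp))
    if gc == "Cs":
        return "Surrogate"
    if gc == "Co":
        return "Private-use"
    if gc == "Cc":
        return "Control"
    if gc in ("Cf", "Zl", "Zp"):
        return "Format"
    if gc == "Cn":
        # Noncharacters are U+FDD0..U+FDEF plus the last two code points
        # of every plane (those ending in FFFE or FFFF).
        if cp in NONCHAR_BLOCK or (cp & 0xFFFE) == 0xFFFE:
            return "Noncharacter"
        return "Reserved"
    # Letters, marks, numbers, punctuation, symbols and space separators.
    return "Graphic"


def encodable_as_utf8(cp: int) -> bool:
    """Check whether Python's UTF-8 codec accepts this single code point."""
    try:
        chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for cp in (0x41, 0x200D, 0x0A, 0xE000, 0xD800, 0xFFFF, 0x0378):
        print(f"U+{cp:04X}", code_point_type(cp), encodable_as_utf8(cp))
```

Note how the two checks disagree on purpose: a surrogate is refused by the UTF-8 codec, while noncharacters and private-use code points encode without complaint. Whether they should be exchanged is a question the encoding does not answer for you.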