Unicode guide

The third step is composition. This step is optional and does the reverse of decomposition as a form of compression. It looks at the new sequence and recursively matches character sequences in it against decomposition mappings. This step excludes many opportunities to compose: various scripts have specific exclusions, and single encoded characters will not compose to other single encoded characters. As an example of composition, LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT is composed back to LATIN SMALL LETTER E WITH ACUTE. As an example of an exclusion, OHM SIGN will decompose to GREEK CAPITAL LETTER OMEGA but will not compose back to OHM SIGN.
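The two examples in this step can be checked with a short sketch. It assumes Python and its standard `unicodedata` module, neither of which this guide prescribes, and the variable names are purely illustrative.

```python
import unicodedata

# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# NFC runs decomposition and then the optional composition step
composed = unicodedata.normalize("NFC", decomposed)
composed_name = unicodedata.name(composed)

# OHM SIGN decomposes to a single character and is never composed back
ohm = "\u2126"
ohm_nfd = unicodedata.normalize("NFD", ohm)  # decomposition only
ohm_nfc = unicodedata.normalize("NFC", ohm)  # decomposition, then composition
ohm_nfc_name = unicodedata.name(ohm_nfc)

print(composed_name)
print(ohm_nfc_name)
```

Normalizing to NFC is not a round trip for OHM SIGN: the result is GREEK CAPITAL LETTER OMEGA whether or not the composition step runs.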


When describing these steps I glossed over what it means for encoded characters to be equivalent. Unicode defines two forms of equivalence: canonical and compatibility. Both require that the encoded characters represent the same abstract character. Compatibility equivalence goes a step further and also defines equivalence between encoded characters that have different appearances or behaviours. This usually includes formatting variants and other ways of writing a character, but does not include other variants of the character such as different cases.
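One way to test the two equivalences is to normalize both strings and compare the results. The sketch below assumes Python's standard `unicodedata` module. The helper names `canonically_equivalent` and `compatibly_equivalent` are hypothetical, made up for this illustration rather than taken from any library.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibly_equivalent(a: str, b: str) -> bool:
    """Compare two strings after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Precomposed and decomposed forms of the same letter are canonically equivalent
print(canonically_equivalent("\u00e9", "e\u0301"))

# SUPERSCRIPT TWO differs from DIGIT TWO only in formatting
print(canonically_equivalent("\u00b2", "2"))
print(compatibly_equivalent("\u00b2", "2"))

# Different cases are not equivalent under either form
print(compatibly_equivalent("A", "a"))
```

Comparing decomposed forms keeps the composition exclusions out of the picture, so the helpers only reflect the decomposition mappings.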


These encoded characters are all compatibly equivalent to the digit two: