The third step is composition: This step is optional and does the reverse of decomposition as a form of compression. It looks at the new sequence and recursively matches character sequences in it against decomposition mappings, replacing each match with the encoded character that decomposes to it. This step excludes many opportunities to compose: Various scripts have specific exclusions, and an encoded character that decomposes to a single other encoded character is never composed back. As an example of composition, LATIN SMALL LETTER E and COMBINING ACUTE ACCENT are composed back to LATIN SMALL LETTER E WITH ACUTE. As an example of an exclusion, OHM SIGN will decompose to GREEK CAPITAL LETTER OMEGA but will not compose back to OHM SIGN.
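Both examples can be observed with a short sketch. It assumes Python's standard `unicodedata` module, whose "NFC" form performs canonical decomposition followed by composition; the `names` helper is a hypothetical convenience for printing character names.

```python
import unicodedata

def names(s: str) -> list[str]:
    """Return the Unicode character name of each code point in s (illustrative helper)."""
    return [unicodedata.name(c) for c in s]

# e + COMBINING ACUTE ACCENT is composed back to a single encoded character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(names(decomposed))  # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
print(names(composed))    # ['LATIN SMALL LETTER E WITH ACUTE']

# OHM SIGN has a decomposition mapping to a single character, GREEK CAPITAL LETTER OMEGA...
print(unicodedata.decomposition("\u2126"))  # 03A9
# ...so after decomposition and composition it stays as OMEGA and never returns to OHM SIGN.
print(names(unicodedata.normalize("NFC", "\u2126")))  # ['GREEK CAPITAL LETTER OMEGA']
```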
When describing these steps I glossed over what it means for encoded characters to be equivalent. Unicode defines two forms of equivalence: canonical equivalence and compatibility equivalence. Both of these require that the encoded characters represent the same abstract character. Compatibility equivalence goes a step further and also treats as equivalent encoded characters that have different appearances or behaviours.
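One way to test the two equivalences is to compare decomposed forms. This is a minimal sketch assuming Python's standard `unicodedata` module: "NFD" applies only canonical decomposition, while "NFKD" also applies compatibility decomposition. The function names are hypothetical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical canonical decompositions.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    # Compatibility decomposition also removes differences in appearance
    # or behaviour, such as ligatures and roman numeral characters.
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

# Precomposed é versus e + COMBINING ACUTE ACCENT: canonically equivalent.
print(canonically_equivalent("\u00e9", "e\u0301"))  # True
# LATIN SMALL LIGATURE FI versus the letters f and i: only compatibility equivalent.
print(canonically_equivalent("\ufb01", "fi"))       # False
print(compatibility_equivalent("\ufb01", "fi"))     # True
# ROMAN NUMERAL NINE versus the letters I and X: only compatibility equivalent.
print(compatibility_equivalent("\u2168", "IX"))     # True
```

Every canonically equivalent pair is also compatibility equivalent, which matches compatibility equivalence going "a step further".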
These encoded characters are all compatibility equivalent to the digit two: