Unicode guide
Revision as of 19:35, 7 March 2022
This is a WIP page; take nothing here as final.
SUMMARY
Strings
- character sets
- strings
- utf-8
- ebcdic/ascii
- upper
- lower
- length
- locales
- OS APIs
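As a placeholder illustration for the upper / lower / length items above, here is a minimal sketch in Python (the language and the variable names are assumptions made for this example, not something the outline specifies). It shows that case mapping is not one-to-one, so it can change a string's length and does not always round-trip.

```python
# Case mapping is not one-to-one, so it can change a string's length.
sharp_s = "stra\u00dfe"              # "straße", 6 code points
upper = sharp_s.upper()              # "STRASSE", 7 code points
print(len(sharp_s), len(upper))      # 6 7

# Going upper() then lower() does not restore the original spelling.
print(upper.lower() == sharp_s)      # False

# casefold() is intended for caseless comparison rather than display.
print(sharp_s.casefold() == "STRASSE".casefold())  # True
```

Python's `str.upper()` and `str.lower()` ignore the process locale, so locale-specific rules (for example Turkish dotted and dotless i) are not covered by this sketch.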
Unicode
- what is unicode
- bytes
- code points
- characters
- graphemes
- locales
- splitting things by space?
- nightmare windows APIs
- normalization
- CLDR
- languages, rich data, paragraphs, etc
- length
- languages
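To make the bytes / code points / normalization items above concrete, here is a small sketch, again in Python as an assumed illustration language. The `describe` helper is hypothetical and exists only for this example. It compares two spellings of "é" that render identically but differ at the byte and code point level.

```python
import unicodedata

# "é" written two ways: precomposed U+00E9, and "e" plus combining acute U+0301.
composed = "\u00e9"
decomposed = "e\u0301"

def describe(s):
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(s.encode("utf-8")), len(s)

print(describe(composed))    # (2, 1)
print(describe(decomposed))  # (3, 2)

# The two strings compare unequal until they are normalized to the same form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Neither count is the number of user-perceived characters (one in both cases). Python's standard library has no grapheme cluster segmentation, so that level would need a third-party library.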
Idea dump
unicode handling across languages
perl unicode
- char/wchar
- bytes
- characters
- utf-8
- utf-8b
- wtf-8
- opaqueness
- locales
- non-unicode
- bytes as strings kinda works better
- round trips
- perl
- c
- scheme
- formatting bytes/etc
- native format as utf-8? what?
- bytes -> maybe unicode -> unicode -> graphemes/text/etc
user-perceived character / grapheme cluster
- scripts
- wchar
- https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default
- runes
- rust char
https://stackoverflow.com/questions/12450750/how-can-i-work-with-raw-bytes-in-perl
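One way to picture the "utf-8b" and "round trips" notes above is Python's `surrogateescape` error handler, which PEP 383 describes as the UTF-8b technique. The sketch below is an assumed illustration (the sample bytes and variable names are made up for it): undecodable bytes are smuggled through a string as lone surrogates and restored on encoding, whereas replacement-character decoding loses them.

```python
# Valid UTF-8 ("café ") followed by two bytes that are never valid in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe"

# Strict decoding rejects the whole input.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD, so the original bytes are lost.
lossy = raw.decode("utf-8", errors="replace")

# surrogateescape maps each bad byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_ok)                     # False
print(lossy.encode("utf-8") == raw)  # False
print(round_tripped == raw)          # True
```

The resulting `text` is not valid Unicode text (it contains lone surrogates), which is the trade-off behind the "bytes -> maybe unicode -> unicode" pipeline noted above.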