Unicode guide
Revision as of 14:44, 7 March 2022
This is a WIP page, take nothing here as final.
SUMMARY
Background
- what is unicode
- utf-8
- EBCDIC/ASCII
- strings
- upper
- lower
- OS APIs
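
A minimal sketch for the strings / upper / lower items above, written in Python purely for illustration (this page does not pick a language). It assumes the sample word "straße" and shows that code point count, UTF-8 byte count, and the length after upper-casing can all differ.

```python
# Sample text is an assumption chosen for illustration: "ß" is one code
# point, two UTF-8 bytes, and upper-cases to the two letters "SS".
text = "straße"

encoded = text.encode("utf-8")   # bytes as they would be stored or sent
code_points = len(text)          # Python str length counts code points
utf8_bytes = len(encoded)        # byte length of the UTF-8 encoding

upper = text.upper()             # full case mapping, may change the length
lower_again = upper.lower()      # not guaranteed to give back the original

print(code_points, utf8_bytes)   # 6 7
print(upper, len(upper))         # STRASSE 7
print(lower_again == text)       # False: "strasse" != "straße"
```

The last line is the point worth keeping for the guide: upper followed by lower is not a round trip, so case conversion cannot be treated as a per-character, length-preserving operation.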
unicode handling across languages
perl unicode
- bytes
- characters
- utf-8
- utf-8b
- wtf-8
- splitting things by space?
- opaqueness
- locales
- nightmare Windows APIs
- non-unicode
- bytes as strings kinda works better
- round trips
- perl
- c
- scheme
- languages
- formatting bytes/etc
- native format as utf-8? what?
- bytes -> maybe unicode -> unicode -> graphemes/text/etc
- code points
- user-perceived character / grapheme cluster
- languages, rich data, paragraphs, etc
- scripts
- wchar
- unicode characters
- https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default
- normalization
- length
- upper
- lower
- CLDR
- runes
- rust char
https://stackoverflow.com/questions/12450750/how-can-i-work-with-raw-bytes-in-perl
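
A rough sketch for the round trips, utf-8b, normalization and length items in the list above, in Python purely for illustration. It assumes Python's `surrogateescape` error handler, which appears to follow the same idea as utf-8b: undecodable bytes are smuggled through as lone surrogates so the original bytes can be restored. The sample byte string is made up.

```python
import unicodedata

# Assumed sample input: valid UTF-8 ("café ") followed by a stray 0xFF byte,
# the kind of thing a filename or OS API might hand back.
raw = b"caf\xc3\xa9 \xff"

# bytes -> "maybe unicode": the invalid byte becomes the lone surrogate U+DCFF
decoded = raw.decode("utf-8", errors="surrogateescape")

# ... and back again: the original bytes are restored exactly
roundtrip = decoded.encode("utf-8", errors="surrogateescape")
print(roundtrip == raw)                 # True

# Normalization changes length: one user-perceived character,
# one code point in NFC, two code points in NFD.
nfc = "\u00e9"                          # é as a single precomposed code point
nfd = unicodedata.normalize("NFD", nfc) # "e" + U+0301 combining acute accent
print(len(nfc), len(nfd))               # 1 2
print(nfc == nfd)                       # False until both are normalized
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

Counting grapheme clusters (the user-perceived character item) is not covered here; the Python standard library has no segmentation function, so that part would need a third-party library.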