Unicode guide

Revision as of 14:44, 7 March 2022

This is a WIP page; take nothing here as final.

SUMMARY

- what is unicode

- utf-8

- ebcdic/ascii

- strings

- upper

- lower

- OS APIs

unicode handling across languages

perl unicode

- bytes

- characters

- utf-8

- utf-8b

- wtf-8

- splitting things by space?

- opaqueness

- locales

- nightmare windows APIs

- non-unicode

- bytes as strings kinda works better

- round trips

- perl

- c

- scheme

- languages

- formatting bytes/etc

- native format as utf-8? what?

- bytes -> maybe unicode -> unicode -> graphemes/text/etc

- code points
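The "utf-8b", "round trips" and "bytes -> maybe unicode" items above can be illustrated with a small sketch. It uses Python only because its `surrogateescape` error handler is one existing implementation of the UTF-8b idea; the file name bytes are made up for the example, and none of this is meant as the guide's final recommendation.

```python
# Hypothetical file name bytes: valid UTF-8 ("café-") followed by 0xFF,
# a byte that can never appear in well-formed UTF-8.
raw = b"caf\xc3\xa9-\xff.txt"

# Strict decoding refuses the input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, so the result is a str that is "maybe unicode".
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# which is the round-trip property the outline asks about.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_ok, round_tripped == raw)
```

The decoded string is not valid Unicode text (it contains a lone surrogate), so it round-trips through this handler but cannot be written out with a strict UTF-8 encoder.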

user-perceived character / grapheme cluster

- languages, rich data, paragraphs, etc

- scripts

- wchar

- unicode characters

- https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

- normalization

- length

- upper

- lower

- CLDR

- runes

- rust char

https://stackoverflow.com/questions/12450750/how-can-i-work-with-raw-bytes-in-perl
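As a rough illustration of why "length", "normalization", "upper" and "lower" each get their own entry in the outline above, here is a minimal Python sketch. The sample strings are chosen for the example and are not from the guide.

```python
import unicodedata

# "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# "Length" depends on what is being counted.
code_points_decomposed = len(decomposed)            # code points before NFC
code_points_composed = len(composed)                # code points after NFC
bytes_decomposed = len(decomposed.encode("utf-8"))  # UTF-8 bytes before NFC
bytes_composed = len(composed.encode("utf-8"))      # UTF-8 bytes after NFC

# The two spellings compare unequal until they are normalized the same way.
equal_raw = decomposed == composed
equal_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Case mapping is not one-to-one: upper-casing can change the length.
upper = "Straße".upper()

print(code_points_decomposed, code_points_composed,
      bytes_decomposed, bytes_composed, equal_raw, equal_nfd, upper)
```

Neither count here is the number of user-perceived characters; counting grapheme clusters needs the segmentation rules from Unicode Standard Annex #29, which this sketch does not attempt.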