Unicode guide/Implementations
TODO: organize
- bytestrings (applies to most?)
- bare encoding/runes (java, windows, wchar, rust, go, javascript, ruby, kotlin, zig, elixir)
- codepoint based (python, haskell, perl, tcl)
- grapheme-based (swift, raku); do these let you convert a string to codepoints?
- normalized (raku)
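
To make the categories above concrete, here is a minimal Python sketch of how the "length" of one string changes with the unit an implementation exposes. The helper name `unit_counts` is hypothetical and only for illustration. Python's standard library has no grapheme segmentation, so the grapheme-based view is left out.

```python
import unicodedata


def unit_counts(text):
    """Count the same string in the units different string models expose."""
    return {
        # bytestring view: length in UTF-8 bytes
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code unit view: each code unit is 2 bytes
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # codepoint view: Python's own str length
        "code_points": len(text),
        # normalized view: codepoints after NFC normalization
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


# 'e' followed by U+0301 COMBINING ACUTE ACCENT
print(unit_counts("e\u0301"))
# U+1F600, a codepoint outside the Basic Multilingual Plane
print(unit_counts("\U0001F600"))
```

The first string is 3 bytes, 2 UTF-16 code units, 2 codepoints, and 1 codepoint after NFC. The second is 4 bytes, 2 UTF-16 code units, and 1 codepoint.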
research and categorize the following:
- filesystem/OS APIs being broken?
- surrogates in valid strings (python utf8b)
- bytes in strings? (utf8-c8, utf8b)
- string apis? (encoding/decoding, code points, normalization, graphemes, segmentation, ordering, comparing, breaking, case folding, finding, regex)
- bytestrings
- wchar
- windows
- rust
- java
- swift
- go
- kotlin
- python, utf8b
- tcl
- linux/unix
- javascript
- perl
- ruby
- zig
- raku
- haskell
- elixir
- ICU
- C and C++
- Python 2
- Lua
- PHP (ignoring mbstring)
- POSIX APIs
- Windows narrow APIs
- DOS APIs
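
As a starting point for the "surrogates in valid strings" and "bytes in strings" items above, here is a small Python 3 sketch of the `surrogateescape` error handler (PEP 383), which is assumed here to be what "python utf8b" refers to. The helper names `decode_filename` and `encode_filename` and the sample bytes are hypothetical.

```python
def decode_filename(raw):
    """Decode bytes as UTF-8, smuggling undecodable bytes in as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_filename(name):
    """Reverse decode_filename, turning escaped surrogates back into raw bytes."""
    return name.encode("utf-8", errors="surrogateescape")


def strict_encode_ok(name):
    """Report whether the string can be encoded as strict UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


raw = b"caf\xe9.txt"          # Latin-1 bytes, not valid UTF-8
name = decode_filename(raw)   # byte 0xE9 becomes the lone surrogate U+DCE9
print(ascii(name))
print(encode_filename(name) == raw)   # the original bytes round-trip
print(strict_encode_ok(name))         # a strict encoder rejects the string
```

The resulting `str` holds a lone surrogate, so it is a valid Python string but not valid Unicode text. That is the trade-off to categorize for each implementation: lossless round-tripping of arbitrary bytes versus strings that a strict encoder can always handle.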