Unicode guide/Implementations
This page is my attempt to document my research into how various programming languages and software implement Unicode strings.
Classifications
Here's a quick list of things I'll be classifying:
- Bytestring support
- Internal encoding
- String encoding
- Character type
- OS API encoding/type
- Supports bytes in strings
- Can encode to and decode from other encodings
- How text is broken into code points, graphemes, words, paragraphs, etc.
- How ordering works
- How upper/lower/folding case works
- How finding works
- How regex works
- How locale tailoring is done
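Several of these axes are easy to blur together in the abstract. The following is a minimal sketch that uses Python 3 purely as an example; the other languages on this page behave differently. It shows how the same text has different lengths depending on whether you count code points, UTF-8 bytes, or UTF-16 code units. It also shows how normalization and case mapping can change a string's length, and how one language handles bytes that don't decode. The `surrogateescape` error handler is Python-specific and is not meant to represent how other implementations treat raw bytes.

```python
import unicodedata

# 'e' + U+0301 COMBINING ACUTE ACCENT + U+1F600 GRINNING FACE.
# A reader sees two characters ("é" and an emoji), but there are three code points.
s = "e\u0301\U0001F600"

# Python 3 str is indexed by code point, so len() counts code points.
code_points = len(s)                               # 3

# The same text measured in code units of two different encodings.
utf8_bytes = len(s.encode("utf-8"))                # 1 + 2 + 4 = 7
utf16_units = len(s.encode("utf-16-le")) // 2      # 1 + 1 + 2 = 4 (emoji is a surrogate pair)

# NFC normalization composes 'e' + U+0301 into U+00E9, changing the code point count.
nfc = unicodedata.normalize("NFC", s)
nfc_code_points = len(nfc)                         # 2

# Case mapping is not always one-to-one: German sharp s upper-cases to two letters,
# and case folding (used for caseless matching) also maps it to "ss".
upper_sharp_s = "ß".upper()                        # "SS"
folded = "Straße".casefold()                       # "strasse"

# Undecodable bytes: strict UTF-8 decoding fails, but the "surrogateescape"
# error handler maps each bad byte to a lone surrogate so it round-trips.
raw = b"caf\xff"
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
escaped = raw.decode("utf-8", "surrogateescape")   # "caf\udcff"
round_trip = escaped.encode("utf-8", "surrogateescape")

print(code_points, utf8_bytes, utf16_units, nfc_code_points)
print(upper_sharp_s, folded, strict_ok, round_trip == raw)
```

Running it prints `3 7 4 2` and then `SS strasse False True`. Counting grapheme clusters (what a reader would call two characters here) is a separate operation from any of these counts. It is one of the breaking questions listed above.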
C
- C++ too?
D
POSIX
DOS
Windows
- narrow APIs
- wide APIs
Rust
Java
Swift
Go
Kotlin
Python
Python 2
Python 3
Tcl
Lua
Squirrel
Perl
Ruby
Zig
Elixir
- Erlang too?
Raku
Haskell
PHP
- narrow APIs
- mbstring