Unicode guide/Implementations
Revision as of 18:14, 19 March 2022
This page is my attempt to document my research on the Unicode string implementations found in various languages and software.
Classifications
- bytestrings (applies to most?)
- bare encoding/runes (java, windows, wchar, rust, go, javascript, ruby, kotlin, zig, elixir)
- codepoint based (python, haskell, perl, tcl)
- grapheme-based (swift, raku); do these let you convert a string to codepoints?
- normalized (raku)
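The classifications above mostly differ in what a string's "length" and indexing units are. The following is a minimal Python sketch of that difference, not a description of how any listed language works internally. The `measure` helper and the sample text are hypothetical names chosen for illustration. Grapheme counting is left out because Python's standard library has no grapheme segmentation.

```python
import unicodedata


def measure(text: str) -> dict:
    """Count the same text in the units used by several string models."""
    return {
        # bytestring model: length in UTF-8 bytes
        "utf8_bytes": len(text.encode("utf-8")),
        # bare-encoding model (UTF-16 flavour): length in 16-bit code units
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # codepoint model: Python's own str length
        "code_points": len(text),
        # normalized model: code points after NFC normalization
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


# "e" followed by COMBINING ACUTE ACCENT, then an emoji outside the BMP.
sample = "e\u0301\U0001F600"
counts = measure(sample)
print(counts)
```

For this sample the four counts all differ (7, 4, 3 and 2), which is the kind of disagreement the classification is meant to capture.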
research and categorize the following:
- filesystem/OS APIs being broken?
- surrogates in valid strings (python utf8b)
- bytes in strings? (utf8-c8, utf8b)
- string apis? (encoding/decoding, code points, normalization, graphemes, segmentation, ordering, comparing, breaking, case folding, finding, regex)
- bytestrings
- wchar
- windows
- rust
- java
- swift
- go
- kotlin
- python, utf8b
- tcl
- linux/unix
- javascript
- perl
- ruby
- zig
- raku
- haskell
- elixir
- ICU
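As a starting point for the "surrogates in valid strings" and "bytes in strings" items, here is a small sketch of Python's `surrogateescape` error handler (PEP 383), which I assume is what the "python utf8b" note refers to. The helper names and the sample filename bytes are hypothetical. It shows undecodable bytes being smuggled into a `str` as lone surrogates and recovered unchanged.

```python
def decode_lossless(raw: bytes) -> str:
    """Decode as UTF-8, mapping each invalid byte 0xNN to U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Reverse decode_lossless, turning U+DCNN back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


def strict_encode_fails(text: str) -> bool:
    """Report whether a plain UTF-8 encode rejects the string."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


# Latin-1 encoded name; the 0xE9 byte is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"
name = decode_lossless(raw_name)

escaped = name[3]                       # the lone surrogate standing in for 0xE9
round_trip = encode_lossless(name)      # original bytes recovered
rejected = strict_encode_fails(name)    # str is not encodable as strict UTF-8
```

The resulting `str` round-trips to the original bytes, but it contains a lone surrogate and so cannot be encoded as strict UTF-8, which is worth recording when categorizing each implementation.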
C and C++
Python 2
Lua
PHP (ignoring mbstring)
POSIX APIs
Windows narrow APIs
DOS APIs
Squirrel