Unicode guide/Implementations: Difference between revisions

From JookWiki
(Add category)
(Add quick introduction)
[[Category:Research]]

Revision as of 06:58, 19 March 2022

This page is my attempt to document my research on Unicode string implementations supported in various languages and software.

Classifications

- bytestrings (applies to most?)

- bare encoding/runes (java, windows, wchar, rust, go, javascript, ruby, kotlin, zig, elixir)

- codepoint based (python, haskell, perl, tcl)

- grapheme-based (swift, raku); do these let you convert a string to codepoints?

- normalized (raku)
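
To make the classifications above concrete, here is a minimal Python sketch that measures one string under several of the models. The helper name `measure` and the sample string are made up for illustration. Grapheme counting is left out because, as far as I know, Python's standard library has no segmentation API.

```python
import unicodedata


def measure(s: str) -> dict:
    """Return the size of `s` under several of the string models (hypothetical helper)."""
    return {
        # bytestring view: number of UTF-8 bytes
        "utf8_bytes": len(s.encode("utf-8")),
        # bare encoding view: number of UTF-16 code units (2 bytes each)
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # codepoint view: Python's own str length
        "code_points": len(s),
        # normalized view: code points after NFC composition
        "nfc_code_points": len(unicodedata.normalize("NFC", s)),
    }


# 'e' + COMBINING ACUTE ACCENT + U+1F600 (an emoji outside the BMP)
sample = "e\u0301\U0001F600"
print(measure(sample))
```

The same text has four different "lengths" here, which is the main reason the classification matters when comparing languages.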

research and categorize the following:

- filesystem/OS APIs being broken?

- surrogates in valid strings (python utf8b)

- bytes in strings? (utf8-c8, utf8b)

- string apis? (encoding/decoding, code points, normalization, graphemes, segmentation, ordering, comparing, breaking, case folding, finding, regex)

- bytestrings

- wchar

- windows

- rust

- java

- swift

- go

- kotlin

- python, utf8b

- tcl

- linux/unix

- javascript

- perl

- ruby

- zig

- raku

- haskell

- elixir

- ICU
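
For the "surrogates in valid strings (python utf8b)" and "bytes in strings?" items above, a small sketch of how Python smuggles undecodable bytes into a `str` using the `surrogateescape` error handler (the approach PEP 383 describes as based on UTF-8b). The function name `roundtrip` and the sample filename are assumptions for illustration only.

```python
def roundtrip(raw: bytes) -> tuple:
    """Decode bytes as UTF-8, escaping invalid bytes as lone surrogates,
    then encode back. Hypothetical helper for illustration."""
    text = raw.decode("utf-8", errors="surrogateescape")
    back = text.encode("utf-8", errors="surrogateescape")
    return text, back


# b"\xe9" is Latin-1 for 'é' and is not valid UTF-8 in this position
raw = b"caf\xe9.txt"
text, back = roundtrip(raw)
print(ascii(text))  # the invalid byte 0xE9 shows up as the lone surrogate U+DCE9
print(back == raw)  # the original bytes survive the round trip

# A strict encode refuses the lone surrogate, so the string is not valid Unicode text
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```

The resulting `str` contains a code point that no well-formed UTF-8, UTF-16 or UTF-32 stream can carry, which is why this case deserves its own category.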

C and C++

Python 2

Lua

PHP (ignoring mbstring)

POSIX APIs

Windows narrow APIs

DOS APIs
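
As a reference point for the "string apis?" item in the research list, here is a rough sketch of three of the listed operations (normalization, case folding, comparing) using only Python's standard library. The function names are invented for this example, and the caseless comparison follows the simplified recipe from Python's Unicode HOWTO rather than a full collation algorithm.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring case and canonical differences."""
    def key(s: str) -> str:
        # NFD, then case fold, then NFD again because folding can denormalize
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


composed = "\u00e9"      # 'é' as one code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

print(composed == decomposed)                 # plain code point comparison
print(canonical_equal(composed, decomposed))  # comparison after normalization
print(caseless_equal("Straße", "STRASSE"))    # case folding expands 'ß' to 'ss'
```

Ordering, segmentation and line breaking are not covered here since they need locale data or tables that the standard library does not ship.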