Unicode guide/Implementations: Difference between revisions

Revision as of 21:52, 19 March 2022

This page documents my research into how various languages and software implement Unicode strings.

Here's a quick list of things I'll be classifying:


@@ Line 18: / Line 18: @@
 * How locale tailoring is done
-- bare encoding/runes (java, windows, wchar, rust, go, javascript, ruby, kotlin, zig, elixir)
+== C ==
+- C++ too?
-- codepoint based (python, haskell, perl, tcl)
+== D ==
-- grapheme-based (swift, raku) which lets you convert a string to codepoints?
+== POSIX ==
-- normalized (raku)
+== DOS ==
-- bytestrings
+== Windows ==
+- narrow APIs
-- wchar
+- wide APIs
-- windows
+== Rust ==
-- rust
+== Java ==
-- java
+== Swift ==
-- swift
+== Go ==
-- go
+== Kotlin ==
-- kotlin
+== Python ==
+python 2
-- java
+python 3
-- python, utf8b
+== Tcl ==
-- tcl
+== Lua ==
-- linux/unix
+== Squirrel ==
-- javascript
+== Perl ==
-- perl
+== Ruby ==
-- ruby
+== Zig ==
-- zig
+== Elixir ==
+- erlang too?
-- raku
+== Raku ==
-- haskell
+== Haskell ==
-- elixir
+== PHP ==
+- narrow APIs
-- ICU
+- mbstring
-C and C++
+== JavaScript ==
-Python 2
-Lua
-PHP (ignoring mbstring)
-POSIX APIs
-Windows narrow APIs
-DOS APIs
-squirrel
 [[Category:Research]]