Unicode guide/Implementations

From JookWiki

Revision as of 21:52, 19 March 2022

This page is my attempt to document my research on Unicode string implementations supported in various languages and software.

Classifications

Here's a quick list of things I'll be classifying:

  • Bytestring support
  • Internal encoding
  • String encoding
  • Character type
  • OS API encoding/type
  • Supports bytes in strings
  • Can encode/decode to other encodings
  • How breaking by code points, graphemes, words, paragraphs, etc. is done
  • How ordering works
  • How upper/lower/folding case works
  • How finding works
  • How regex works
  • How locale tailoring is done
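To make the first few classifications concrete (internal encoding, character type, breaking by code points), here is a minimal Python 3 sketch. Python is used only for illustration, and `describe` is a hypothetical helper, not part of any library. It measures one string in several units: code points, UTF-8 bytes, UTF-16 code units, and code points after NFC normalization. The sample is "e" followed by U+0301 COMBINING ACUTE ACCENT and then U+1F600, which a reader sees as two characters.

```python
import unicodedata


def describe(s: str) -> dict:
    """Count the same string in several units (illustrative helper).

    Python 3 str is a sequence of code points, so len(s) counts code
    points, not bytes, UTF-16 code units, or graphemes.
    """
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends,
        # so the byte count divided by 2 is the number of code units.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # NFC composes "e" + U+0301 into the single code point U+00E9.
        "nfc_code_points": len(unicodedata.normalize("NFC", s)),
    }


# "e" + COMBINING ACUTE ACCENT, then U+1F600 (outside the BMP)
sample = "e\u0301\U0001F600"
print(describe(sample))
# {'code_points': 3, 'utf8_bytes': 7, 'utf16_units': 4, 'nfc_code_points': 2}
```

Which of these numbers a language reports as a string's "length" follows from its internal encoding and character type: Python 3 counts code points, while UTF-16 based string types such as Java's count code units. Counting the two user-perceived characters (grapheme clusters) needs the Unicode text segmentation rules, which the Python standard library does not implement.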

C

- C++ too?

D

POSIX

DOS

Windows

- narrow APIs

- wide APIs

Rust

Java

Swift

Go

Kotlin

Python

Python 2

Python 3
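One way a code-point-based string type can still carry arbitrary bytes (the "Supports bytes in strings" classification) is Python 3's `surrogateescape` error handler, introduced by PEP 383. A minimal sketch follows; `smuggle_bytes` and `recover_bytes` are hypothetical helper names used only here. Each byte that is not part of valid UTF-8 decodes to a lone surrogate in the range U+DC80 to U+DCFF, and encoding with the same handler turns it back into the original byte, so the round trip is lossless.

```python
def smuggle_bytes(raw: bytes) -> str:
    # Invalid UTF-8 bytes (0x80-0xFF) become lone surrogates U+DC80..U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")


def recover_bytes(text: str) -> bytes:
    # The same handler maps those surrogates back to the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9"  # "cafe" with e-acute in Latin-1; 0xE9 alone is not valid UTF-8
text = smuggle_bytes(raw)
print(repr(text))  # 'caf\udce9'
assert recover_bytes(text) == raw
```

On POSIX systems, `os.fsdecode` and `os.fsencode` use this handler for file names, which is how Python 3 represents names that are not valid in the file system encoding. Such strings cannot be encoded to strict UTF-8, since lone surrogates are not valid there.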

Tcl

Lua

Squirrel

Perl

Ruby

Zig

Elixir

- Erlang too?

Raku

Haskell

PHP

- narrow APIs

- mbstring

JavaScript