Latest revision |
Your text |
Line 180: |
Line 180: |
| == Go == | | == Go == |
| Go has a kind of unopinionated take here. | | Go has a kind of unopinionated take here. |
| *Character type: 32-bit integer, Unicode code point | | *Character type: 32-bit integer |
| * Byte strings: Yes | | * Byte strings: Yes |
| * Internal encoding: None | | * Internal encoding: None |
Line 207: |
Line 207: |
| Ruby hasn't really had a major Unicode refactoring the way contemporary languages have. | | Ruby hasn't really had a major Unicode refactoring the way contemporary languages have. |
|
| |
|
| *Character type: Arbitrarily large integers, unspecified character set | | *Character type: Arbitrarily large integers |
| * Byte strings: Yes | | * Byte strings: Yes |
| * Internal encoding: None | | * Internal encoding: None |
Line 223: |
Line 223: |
| * Classifies by: Unicode properties | | * Classifies by: Unicode properties |
| * Collates by: Doesn't provide an API for this | | * Collates by: Doesn't provide an API for this |
| * Converts case by: Unicode properties with Turkic language support | | * Converts case by: Unicode properties if requested |
| * Locale tailoring is done by: Doesn't provide an API for this | | * Locale tailoring is done by: Doesn't provide an API for this |
| * Wraps operating system APIs with Unicode ones: No | | * Wraps operating system APIs with Unicode ones: No |
Line 230: |
Line 230: |
|
| |
|
| == Erlang == | | == Erlang == |
| Erlang is kind of an alien language compared to all the above.
| | ?? |
| | |
| *Character type: Integer, Unicode code point
| |
| * Byte strings: Yes
| |
| * Internal encoding: I don't know
| |
| * String encoding: A mix of integers or UTF-8 strings
| |
| * Supports bytes in strings: Yes
| |
| * Supports surrogates in strings: Yes
| |
| * Supports invalid code units in strings: Yes
| |
| * Supports normalizing strings: Yes
| |
| * Supports querying character properties: No
| |
| * Supports breaking by code point: Yes
| |
| * Supports breaking by extended grapheme cluster: Yes
| |
| * Supports breaking by text boundaries: No
| |
| * Supports encoding and decoding to other encodings: Yes
| |
| * Supports Unicode regex extensions: Yes
| |
| * Classifies by: Doesn't provide an API for this
| |
| * Collates by: Doesn't provide an API for this
| |
| * Converts case by: I don't know
| |
| * Locale tailoring is done by: Doesn't provide an API for this
| |
| * Wraps operating system APIs with Unicode ones: No
| |
| | |
| Unicode support seems fairly limited and confusing here.
| |
|
| |
|
| == Raku == | | == Raku == |
| Raku seems to have had the most thought put into its Unicode support.
| | ?? |
| | |
| *Character type: 32-bit integer
| |
| * Byte strings: Yes
| |
| * Internal encoding: A mix of signed integers of various sizes
| |
| * String encoding: Normalized grapheme clusters
| |
| * Supports bytes in strings: Yes
| |
| * Supports surrogates in strings: I don't know
| |
| * Supports invalid code units in strings: I don't know
| |
| * Supports normalizing strings: Yes
| |
| * Supports querying character properties: Yes
| |
| * Supports breaking by code point: Yes
| |
| * Supports breaking by extended grapheme cluster: Yes
| |
| * Supports breaking by text boundaries: No
| |
| * Supports encoding and decoding to other encodings: Yes
| |
| * Supports Unicode regex extensions: Yes
| |
| * Classifies by: Unicode properties
| |
| * Collates by: Unicode properties
| |
| * Converts case by: I don't know
| |
| * Locale tailoring is done by: Doesn't provide an API for this
| |
| * Wraps operating system APIs with Unicode ones: Yes, with UTF-C8 to escape bytes
| |
| | |
| Interestingly, it doesn't provide wrappers like isalpha, isupper, etc.
| |
| [[Category:Research]] | | [[Category:Research]] |
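
As a rough illustration of what "Normalized grapheme clusters" in the Raku entry is getting at, here is a minimal sketch in Python (not Raku) using the standard unicodedata module. It only shows NFC composition of a base letter plus a combining mark. Treating this as representative of how Raku handles strings is an assumption, and the helper name code_points is made up for this example.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT:
# two code points that display as one character.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def code_points(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(decomposed))  # ['U+0065', 'U+0301']
print(code_points(composed))    # ['U+00E9']
print(decomposed == composed)   # False: Python compares by code point
```

A language that stores strings as normalized grapheme clusters would presumably treat both spellings as the same one-character string, whereas a language with no internal encoding leaves that normalization step to the caller.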