[Grace-core] Strings

Michael Homer mwh at ecs.vuw.ac.nz
Mon Jan 26 16:09:59 PST 2015


Strings should represent text for a human being.

Strings could behave in several ways. They might act as:
- Sequences of bytes, as in C, which are not text.
- Sequences of code units, as in Java, which is indefensible.
- Sequences of codepoints, as in Go, which are meaningless to users.

Instead, they should follow Apple's Swift: sequences of perceived
characters (i.e., grapheme clusters), behaving as a human being would
expect. Equality should be defined over canonicalised forms (NFC or
NFD), since that best matches what a reader would consider equal.
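
As a rough illustration of equality over canonicalised forms, here is a
minimal sketch in Python (not Grace). The helper name text_equal is
hypothetical; unicodedata.normalize is from the Python standard library.
It only shows the comparison rule, not grapheme cluster segmentation.

```python
import unicodedata


def text_equal(a: str, b: str) -> bool:
    """Compare two strings by their NFC-normalised forms (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Code point comparison treats the two spellings as different strings.
raw_equal = precomposed == decomposed

# Comparison over canonical forms treats them as the same text.
canonical_equal = text_equal(precomposed, decomposed)
```

A reader sees the same word in both cases, which is the behaviour the
canonical comparison matches. Normalising both sides to NFD instead
would give the same answer.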

Obtaining code points, code units, and bytes should be the role of
encodings, not of text, because those are things made for computers
and not for people.
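
To make the separation concrete, here is a small Python sketch in which
bytes, code units, and code points are each obtained by asking for a
particular encoding or view. The function names are hypothetical and
chosen for illustration only. Note that Python's own str is a sequence
of code points, so this shows the different views rather than the
proposed string type itself.

```python
from typing import List


def utf8_bytes(text: str) -> List[int]:
    """Bytes of the UTF-8 encoding of the text."""
    return list(text.encode("utf-8"))


def utf16_code_units(text: str) -> List[int]:
    """16-bit code units of the UTF-16 encoding of the text."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def code_points(text: str) -> List[int]:
    """Unicode code points of the text."""
    return [ord(ch) for ch in text]


# The New Zealand flag: one perceived character (one grapheme cluster),
# written as two regional indicator code points.
flag = "\U0001F1F3\U0001F1FF"

byte_count = len(utf8_bytes(flag))          # 8
unit_count = len(utf16_code_units(flag))    # 4
point_count = len(code_points(flag))        # 2
```

None of the three counts is 1, which is the length a person would give
for this text.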

* Swift strings:
<https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html#//apple_ref/doc/uid/TP40014097-CH7-ID296>

* Grapheme clusters:
<http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries>

* How various languages implement Unicode behaviours:
<https://www.azabani.com/pages/gbu/#slide7>. "Named characters" means
escapes such as "\N{SNOWMAN}" and "\N{GREEK SMALL LETTER PI}".
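
For reference, Python is one language that supports this escape form
directly. A short sketch, using only the standard unicodedata module:

```python
import unicodedata

# Named-character escapes resolve to the code point with that Unicode name.
snowman = "\N{SNOWMAN}"
pi = "\N{GREEK SMALL LETTER PI}"

# The same mapping is available at run time in both directions.
snowman_name = unicodedata.name(snowman)
pi_by_lookup = unicodedata.lookup("GREEK SMALL LETTER PI")
```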
-Michael

