[Grace-core] Unicode
Andrew P. Black
black at cs.pdx.edu
Tue May 17 12:55:03 PDT 2011
Michael,
Grace programs are sequences of Unicode code-points. For interchange between computers, any Grace implementation should be able to convert such a program into a UTF-8 stream, and convert a UTF-8 stream into a Grace program. If the implementation also chooses to support additional formats, that is, of course, just fine.
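A minimal sketch of that interchange requirement, in Python, assuming a program is held in memory as a list of integer code points. The function names are hypothetical and not part of any Grace implementation.

```python
def program_to_utf8(code_points):
    """Encode a sequence of Unicode code points (ints) as a UTF-8 byte stream."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def utf8_to_program(data):
    """Decode a UTF-8 byte stream into a list of Unicode code points."""
    return [ord(ch) for ch in data.decode("utf-8")]


# Round trip: "é" (U+00E9) occupies two bytes in UTF-8.
program = [0x78, 0xE9]
stream = program_to_utf8(program)
restored = utf8_to_program(stream)
```

Malformed input makes `decode` raise `UnicodeDecodeError`; how a real implementation should report that is left open here.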
But that's not quite the end of the story.
On 13 May 2011, at 16:18, Michael Homer wrote:
>> Maybe the real question hiding here is: when comparing method names for equality in the
>> dispatch mechanism, should the names be normalized first? The answer to this is, perhaps,
>> yes, whereas for string literals it seems clearly no: the programmer gets what the programmer
>> puts.
>
> That was my instinct, but the inconsistency there is what prompted the
> question. Implicit normalisation for strings seems clearly the wrong
> thing
Yes, I agree that it's clearly the wrong thing. If a programmer takes care to put something into a string, then that's what should be in the string. One thing that's missing from my grammar is Unicode escapes.
> allowing
> distinct-but-identical method names seems like asking for obfuscation
> and error.
I agree again. Another question, which you don't ask, but I've had to deal with in my parser, is: what is the "name" of a method like
if condition then trueBlock else falseBlock
I made the decision that it should be
if()then()else()
but this isn't written down anywhere (other than here!). I did think of using _ to mark the position of parameters, but we might conceivably use _ as a valid identifier character. (Because we will eventually need to call foreign functions, we would be wise to do what F# does and allow quoted forms for method names.)
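The naming rule described above can be sketched in Python as follows. This assumes the parser has already split a request into its keyword parts; `canonical_name` is a hypothetical helper, not taken from the actual parser.

```python
def canonical_name(parts):
    """Build a method name from its keyword parts, replacing each
    argument list with an empty pair of parentheses."""
    return "".join(part + "()" for part in parts)


# "if condition then trueBlock else falseBlock" has three keyword parts.
name = canonical_name(["if", "then", "else"])  # "if()then()else()"
```

Using () rather than _ as the placeholder keeps _ free for use inside identifiers.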
So, somewhere in the spec there needs to be an explanation of when one method name is equal to another, because the semantics of method dispatch depends on that. Normalization of the character strings, quoting, and marking the position of arguments are all part of that. I suggest that you take a shot at writing it all down.
> Having different behaviour for the two, or different
> canonicalisation for the source code depending on syntactic position,
> seems confusing. It may still be right, though
I don't see any inconsistency in normalizing method names and replacing arguments with (), while not doing the same thing for Strings. Quoted strings are quoted exactly because they are subject to different rules from the rest of the program. String escapes like \r work in Strings, but not in other places. I don't think that's inconsistent; it's just different.
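To make the difference concrete, here is a small Python sketch. It assumes NFC as the normalization form, which this thread has not settled, and both function names are hypothetical.

```python
import unicodedata


def same_method_name(a, b, form="NFC"):
    """Compare method names after Unicode normalization.
    The choice of NFC is an assumption, not a decision from the spec."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def same_string_literal(a, b):
    """Compare string literals code point by code point, with no normalization."""
    return a == b


composed = "caf\u00e9"     # é as the single code point U+00E9
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

names_equal = same_method_name(composed, decomposed)        # True
literals_equal = same_string_literal(composed, decomposed)  # False
```

The two spellings render identically, so they dispatch to the same method, yet they remain distinct as string contents.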
Andrew
>
> h.