[Grace-core] Typing of Number
Michael Homer
mwh at ecs.vuw.ac.nz
Mon Jun 20 16:35:16 PDT 2011
On Tue, Jun 21, 2011 at 9:02 AM, James Noble <kjx at ecs.vuw.ac.nz> wrote:
>> Also, Michael's lament:
>>
>>> with the "once inexact,
>>> always inexact" behaviour you'd [often] end up with "everything inexact",
>>> since you could never tell if you'd introduced an inexact type
>>> somewhere along the line (perhaps from an argument).
>>
>> This sounds a bit like wishing that π were 3. That _would_ make everything simpler ... but it just ain't so.
>
> This is also an argument that the static type system should be able to track this distinction.
Well, right. It is valid to say that it just can't, but then the
numeric types don't achieve very much.
If you want to define only Rational + Rational, Binary64 + Binary64,
..., then the type probably really needs to be something like:
type Number<T> {
    +(other : Number<T>) -> Number<T>
}
Number isn't doing a lot there though. I do think that if there's a
Rational type it should be possible to constrain yourself to remaining
within it somehow. I also think Number + Number ought to work other
than in very exceptional cases, by the principle of least surprise.
If mixed-type operations are allowed, I don't see how the type system
can track "anything with Binary64 results in Binary64" without
multiple dispatch.
>> The double-dispatch will behave like a distributed typecase. If we want to actually honor our intention to allow library builders to add new representations of numbers, then we better use double-dispatch rather than a big typecase in every operation.
>
> Hmm. I thought about this and I'm not really sure it matters either way.
> Realistically you'll need to extend all of the existing numeric classes
> anyway to add a new representation.
With double dispatch the static return type is the type of the
argument (or Number), without self-type annotations and using only
covariant type parameters instead. That type-checks, and works OK for
builtins. I'm not sure that it works so well for user-defined classes,
but that might be unavoidable. It doesn't let Rational + Binary64 and
Binary64 + Rational both return Binary64, though, or let Complex +
Rational and Rational + Complex both do the arithmetically correct
thing.
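
To make the mechanism concrete, here is a rough sketch of double
dispatch in Python rather than Grace. The class names follow the
discussion, but the helper methods (add_to_rational, add_to_binary64)
are invented for illustration and are not part of any Grace library.
It shows only the run-time behaviour; the difficulty described above
is, as I read it, about what static return type can be written for +.

```python
from fractions import Fraction


class Rational:
    """Exact number backed by a Fraction (illustrative only)."""

    def __init__(self, value):
        self.value = Fraction(value)

    def __add__(self, other):
        # First dispatch is on the receiver (self); the second dispatch
        # happens when the argument picks the method for our type.
        return other.add_to_rational(self)

    def add_to_rational(self, left):
        # Rational + Rational stays exact.
        return Rational(left.value + self.value)

    def add_to_binary64(self, left):
        # Binary64 + Rational: once inexact, always inexact.
        return Binary64(left.value + float(self.value))


class Binary64:
    """Inexact number backed by a Python float (illustrative only)."""

    def __init__(self, value):
        self.value = float(value)

    def __add__(self, other):
        return other.add_to_binary64(self)

    def add_to_rational(self, left):
        # Rational + Binary64 also ends up inexact.
        return Binary64(float(left.value) + self.value)

    def add_to_binary64(self, left):
        return Binary64(left.value + self.value)
```

Dynamically both mixed orders produce a Binary64 here. Note also that
adding a new representation means adding an add_to_... method to every
existing class, which is the extension cost mentioned in the quoted
text.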
If that is the way it goes, what's the syntax for upper type bounds?
>
>>> I can't see why we'd do more implicitly than Go does.
Because Go numbers don't have a common supertype with operations, and
all the errors are detected statically. If Number + Number isn't
defined, the Number type is pretty meaningless.
>> I can't see what rule we might use to trigger implicit type conversions. Are you going to suggest one, or are we sticking to our guns for now?
>
> Some of Michael's code had a (semi-) implicit .asRational call in the middle of it I think.
> I'd prefer we stick to our guns. It should simplify things.
As it stands the specification says that the standard library will try
to coerce any Number to an integral when used as an index. Will it not
do that? If it does, what happens for Binary64?
In a purely structurally-typed world I'm not sure how the numeric
types are distinguished anyway. I can come up with methods that would
be unique to Rationals and IEEE floating-point types, but a
hypothetical Integer32 and Integer64 seem like they'd implement
exactly the same interface and be type aliases.
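
As a small illustration of that last point, here is a sketch using
Python's structural Protocol types in place of Grace types. The
method names (plus, times) and the MachineInt class are hypothetical.
Two interfaces with identical members accept exactly the same objects,
so nothing structural tells them apart.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Integer32(Protocol):
    def plus(self, other): ...
    def times(self, other): ...


@runtime_checkable
class Integer64(Protocol):
    # Same members as Integer32: structurally this is the same type.
    def plus(self, other): ...
    def times(self, other): ...


class MachineInt:
    """Hypothetical integer class; it never names either protocol."""

    def __init__(self, value):
        self.value = value

    def plus(self, other):
        return MachineInt(self.value + other.value)

    def times(self, other):
        return MachineInt(self.value * other.value)
```

Any MachineInt passes a structural check against both Integer32 and
Integer64, so the two names behave as aliases.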
I don't think there is a perfect solution here, so one of the
imperfect ones will have to do. I don't know which; none of them
behave quite the way I'd expect up front.
>
>>
>>> PS: Go's identifier rules - Rob Pike probably knows as much about unicode programming languages as anyone.
>>>
>>> It was important to us to extend the space of identifiers from the confines of ASCII. Go's rule—identifier characters must be letters or digits as defined by Unicode—is simple to understand and to implement but has restrictions. Combining characters are excluded by design, for instance. Until there is an agreed external definition of what an identifier might be, plus a definition of canonicalization of identifiers that guarantees no ambiguity, it seemed better to keep combining characters out of the mix. Thus we have a simple rule that can be expanded later without breaking programs, one that avoids bugs that would surely arise from a rule that admits ambiguous identifiers.
That is essentially the rule I've implemented at the moment, I think.
It's a little ambiguous what he means there: there are "letters" that
are combining characters, and I currently do allow them, along with
modifier numbers and non-digit numbers (through lack of excluding them
explicitly, rather than particular design). I don't allow combining
symbols and diacritics. Banning combining characters altogether does
have some flow-on effects, since legitimate letters from some scripts
do not have precomposed forms and so couldn't be used at all, although
I'm not sure how much of a problem that is today. It is a simple rule
that leaves room to expand later on if required.
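
For comparison, the strict reading of the Go rule (letters in the
Unicode L categories, decimal digits in Nd, plus underscore) can be
sketched in Python as below. This is the rule as quoted, not the
slightly looser one I currently implement, and the function names are
made up for the example.

```python
import unicodedata


def is_identifier_char(ch):
    """True for a letter (L*), a decimal digit (Nd), or underscore."""
    category = unicodedata.category(ch)
    return ch == "_" or category.startswith("L") or category == "Nd"


def is_identifier(name):
    """Non-empty, does not start with a digit, all characters allowed."""
    if not name or unicodedata.category(name[0]) == "Nd":
        return False
    return all(is_identifier_char(ch) for ch in name)
```

Under this rule "caf\u00e9" (precomposed e-acute) is accepted, while
"cafe\u0301" (e followed by COMBINING ACUTE ACCENT) is rejected. A
Tamil syllable such as "\u0b95\u0bbf", which has no precomposed form,
is rejected as well, which is the flow-on effect mentioned above.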
>>
>>
>> This seems like a reasonable rule, although we would allow APOSTROPHE (Unicode 0027) in addition to class It would exclude the tricky cases with combining characters about which Michael was fretting. I'm not actually sure what a Unicode NUMBER is; does it include, for example, TAMIL NUMBER TEN (Unicode 0BF0)?
>
> I don't know either, but as Rob Pike says, it does avoid a lot of difficult cases.
> Michael - what do you think?
I think he's saying they allow only digits (Unicode category Nd
"Number, Decimal digit"). That doesn't include U+0BF0, just the digits
0-9 in various scripts.
There are two other numeric categories: Nl ("Number, Letter") contains
precomposed Roman numerals and Greek letter-numbers (and similar), and
No ("Number, Other") contains everything else, including U+0BF0 and
things like VULGAR FRACTION ONE QUARTER, superscripts, circled and
parenthesised numbers, and some other scripts with non-place-based or
non-decimal number systems (but not CJK ideographs, which are treated
as letters). All of those are regarded as numbers but only the first
set are digits.
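
These categories are easy to check with Python's standard unicodedata
module. The sample table below is just an illustrative selection, and
the name SAMPLES is mine.

```python
import unicodedata

# A few representative characters and their General_Category values.
SAMPLES = {
    "7": "Nd",       # DIGIT SEVEN
    "\u0be6": "Nd",  # TAMIL DIGIT ZERO
    "\u0bf0": "No",  # TAMIL NUMBER TEN
    "\u2160": "Nl",  # ROMAN NUMERAL ONE
    "\u00bc": "No",  # VULGAR FRACTION ONE QUARTER
    "\u00b2": "No",  # SUPERSCRIPT TWO
    "\u2460": "No",  # CIRCLED DIGIT ONE
    "\u5341": "Lo",  # CJK ideograph for ten, classed as a letter
}


def check_samples():
    """Return the characters whose category differs from the table."""
    return [ch for ch, expected in SAMPLES.items()
            if unicodedata.category(ch) != expected]
```

check_samples() returns an empty list when the table matches the
Unicode database shipped with the interpreter.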
I am also interested in what is a legitimate operator character, since
those are extensible as well. At the moment my operators are
everything in "Symbol, Mathematical" along with -, &, |, :, %, *, and
/, but that was pretty arbitrary. Those are also valid identifier
characters in method names to some degree, although I'm not sure
exactly to what degree.
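
A sketch of that operator-character rule in Python, assuming "Symbol,
Mathematical" means Unicode category Sm. The names are illustrative,
and this is not the code in my implementation.

```python
import unicodedata

# Characters allowed as operators even though most are not in Sm.
EXTRA_OPERATOR_CHARS = set("-&|:%*/")


def is_operator_char(ch):
    """True for category Sm characters plus the explicit extras."""
    return unicodedata.category(ch) == "Sm" or ch in EXTRA_OPERATOR_CHARS
```

With this rule +, <, = and U+2218 RING OPERATOR qualify through Sm,
while - and / qualify only through the explicit list. Characters such
as ! and ^ are in neither set, which shows how arbitrary the current
selection is.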
-Michael