[Grace-core] Encoding classes

Tue Jul 19 10:08:06 PDT 2011

On 19 Jul 2011, at 0:41, Kim Bruce wrote:

> On Jul 18, 2011, at 11:26 PM, Andrew P. Black wrote:
> 
>> I'm never sure what you mean when you say "encoding classes as objects".   Are you referring to the choice we made to give the class syntax a meaning as a nested object syntax, rather than adding another basic construct to the language?   
> 
> That is correct.  The proposal we have been dealing with has been to make classes a derived construct so that it need not be a primitive in the language.  
> 
>> I've never considered this to be controversial, since it's been working well in Smalltalk for 30 years — and also in Emerald, but no one uses Emerald, so maybe it doesn't count :-)  It's also working in Self and NewtonScript and JavaScript, but none of those languages has a static type system.   So is the issue that you are concerned about whether there can be a sound static type system for an object that creates another object?
> 
> An object that creates another object is not an issue.  The issue is creating another object and simultaneously providing the basis for inheritance -- which is what classes do.  The particular problem is whether this can be done in a statically typed manner where we have information hiding for protected methods.  The claim is that the information about protected methods (or instance variables if they were not entirely hidden) must leak out of the representation, and thus into the type system, breaking the desired encapsulation.
>> 
...

OK, so my understanding of what Grace does right now is that Objects, not Classes, are the basis for inheritance.  An OBJECT extends another OBJECT.   This is the essence of the Taivalsaari proposal, and its realization in Grace with the extends keyword.

It works because the representation of a method in an object is actually the generator of the method, rather than the method itself.  The generators are "fixed" at method execution time, by parameterizing them with the correct value of "self".  Another way of looking at this is to realize that the representation of the methods in an object is _exactly the same_ as the representation in a class — shared code with self as a parameter.
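
To make the generator idea concrete, here is a minimal sketch in Python rather than Grace. It assumes a toy object model that is not part of any Grace implementation: an object is a table of method generators, each an ordinary function that takes self as its first parameter. The names make_object, extends and request are invented for illustration.

```python
# Hypothetical toy model: an object is a table of method generators.
# A generator is a plain function whose first parameter is self.

def make_object(generators):
    """Create an object from a table of method generators."""
    return {"generators": dict(generators)}

def extends(parent, overrides):
    """Object inheritance: copy the parent's generators, then override some."""
    table = dict(parent["generators"])
    table.update(overrides)
    return make_object(table)

def request(obj, name, *args):
    """Fix the generator at execution time by passing the receiver as self."""
    return obj["generators"][name](obj, *args)

point = make_object({
    "x": lambda self: 3,
    "describe": lambda self: "x = " + str(request(self, "x")),
})

# The child inherits describe unchanged; only x is overridden.
shifted = extends(point, {"x": lambda self: 10})

print(request(point, "describe"))    # x = 3
print(request(shifted, "describe"))  # x = 10
```

The inherited describe code is shared between the two objects; it sees the overridden x only because self is supplied when the request is made.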

You are right that few statically-typed languages do object inheritance.  Clearly, if the object A that B extends can change dynamically, as it can in Self, then all static typing is off.   Moreover, if the type of the object A is not known statically, then we can't know the type of B.  Hence, we have two restrictions on extends:

	(1) The object being extended is fixed once-and-for-all at extension time (creation of the sub-object).
	(2) The type of the object being extended must be manifest.  

We were just having a discussion of what (2) would mean.  Clearly, a class.new expression would qualify.  So would a factory.makeOne expression  for any programmer-defined factory where makeOne is a method whose body is a manifest object constructor, or (transitively), a manifest object constructor that extends another object.    We can add more clauses to this definition if we need to, but my experience with Emerald leads me to believe that these two are enough for all the programs that programmers are likely to want to write.
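
One possible reading of restriction (2) can be sketched as a small syntactic check. This is Python, not Grace, and every name in it (ObjectConstructor, ClassNew, FactoryCall, Variable, is_manifest) is hypothetical. It also assumes that an object constructor counts as manifest only if whatever it extends is itself manifest, which is an interpretation of "transitively" and not a settled definition.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class ObjectConstructor:
    """An object literal, optionally extending another expression."""
    extends: Optional["Expr"] = None

@dataclass
class ClassNew:
    """A class.new expression."""
    class_name: str

@dataclass
class FactoryCall:
    """A factory.makeOne expression; body is the body of makeOne."""
    body: "Expr"

@dataclass
class Variable:
    """Any expression whose value is only known at run time."""
    name: str

Expr = Union[ObjectConstructor, ClassNew, FactoryCall, Variable]

def is_manifest(expr):
    """Assumed rule: can the type of expr be read off the program text?"""
    if isinstance(expr, ClassNew):
        return True
    if isinstance(expr, ObjectConstructor):
        return expr.extends is None or is_manifest(expr.extends)
    if isinstance(expr, FactoryCall):
        # The factory method's body must itself be a manifest object constructor.
        return isinstance(expr.body, ObjectConstructor) and is_manifest(expr.body)
    return False

print(is_manifest(ClassNew("Point")))                                        # True
print(is_manifest(FactoryCall(ObjectConstructor(extends=ClassNew("Point")))))  # True
print(is_manifest(ObjectConstructor(extends=Variable("p"))))                 # False
```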

>> Are you talking about what a programmer would write, or what some type theoretician would use to convince him or herself that such a program is OK?   If it's what the programmer would write, then there is a serious problem: how do we ensure that the mkDiffPt method is in fact the fixed point of the mkDiffPt' method?
> 
> That is exactly my point.  Programmers cannot write such stuff accurately.  My argument is that no human programmer would be willing to write this.  

My point is that with Grace as currently designed, no human programmer would have to write this.  The fixed-point-taking would be for the semanticists only.

> As a result, programmers would inevitably write it with the class construct.  I would further argue that given the need for such a complex encoding, we would be better off just making "class" a language primitive so we can give error messages appropriate for the way the class would be written by the programmer, rather than providing error messages that might explain errors in terms of the encoding.
>> 

I believe that the semantics will have to involve fixed-points of generators whatever the surface language does.   I certainly agree that error messages must be given in terms of what the programmer writes, not of some translation.
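
For readers who want to see what "fixed-points of generators" amounts to, here is a minimal sketch in Python, not Grace. It assumes a deliberately simple model in which a generator is a function from self to a record of methods; the names fix, point_gen and big_point_gen are invented for illustration. This is the kind of construction the semantics would perform, not something a programmer would write.

```python
def fix(generator):
    """Return the object that is the fixed point of generator."""
    self = {}
    # Tie the knot: the methods close over the very record they end up in.
    self.update(generator(self))
    return self

def point_gen(self):
    """Generator for a point: methods refer to self, which is not yet bound."""
    return {
        "x": lambda: 1,
        "double_x": lambda: 2 * self["x"](),
    }

def big_point_gen(self):
    """Inheritance on generators: reuse the parent's methods with the same self."""
    methods = point_gen(self)
    methods["x"] = lambda: 50
    return methods

point = fix(point_gen)
big = fix(big_point_gen)

print(point["double_x"]())  # 2
print(big["double_x"]())    # 100
```

Inheritance happens between generators, before any fixed point is taken, which is why the inherited double_x sees the overriding x.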

>> I'm not sure what you are referring to when you say "painful re-encoding".  What the semantics functions do to give meaning to a class defined by inheritance?  Or what a programmer would write? 
> 
> What a programmer would have to write if they wanted to avoid using "class" and instead write it in terms of objects.
>> 
>>> could be made to go away by defining an "extends" operator that does a lot of this gluing together, but even if that is the case, we're still looking at something quite complex, with the "shadow" copies of all methods (hidden and public) showing up in the object encoding.

The extends operation is exactly the solution.  It must be part of Grace (rather than being defined in Grace), because it introspects on the definition of methods and extracts method generators from methods, something that can't be done with mathematical functions.   If you like, you can think of the implementation's representation of a method as being the pair <method generator, method>: extends works on the first element of the pair, and execution works with the second.  
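
The pair representation can be sketched as follows, in Python and under invented names (Obj, slots, request, extends). It illustrates the idea only and makes no claim about how any Grace implementation stores methods.

```python
from functools import partial

class Obj:
    """Hypothetical object whose slots hold (generator, bound method) pairs."""

    def __init__(self, generators):
        self.slots = {
            name: (gen, partial(gen, self))  # second element has self fixed
            for name, gen in generators.items()
        }

    def request(self, name, *args):
        # Execution works with the second element of the pair.
        return self.slots[name][1](*args)

def extends(parent, overrides):
    """extends works on the first element: it re-fixes each generator for the new object."""
    generators = {name: pair[0] for name, pair in parent.slots.items()}
    generators.update(overrides)
    return Obj(generators)

counter = Obj({
    "start": lambda self: 0,
    "next": lambda self: self.request("start") + 1,
})
late_counter = extends(counter, {"start": lambda self: 41})

print(counter.request("next"))       # 1
print(late_counter.request("next"))  # 42
```

Had extends copied the second element instead, the inherited next would still be bound to the parent and would answer 1 for both objects.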

>> 
>> Classes in Emerald were just objects with a "create" method, so I'm pretty confident that this is not a problem.  What Emerald lacked was inheritance.  Are you saying that there is some problem with the "shallow copy" semantics that makes it impossible to formalize in a type-safe way?
> 
> Inheritance is exactly the problem here.  As I stated above, there is no difficulty with object generators or factories.

I certainly agree that we have to work out the details of typing extends.   I haven't yet done it, and we need to convince ourselves that it can be done.  (I was hoping that our types expert would do it!)

> Inheritance plus information hiding with protected methods is what makes getting the typing so hard.

As I have said before, I don't think that information hiding has anything to do with types in an OO language.  That's where OO languages and ADT languages part company.   Our proposal for information hiding (hidden methods) depended on restricting who could request a hidden method; it works without types, and is a syntactic restriction on the form of method requests: they must be to self or super.
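
A sketch of how such a purely syntactic check might look, written in Python with invented names (Request, check_hidden_requests). It assumes, as a simplification, that the set of hidden method names is known for the whole program and that each request records the source text of its receiver.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """A method request as it appears in the source: receiver text and method name."""
    receiver: str
    method: str

def check_hidden_requests(requests, hidden):
    """Return the requests that break the rule: hidden methods only via self or super."""
    return [
        r for r in requests
        if r.method in hidden and r.receiver not in ("self", "super")
    ]

hidden = {"helper"}
program = [
    Request("self", "helper"),    # allowed
    Request("super", "helper"),   # allowed
    Request("other", "helper"),   # rejected: receiver is not self or super
    Request("other", "area"),     # allowed: area is not hidden
]

print(check_hidden_requests(program, hidden))
```

The check looks only at the form of each request; no type information is consulted.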

Our language and our type system will be simplified if we remember to keep these things separate.

>>> 
>>> Bad news:  No one in their right mind would want to write out classes by hand in this way.  (Can you imagine how helpful the error messages would be if you wrote it with this encoding and made a slight mistake?)
>> 
>> Again, I'm lost.  Are you really proposing that programmers have to write both the generators and their fixed points explicitly?  Surely the whole point (of OO Languages) is that given the generator (in programmer speak, the late-bound self), the fixed point (in programmer speak, the method with a bound self) can be constructed automatically?
> 
> Again, my point is that no programmer would or should be expected to do this encoding by hand.  Instead they would/should use the class construct.  I don't have a problem saying that classes can be defined away in a type-safe way using the encoding presented above.  I do have a problem in expecting programmers to do that.  Moreover, the encoding that we've been talking about cannot (as far as I can tell -- and lots of smart people have worked on this for a long time) be statically typed in a way that provides correct static typing -- and supports protected methods and inheritance.

Looks like I'll have to work on this and see if it's as hard as you say!  But, once again, I'll leave hidden methods out of it!   (Incidentally, we should decide what to call them; I don't want to use "private methods" because that means something else in other languages.  Protected methods is a better name; I've been using "hidden methods" in this email.)

>> 
>> I don't understand your proposal.  If classes are not objects, then they need to be something else.  What?  Functions?  That's what they look like in Scala and in Fortress: they are things that can be parameterized and applied; doing so has an effect, and produces an object.   Would they be the ONLY way of making a function, or would there be a general-purpose function-creating construct?  How would this simplify the type system?
> 
> My proposal would be to have classes as primitives that have operations that can be used to construct new objects and can be extended to subclasses in ways that preserve information hiding.  While theoretically this can be semantically defined away by using objects only, this has no impact on the language design -- in particular we will write error messages in terms of the class construct rather than the generated objects (assuming they are implemented that way).

Well, we have classes as primitives right now.   My concern is that when classes do the wrong thing (as in my example below), what is the path to generalizing them?    Not getting this right is a great failing in many languages.   The extends primitive on objects is our way of providing the generalization path.   The reason for explaining classes in terms of objects is for us as language designers: to make sure that the semantics are regular, and that there is no discontinuity when the programmer switches from one form to the other.  It wasn't to make programmers write explicit fixed-points.

> We know we can write the semantics of OO languages in terms of the (polymorphic) lambda calculus (e.g., as in my book) or in terms of object primitives (see Abadi and Cardelli).  So, yes we could write the semantics of classes as objects exactly as described above -- and write functions as objects with apply methods as well.  Moreover, there is no difficulty in having programmers write objects that just serve as factories for creating new objects -- those are easy as long as we don't also want to later be able to create subclasses.
>> 
>> Fortress, in effect, has real functions and classes-as-functions.  The idea was that if you wanted a factory that did something different from the one that you got from the class construct, you could write it as a factory function, and the client would not see a change in the interface.  After I had been working on Fortress for a couple of weeks, I realized that I needed to do exactly that.  IIRC, I had a CatString class that took two parameters, both strings, and generated a new CatString object that represented the concatenation of the two strings.   Instead, I wanted a CatString function that might or might not generate a CatString object, depending on the representations of the arguments and their lengths. (e.g., if either string argument was the empty string, there was no need to concatenate anything; if both were short base strings, then the fastest thing to do was to copy the characters into a new, longer base string; etc.)
>> 
>> So I went around the group asking how to make the transformation from CatString as a class to CatString as a function.  It turned out that no one knew how to do it.   We figured it out, eventually; we had to rename the class CatString to something else, which unfortunately had the side effect of also renaming the TYPE CatString, which WAS a change visible to clients.   
> 
> [Yet another reason to separate types from classes?]
> 
>> Perhaps there was a way of creating a type alias ...  Suffice it to say, it was very complicated.  Since I didn't actually have any users outside of the immediate group, we just changed the interface.
>> 
>> It was to avoid this sort of complexity that I originally suggested that classes just be objects, so that if the new method needed to do something other than creating a new object, it could just be written down as a method, and if new wasn't the right name, it could just be changed.  
>> 
>> So, to summarize, I'm aware that giving a semantics for inheritance means treating object descriptions as generators and objects as their fixed-points.  I'm not sure what the problem is that you are highlighting here.   Defining the extends operation?  Typing the extends operation?    I'm also not aware of any way of avoiding the need to deal in generators and fixed-points, regardless of the object model chosen for the language.
> 
> The bottom line of my argument is that the only type-safe way that I know to encode classes as objects (preserving information hiding, generating objects, and supporting inheritance) is too complex for a programmer to write.  Given that, we should make "class" a primitive of the language rather than just a convenient abbreviation for the object encoding.

What's not type-safe about the way that Smalltalk (Strongtalk) does it?
> 
> To make an analogy, we know that we can write everything in a functional language in terms of the lambda calculus -- and one could argue that some LISP/Scheme programmers come close to that.  However, most functional languages provide many other primitives (e.g., data types like lists, pattern matching, etc.) to make programming simpler and more efficient. 
> 
> All I'm saying is that we should do the same for classes.  Underneath the covers, everything may be functions or may be objects, but that's not particularly relevant to the programmer.

It's relevant to the programmer when the canned class construct proves too restrictive, and she needs to move to something more general.  I believe that move should be a small sideways step, and not require a systemic cross-module refactoring.
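
The CatString story quoted earlier shows what such a sideways step could look like. The sketch below is Python, not Fortress or Grace, and everything in it is hypothetical: BaseString, CatString, CatStringFactory, and the cutoff SHORT for what counts as a short string. It assumes clients only ever ask the factory object for new(left, right), so changing what new does leaves their code untouched.

```python
SHORT = 16  # hypothetical cutoff for a "short" base string

class BaseString:
    """A flat string holding its characters directly."""
    def __init__(self, chars):
        self.chars = chars
    def length(self):
        return len(self.chars)
    def text(self):
        return self.chars

class CatString:
    """A lazy concatenation of two strings."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def length(self):
        return self.left.length() + self.right.length()
    def text(self):
        return self.left.text() + self.right.text()

class CatStringFactory:
    """Stands in for the class-as-object; clients call new(left, right)."""
    def new(self, left, right):
        # Nothing to concatenate if either side is empty.
        if left.length() == 0:
            return right
        if right.length() == 0:
            return left
        # Two short base strings: copy the characters into one flat string.
        both_base = isinstance(left, BaseString) and isinstance(right, BaseString)
        if both_base and left.length() <= SHORT and right.length() <= SHORT:
            return BaseString(left.chars + right.chars)
        return CatString(left, right)

cat_string = CatStringFactory()
hello = BaseString("hello")
print(type(cat_string.new(hello, BaseString(" world"))).__name__)  # BaseString
print(type(cat_string.new(BaseString("x" * 100), hello)).__name__)  # CatString
```

Because new is an ordinary method on an ordinary object, its body can return something other than a fresh CatString without renaming anything that clients can see.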
> 
> If, on the other hand, someone comes up with a correct type-safe encoding for classes that is something that a programmer would find just as easy as classes, then I'm willing to reconsider.

In Grace today, the programmer sees classes and objects.   They are not aware of any "encoding".   The "encoding" comes in when we write the semantics of the language: there is less work for us to do, because we need to give the semantics to a smaller core.  The cost is that we do have to define extends on objects, which Java doesn't have to do.

	Andrew