[Grace-core] Encoding classes

Tue Jul 19 00:41:37 PDT 2011

On Jul 18, 2011, at 11:26 PM, Andrew P. Black wrote:

> 
> On 15 Jul 2011, at 14:34, Kim Bruce wrote:
> 
>> [THIS IS LONG BUT IMPORTANT -- PLEASE READ!]
>> 
>> This conversation led me to take a closer look to see what prior research has done with encoding classes as objects.  
> 
> I'm never sure what you mean when you say "encoding classes as objects".   Are you referring to the choice we made to give the class syntax a meaning as a nested object syntax, rather than adding another basic construct to the language?   

That is correct.  The proposal we have been dealing with has been to make classes a derived construct so that it need not be a primitive in the language.  

> I've never considered this to be controversial, since it's been working well in Smalltalk for 30 years — and also in Emerald, but no one uses Emerald, so maybe it doesn't count :-)  It's also working in Self and NewtonScript and JavaScript, but none of those languages has a static type system.   So is the issue that you are concerned about whether there can be a sound static type system for an object that creates another object?

An object that creates another object is not an issue.  The issue is creating another object and simultaneously providing the basis for inheritance -- which is what classes do.  The particular problem is whether this can be done in a statically typed manner where we have information hiding for protected methods.  The claim is that the information about protected methods (or instance variables if they were not entirely hidden) must leak out of the representation, and thus into the type system, breaking the desired encapsulation.
> 
>> Not surprisingly, the definitive research is by Abadi and Cardelli, and is explained in section 8.5 of their book, "A theory of objects".
>> 
>> Let's work with yesterday's example where we want pair objects with type:
>> 
>> type Pair {mkDiffPt(dx:Number,dy:Number) -> Pair}
>> 
>> Let's suppose we want a class that uses a hidden method called helper with type A -> B.  If all we want to do is instantiate objects then the encoding with just a method "new" works fine.  
>> 
>> However, if we want to inherit, we have to do quite a bit more.  The difficulty is that an object is a fixed point, where self represents the entire object.  A class can be seen as a generator of that fixed point.  When we define a subclass, we are actually extending the generator of the fixed point (object), not the fixed point (object) itself.  
>> 
>> Thus we need to keep hold of the methods before self has been instantiated as a fixed point.  As a result our "class object" will not only need a "new" method, but it will also need separate copies of each of the methods (public and protected) that have an explicit self parameter. 
> 
> I'm with you so far — this is indeed the core of the Taivalsaari "encoding" of inheritance as copy.  Methods in an OO language are never extensional: they always have an explicit self, which is late-bound to create the fixed-point.
> 
>> Thus the encoding of the class as an object will look as follows (where I've left out all code that actually does anything!)
>> 
>> const PairClass = object {
>> 	method new:(x:Number, y: Number) -> Pair {
>> 		object{
>> 			var x: Number
>> 			var y: Number
>> 			method mkDiffPt(dx:Number,dy:Number) -> Pair { ...  helper ... }
>> 			<private> method helper(a:A) -> B {...}
>> 		}
>> 	}
> 
> Yes, that's fine, although it's clearer, I think, if you give the objects names, and regard object as a name binder:
> 
> const PairClass = object ft {
> 	method new:(x:Number, y: Number) -> Pair {
> 		object pt{
> 			var x: Number
> 			var y: Number
> 			method mkDiffPt(dx:Number,dy:Number) -> Pair { ...  pt.helper ... }
> 			<private> method helper(a:A) -> B {...}
> 		}
> 	}
> 
> Think of object as a lambda or a mu binder: if you want to use the object, you get the mu binding, and if you want to inherit from it, you want the lambda binding.
> 
>> 
>> 	// returns a closure to be used later in the body of a method mkDiffPt to be inserted in a subclass
>> 	method mkDiffPt'(s:Point)-> { (dx:Number,dy:Number) -> Pair} {
>> 		{(dx:Number,dy:Number) -> Pair {.....  helper ... }}
>> 
>> 	// same idea for helper
>> 	method helper'(s:Point)-> { (a:A) -> B} {
>> 		{...} }
>> }
> 
> Now you have lost me.   Every OO language that I know of makes the taking of the fixed point of object generators implicit.   This may not be good theoretically, but it's established practice programmatically.   I don't believe that you are suggesting that the lambda and the mu version be declared separately!

That is what Abadi and Cardelli are forced to do to get their encoding.  If it could have been obtained simpler, I'm sure they would have done that.
> 
>> 
>> You will notice that the type of PairClass is given by
>> 
>> type PairClassType {
>> 	new: (x:Number, y: Number) -> Pair
>> 	mkDiffPt': (s:Point)-> { (dx:Number,dy:Number) -> Pair}
>> 	helper'(s:Point)-> { (a:A) -> B}
>> }
> 
> No, I don't see that at all.

That's just the typing for the class PairClass written above.  If you buy that definition (and you seem not to), then you have to buy that type.  
> 
>> I won't write out the entire encoding in detail for you here, but to construct a subclass of PairClass, you add in any new methods (explicitly parameterized by self -- as with the last two methods above) and you then replace the inherited "new" method by one that returns an object that is formed by assembling together each of the inherited "primed" methods (each applied to the "self" of this new object) and adding on and/or replacing methods given by the subclass.  
> 
> Are you talking about what a programmer would write, or what some type theoretician would use to convince him or herself that such a program is OK?   If it's what the programmer would write, then there is a serious problem: how do we ensure that the mkDiffPt method is in fact the fixed point of the mkDiffPt' method?

That is exactly my point.  Programmers cannot write such stuff accurately.  My argument is that no human programmer would be willing to write this.  As a result, programmers would inevitably write it with the class construct.  I would further argue that given the need for such a complex encoding, we would be better off just making "class" a language primitive so we can give error messages appropriate for the way the class would be written by the programmer, rather than providing error messages that might explain errors in terms of the encoding.
> 
>> 
>> Here is an example of part of what would have to be done to create the update "new" method for a subclass adding a color field
>> 
>> const ColorPairClass = object {
>> 	method new:(x:Number, y: Number,c:Color) -> ColorPair {
>> 		object{
>> 		   var x: Number   // the x,y would not actually appear as they would be replaced by getter
>> 		   var y: Number   // and setter methods but let's ignore that here
>> 		   var c: Color
>> 			
>> 		   method mkDiffPt(dx:Number,dy:Number) -> Pair {PairClass.mkDiffPt'(self).app(dx,dy) }
> 
> If I interpret this correctly, you are observing that mkDiffPt in a subclass must depend not on mkDiffPt in the superclass, but on the generator of MkDiffPr.  Yes, I agree that this is the case.  Which is why all OO languages represent methods as generators (= code with an unbound self pointer that will be bound later, to take the fixed point).  An inheriting object will need to take the fixed point (= bind self) AFTER the inheritance operation, not before.   
> 
> The way this is handled in all OO languages that I know of is to not to actually bind self until run-time.  There are many pragmatic advantages to doing this, not least amongst them being that all objects with the same code can in fact share one copy of the machine- or byte-code, even though conceptually their methods are different (since each has a different self).

I agree.

> 
>> 		   <private> method helper(a:A) -> B {...}
>> 		}
>> 	}
>> 
>> 	method mkDiffPt': (s:Point)-> { (dx:Number,dy:Number) -> Pair} {...}
>> 	method helper'(s:Point)-> { (a:A) -> B} {...}
>> 	... // other methods corresponding to new methods of subclass
>> 
>> }
>> 
>> 
>> I leave it as an exercise for the Grace programmer to encode this new subclass properly as an object using only legal Grace code.  (To make it simple, just add a method that returns the sum of the two numbers in the pair.)
>> 
>> I suspect that some of this painful re-encoding of the new method
> 
> I'm not sure what you are referring to when you say "painful re-encoding".  What the semantics functions do to give meaning to a class defined by inheritance?  Or what a programmer would write? 

What a programmer would have to write if they wanted to avoid using "class" and instead write it in terms of objects.
> 
>> could be made to go away by defining an "extends" operator that does a lot of this gluing together, but even if that is the case, we're still looking at something quite complex, with the "shadow" copies of all methods (hidden and public) showing up in the object encoding.
>> 
>> So we have both good news and bad news:
>> 
>> Good news:  Classes can be encoded as objects in ways that respect the type system.  
>> 
>> (Caveat -- I didn't write out all the details of the encoding and I'm not even sure how to write a method that returns a closure -- would writing it with {{ ... }} be proper?)
> 
> Classes in Emerald were just objects with a "create" method, so I'm pretty confident that this is not a problem.  What Emerald lacked with inheritance.  Are you saying that there is some problem with the "shallow copy" semantics that makes it impossible to formalizze in a type-safe way?

Inheritance is exactly the problem here.  As I stated above, there is no difficulty with object generators or factories.

Inheritance plus information hiding with protected methods is what makes getting the typing so hard.
> 
>> 
>> Bad news:  No one in their right mind would want to write out classes by hand in this way.  (Can you imagine how helpful the error messages would be if your wrote it with this encoding and made a slight mistake?)
> 
> Again, I'm lost.  Are you really proposing that programmers have to write both the generators and their fixed points explicitly?  Surely the whole point (of OO Languages) is that given the generator (in programmer speak, the late-bound self), the fixed point (in programmer speak, the method with a bound self) can be constructed automatically?

Again, my point is that no programmer would or should be expected to do this encoding by hand.  Instead they would/should use the class construct.  I don't have a problem saying that classes can be defined away in a type-safe way using the encoding presented above.  I do have a problem in expecting programmers to do that.  Moreover, the encoding that we've been talking about cannot (as far as I can tell -- and lots of smart people have worked on this for a long time) by statically typed in a way that provides correct static typing -- and support protected methods and inheritance.
> 
>> 
>> To get a properly-type encoding of classes as objects in their object calculus takes a fair amount of work, and is significantly more complex than the one we've been talking about.
>> 
>> There may be an alternative type-safe way of doing this encoding (there are a few simple modifications that I see, but they don't simplify things much), but I suspect that overall they will be of the same complexity as this.
>> 
>> So, if we want to continue to treat classes as encodings we need to:
>> 
>> 1.  Work out all the gory details of any proposal to ensure that it works in a type-safe way.
>> 
>> 2.  Convince ourselves that the results is (nearly as) understandable as the more traditional way of handling classes.
>> 
>> Unless we can do this, I highly recommend that we scrub the idea of encoding classes as objects and just have classes as primitives.  (Of course it is trivial to write objects as encodings using classes -- just write the obvious class with parameterless constructor and invoke new.)
> 
> I don't understand your proposal.  If classes are not objects, then they need to be something else.  What?  Functions?  That's what they look like in Scala and in Fortress: they are things that can be parameterized and applied; doing so has an effect, and produces and object.   Would they be the ONLY way of making a function, or would there be a general-prurpose function creating construct?  How would this simplify the type system?

My proposal would be to have classes as primitives that have operations that can be used to construct new objects and can be extended to subclasses in ways that preserve information hiding.  While theoretically this can be semantically defined away by using objects only, this has no impact on the language design -- in particular we will write error messages in terms of the class construct rather than the generated objects (assuming they are implemented that way).

We know we can write the semantics of OO languages in terms of the (polymorphic) lambda calculus (e.g., as in my book) or in terms of object primitives (see Abadi and Cardelli).  So, yes we could write the semantics of classes as objects exactly as described above -- and write functions as objects with apply methods as well.  Moreover, there is no difficulty in having programmers write objects that just serve as factories for creating new objects -- those are easy as long as we don't also want to later be able to create subclasses.
> 
> Fortress, in effect, has real functions and classes-as-functions.  The idea was that if you wanted a factory that did something different from the one that you got from the class construct, you could write it as a factory function, and the client would not see a change in the interface.  After I had been working on Fortress for a couple of weeks, I realized that I needed to do exactly that.  IIRC, I had a CatString class that took two parameters, both strings, and generated a new CatString object that represented the concatenation of the two strings.   Instead, I wanted a CatString function that might or might not generate a CatString object, depending on the representations of the arguments and their lengths. (e.g., if either string argument was the empty string, there was not need to concatenate anything; if both were short base strings, then the fastest thing to do was to copy the characters into a new, longer base string; etc.)
> 
> So I went around the group asking how to make the transformation from CatString as a class to CatString as a function.  It turned out that no one knew how to do it.   We figured it out, eventually; we had to rename the class CatString to something else, which unfortunately had the side effect of also renaming the TYPE CatString, which WAS a change visible to clients.   

[Yet another reason to separate types from classes?]

> Perhaps there was a way of creating a type alias ...  Suffice it to say, it was very complicated.  Since I didn't actually have any users outside of the immediate group, we just changed the interface.
> 
> It was to avoid this sort of complexity that I originally suggested that classes just be objects, so that if the new method needed to do something other than creating a new object, it could just be written down as a method, and if new wasn't the write name, it could just be changed.  
> 
> So, to summarize, I'm aware that giving a semantics for inheritance means treating object descriptions as generators and objects as their fixed-points.  I'm not sure what the problem is that you are highlighting here.   Defining the extends operation?  Typing the extends operation?    I'm also not aware of any way of avoiding the need to deal in generators and fixed-points, regardless of the object model chosen for the language.

The bottom line of my argument is that the only type-safe ways that I know to encode classes as objects (preserving information hiding, generating objects, and supporting inheritance) is too complex for a programmer to write.  Given that, we should make "class" a primitive of the language rather than just a convenient abbreviation for the object encoding.

To make an analogy, we know that we can write everything in a functional language in terms of the lambda calculus -- and one could argue that some LISP/Scheme programmers come close to that.  However, most functional languages provide many other primitives (e.g., data types like lists, pattern matching, etc.) to make programming simpler and more efficient. 

All I'm saying is that we should do the same for classes.  Underneath the covers, everything may be functions or may be objects, but that's not particularly relevant to the programmer.

If, on the other hand, someone comes up with a correct type-safe encoding for classes that is something that a programmer would find just as easy as classes, then I'm willing to reconsider.

	Kim

> 
> 	Andrew
> 
> 
> _______________________________________________
> Grace-core mailing list
> Grace-core at cecs.pdx.edu
> https://mailhost.cecs.pdx.edu/mailman/listinfo/grace-core