[Grace-core] Encoding classes

Mon Jul 18 23:26:47 PDT 2011

On 15 Jul 2011, at 14:34, Kim Bruce wrote:

> [THIS IS LONG BUT IMPORTANT -- PLEASE READ!]
> 
> This conversation led me to take a closer look to see what prior research has done with encoding classes as objects.  

I'm never sure what you mean when you say "encoding classes as objects".   Are you referring to the choice we made to give the class syntax a meaning as a nested object syntax, rather than adding another basic construct to the language?   I've never considered this to be controversial, since it's been working well in Smalltalk for 30 years — and also in Emerald, but no one uses Emerald, so maybe it doesn't count :-)  It's also working in Self and NewtonScript and JavaScript, but none of those languages has a static type system.   So is the issue that you are concerned about whether there can be a sound static type system for an object that creates another object?

> Not surprisingly, the definitive research is by Abadi and Cardelli, and is explained in section 8.5 of their book, "A theory of objects".
> 
> Let's work with yesterday's example where we want pair objects with type:
> 
> type Pair {mkDiffPt(dx:Number,dy:Number) -> Pair}
> 
> Let's suppose we want a class that uses a hidden method called helper with type A -> B.  If all we want to do is instantiate objects then the encoding with just a method "new" works fine.  
> 
> However, if we want to inherit, we have to do quite a bit more.  The difficulty is that an object is a fixed point, where self represents the entire object.  A class can be seen as a generator of that fixed point.  When we define a subclass, we are actually extending the generator of the fixed point (object), not the fixed point (object) itself.  
> 
> Thus we need to keep hold of the methods before self has been instantiated as a fixed point.  As a result our "class object" will not only need a "new" method, but it will also need separate copies of each of the methods (public and protected) that have an explicit self parameter. 

I'm with you so far — this is indeed the core of the Taivalsaari "encoding" of inheritance as copy.  Methods in an OO language are never extensional: they always have an explicit self, which is late-bound to create the fixed-point.

> Thus the encoding of the class as an object will look as follows (where I've left out all code that actually does anything!)
> 
> const PairClass = object {
> 	method new:(x:Number, y: Number) -> Pair {
> 		object{
> 			var x: Number
> 			var y: Number
> 			method mkDiffPt(dx:Number,dy:Number) -> Pair { ...  helper ... }
> 			<private> method helper(a:A) -> B {...}
> 		}
> 	}

Yes, that's fine, although it's clearer, I think, if you give the objects names, and regard object as a name binder:

const PairClass = object ft {
	method new:(x:Number, y: Number) -> Pair {
		object pt{
			var x: Number
			var y: Number
			method mkDiffPt(dx:Number,dy:Number) -> Pair { ...  pt.helper ... }
			<private> method helper(a:A) -> B {...}
		}
	}

Think of object as a lambda or a mu binder: if you want to use the object, you get the mu binding, and if you want to inherit from it, you want the lambda binding.

> 
> 	// returns a closure to be used later in the body of a method mkDiffPt to be inserted in a subclass
> 	method mkDiffPt'(s:Point)-> { (dx:Number,dy:Number) -> Pair} {
> 		{(dx:Number,dy:Number) -> Pair {.....  helper ... }}
> 
> 	// same idea for helper
> 	method helper'(s:Point)-> { (a:A) -> B} {
> 		{...} }
> }

Now you have lost me.   Every OO language that I know of makes the taking of the fixed point of object generators implicit.   This may not be good theoretically, but it's established practice programmatically.   I don't believe that you are suggesting that the lambda and the mu version be declared separately!

> 
> You will notice that the type of PairClass is given by
> 
> type PairClassType {
> 	new: (x:Number, y: Number) -> Pair
> 	mkDiffPt': (s:Point)-> { (dx:Number,dy:Number) -> Pair}
> 	helper'(s:Point)-> { (a:A) -> B}
> }

No, I don't see that at all.

> I won't write out the entire encoding in detail for you here, but to construct a subclass of PairClass, you add in any new methods (explicitly parameterized by self -- as with the last two methods above) and you then replace the inherited "new" method by one that returns an object that is formed by assembling together each of the inherited "primed" methods (each applied to the "self" of this new object) and adding on and/or replacing methods given by the subclass.  

Are you talking about what a programmer would write, or what some type theoretician would use to convince him or herself that such a program is OK?   If it's what the programmer would write, then there is a serious problem: how do we ensure that the mkDiffPt method is in fact the fixed point of the mkDiffPt' method?

> 
> Here is an example of part of what would have to be done to create the update "new" method for a subclass adding a color field
> 
> const ColorPairClass = object {
> 	method new:(x:Number, y: Number,c:Color) -> ColorPair {
> 		object{
> 		   var x: Number   // the x,y would not actually appear as they would be replaced by getter
> 		   var y: Number   // and setter methods but let's ignore that here
> 		   var c: Color
> 			
> 		   method mkDiffPt(dx:Number,dy:Number) -> Pair {PairClass.mkDiffPt'(self).app(dx,dy) }

If I interpret this correctly, you are observing that mkDiffPt in a subclass must depend not on mkDiffPt in the superclass, but on the generator of MkDiffPr.  Yes, I agree that this is the case.  Which is why all OO languages represent methods as generators (= code with an unbound self pointer that will be bound later, to take the fixed point).  An inheriting object will need to take the fixed point (= bind self) AFTER the inheritance operation, not before.   

The way this is handled in all OO languages that I know of is to not to actually bind self until run-time.  There are many pragmatic advantages to doing this, not least amongst them being that all objects with the same code can in fact share one copy of the machine- or byte-code, even though conceptually their methods are different (since each has a different self).

> 		   <private> method helper(a:A) -> B {...}
> 		}
> 	}
> 
> 	method mkDiffPt': (s:Point)-> { (dx:Number,dy:Number) -> Pair} {...}
> 	method helper'(s:Point)-> { (a:A) -> B} {...}
> 	... // other methods corresponding to new methods of subclass
> 
> }
> 
> 
> I leave it as an exercise for the Grace programmer to encode this new subclass properly as an object using only legal Grace code.  (To make it simple, just add a method that returns the sum of the two numbers in the pair.)
> 
> I suspect that some of this painful re-encoding of the new method

I'm not sure what you are referring to when you say "painful re-encoding".  What the semantics functions do to give meaning to a class defined by inheritance?  Or what a programmer would write? 

> could be made to go away by defining an "extends" operator that does a lot of this gluing together, but even if that is the case, we're still looking at something quite complex, with the "shadow" copies of all methods (hidden and public) showing up in the object encoding.
> 
> So we have both good news and bad news:
> 
> Good news:  Classes can be encoded as objects in ways that respect the type system.  
> 
> (Caveat -- I didn't write out all the details of the encoding and I'm not even sure how to write a method that returns a closure -- would writing it with {{ ... }} be proper?)

Classes in Emerald were just objects with a "create" method, so I'm pretty confident that this is not a problem.  What Emerald lacked with inheritance.  Are you saying that there is some problem with the "shallow copy" semantics that makes it impossible to formalizze in a type-safe way?

> 
> Bad news:  No one in their right mind would want to write out classes by hand in this way.  (Can you imagine how helpful the error messages would be if your wrote it with this encoding and made a slight mistake?)

Again, I'm lost.  Are you really proposing that programmers have to write both the generators and their fixed points explicitly?  Surely the whole point (of OO Languages) is that given the generator (in programmer speak, the late-bound self), the fixed point (in programmer speak, the method with a bound self) can be constructed automatically?

> 
> To get a properly-type encoding of classes as objects in their object calculus takes a fair amount of work, and is significantly more complex than the one we've been talking about.
> 
> There may be an alternative type-safe way of doing this encoding (there are a few simple modifications that I see, but they don't simplify things much), but I suspect that overall they will be of the same complexity as this.
> 
> So, if we want to continue to treat classes as encodings we need to:
> 
> 1.  Work out all the gory details of any proposal to ensure that it works in a type-safe way.
> 
> 2.  Convince ourselves that the results is (nearly as) understandable as the more traditional way of handling classes.
> 
> Unless we can do this, I highly recommend that we scrub the idea of encoding classes as objects and just have classes as primitives.  (Of course it is trivial to write objects as encodings using classes -- just write the obvious class with parameterless constructor and invoke new.)

I don't understand your proposal.  If classes are not objects, then they need to be something else.  What?  Functions?  That's what they look like in Scala and in Fortress: they are things that can be parameterized and applied; doing so has an effect, and produces and object.   Would they be the ONLY way of making a function, or would there be a general-prurpose function creating construct?  How would this simplify the type system?

Fortress, in effect, has real functions and classes-as-functions.  The idea was that if you wanted a factory that did something different from the one that you got from the class construct, you could write it as a factory function, and the client would not see a change in the interface.  After I had been working on Fortress for a couple of weeks, I realized that I needed to do exactly that.  IIRC, I had a CatString class that took two parameters, both strings, and generated a new CatString object that represented the concatenation of the two strings.   Instead, I wanted a CatString function that might or might not generate a CatString object, depending on the representations of the arguments and their lengths. (e.g., if either string argument was the empty string, there was not need to concatenate anything; if both were short base strings, then the fastest thing to do was to copy the characters into a new, longer base string; etc.)

So I went around the group asking how to make the transformation from CatString as a class to CatString as a function.  It turned out that no one knew how to do it.   We figured it out, eventually; we had to rename the class CatString to something else, which unfortunately had the side effect of also renaming the TYPE CatString, which WAS a change visible to clients.   Perhaps there was a way of creating a type alias ...  Suffice it to say, it was very complicated.  Since I didn't actually have any users outside of the immediate group, we just changed the interface.

It was to avoid this sort of complexity that I originally suggested that classes just be objects, so that if the new method needed to do something other than creating a new object, it could just be written down as a method, and if new wasn't the write name, it could just be changed.  

So, to summarize, I'm aware that giving a semantics for inheritance means treating object descriptions as generators and objects as their fixed-points.  I'm not sure what the problem is that you are highlighting here.   Defining the extends operation?  Typing the extends operation?    I'm also not aware of any way of avoiding the need to deal in generators and fixed-points, regardless of the object model chosen for the language.

	Andrew