[Grace-core] inheritance for stateless traits and beyond

Thu Jun 18 19:49:26 PDT 2015

Here we go again!  This is a slightly edited version of the earlier
proposal from March, with an added section on "Design Tradeoffs"
below.  The original proposal and discussion are here:

https://mailhost.cecs.pdx.edu/pipermail/grace-core/2015-March/001858.html

    ********     ********     ********     ********

The primary aim of this proposal is to generalise Grace's current
inheritance design to support multiple inheritance from multiple
"stateless traits" as well as a single "stateful class".  This
design works in that case, and has defined semantics in other cases.

This proposal builds on the existing "fresh objects" design.  Any
program that is valid in the current design should have the same
behaviour under this proposal.  This proposal focuses on the dynamic
semantics of inheritance. Additional restrictions on behaviour (what
can be inherited from and how, how inheritance should be restricted to
be static, to avoid or permit family polymorphism, whether classes
should be dotted, should traits or records be separate from classes or
objects, etc) may be enforced by a dialect or fought over separately.

* Background: Current design

In the current "fresh objects" design, an inherits statement includes a
method request expression, and the method that responds must
tail-return an object constructor. (A method tail-returns an object
constructor iff the last (or only) element of the method body returns
an object constructor.)  An object constructor that is invoked
directly (i.e. not via an inherits clause) binds "self" to a new
object identity, while an object constructor invoked indirectly via an
inherits statement executes with the same binding of "self" as the
object constructor that contained the inherits statement.  All code in
the body of any object constructor executes in an environment
containing all the declarations in that constructor. Code after an
inherits statement executes in the same environment extended by any
(potentially overriding) declarations introduced by that inherits
statement.  The inherits statement must be the first thing inside the
object constructor body.
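
A rough model of these rules, in Python rather than Grace, may help
fix ideas.  Everything named here (GraceObject, point, colour_point,
the explicit "self" parameter) is invented for illustration; passing
"self" explicitly stands in for the binding that an inherits statement
supplies, and the sketch does not model declarations being visible
from the start of the body.

```python
class GraceObject:
    """Hypothetical stand-in for a Grace object: a table of named slots."""

    def __init__(self):
        self.slots = {}


def point(x, y, self=None):
    # Invoked directly (self is None): bind "self" to a new object identity.
    # Invoked via an inherits statement: reuse the inheriting "self".
    if self is None:
        self = GraceObject()
    self.slots["x"] = lambda: x
    self.slots["y"] = lambda: y
    self.slots["describe"] = lambda: "point"
    return self


def colour_point(x, y, colour):
    self = GraceObject()
    point(x, y, self=self)  # models "inherits point(x, y)", first in the body
    # Code after the inherits statement sees the inherited declarations
    # and may override them.
    inherited_describe = self.slots["describe"]
    self.slots["colour"] = lambda: colour
    self.slots["describe"] = lambda: "colour " + inherited_describe()
    return self
```

Calling point(1, 2) directly yields a new object, while the call made
inside colour_point adds its declarations to the object already under
construction.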

* Multiple inheritance with named parents

Multiple "inherits" statements can appear in a single object
constructor, executing as they appear in the code. A single inherits
statement at the top has the behaviour of the current system
(supporting safe upcalls but not downcalls).

An inherits statement can include an "as" clause, just like a module
import. The given name can only be used as the receiver for "directed
super-requests" up a particular inheritance tree, where "self" remains
bound to the object produced by the constructor.  A single inherits
statement without an "as" clause is implicitly named "super". This
allows all existing code to function exactly as now.
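
One way to picture named parents, again as a Python model rather than
Grace: the helper "inherits" below is hypothetical, and returns the
methods a parent introduced so that they can serve as the receiver of
a directed super-request.  The names walker, swimmer and duck are made
up for the example.

```python
class GraceObject:
    """Hypothetical stand-in for a Grace object: a table of methods."""

    def __init__(self):
        self.methods = {}

    def request(self, name, *args):
        # Every request is dispatched with "self" bound to this object.
        return self.methods[name](self, *args)


def inherits(self, constructor):
    """Model of 'inherits constructor as <name>'.

    Runs the parent constructor against the existing self and returns
    the methods it introduced, for use in directed super-requests.
    """
    before = dict(self.methods)
    constructor(self)
    return {name: fn for name, fn in self.methods.items()
            if before.get(name) is not fn}


def walker(self):
    self.methods["name"] = lambda self: "walker"
    self.methods["move"] = lambda self: self.request("name") + " walks"


def swimmer(self):
    self.methods["move"] = lambda self: self.request("name") + " swims"


def duck():
    self = GraceObject()
    w = inherits(self, walker)   # models "inherits walker as w"
    s = inherits(self, swimmer)  # models "inherits swimmer as s"
    self.methods["name"] = lambda self: "duck"
    # Directed super-requests: w and s pick out one parent's method,
    # while "self" stays bound to the duck being constructed.
    self.methods["moveBoth"] = (
        lambda self: w["move"](self) + ", " + s["move"](self))
    return self
```

Here duck().request("moveBoth") goes up both inheritance trees, and
each parent's "move" still sees the duck's own "name".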

* Ambiguity

If an object inherits multiple methods by the same name, the one
provided by the last inherits statement wins. In all cases, the
eventual method defined in the object (whether inherited or local)
SHOULD be fully compatible with the signatures of all inherited
methods by the same name, with a local override being provided to
satisfy this criterion if applicable.

In the case of single-inheritance chains, no ambiguity arises. A local
definition in the initiating object constructor will always override
an inherited version.

In the case of trait-style inheritance of disjoint sets of methods, no
ambiguity arises. Multiple independent parents may be composed without
issue.

Verifying this compatibility statically is up to the dialect, with
potentially additional checks occurring at runtime. This is required
in order to allow the "when the programmer does it, that means that it
is not illegal" style.
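
As a sketch of what a runtime check might look like, the Python below
merges method tables in inherits order and compares parameter counts.
Treating "compatible signature" as "same arity" is a deliberate
oversimplification, and compose, printable, loggable and wide are
names invented for the example; whether a violation should raise at
all is left to the dialect.

```python
import inspect


def arity(fn):
    """Number of declared parameters, used as a crude signature."""
    return len(inspect.signature(fn).parameters)


def compose(*parents, local=None):
    """Merge method tables: later parents win, local definitions win last.

    Raises TypeError if the winning method's arity differs from that of
    any inherited method of the same name.
    """
    result = {}
    for table in list(parents) + [local or {}]:
        result.update(table)
    for table in parents:
        for name, fn in table.items():
            if arity(fn) != arity(result[name]):
                raise TypeError("incompatible signatures for " + name)
    return result


printable = {"show": lambda self: "printable"}
loggable = {"show": lambda self: "loggable",
            "log": lambda self, msg: msg}
wide = {"show": lambda self, width: "wide"}
```

With these tables, compose(printable, loggable) resolves "show" to the
loggable version, and compose(printable, wide) is rejected.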

* Trait Inheritance Without Diamonds

This design does not admit diamond inheritance (C++ virtual
inheritance) as every fresh object is embedded. If you only inherit
from one stateful class, plus any number of stateless traits, this
isn't a problem as diamonds are only a problem with state, and there
won't be any stateful diamonds.  This can (probably) be checked
statically (a la Donna Malayeri -- do you think she can get us some M$
funding?).

* Positional inheritance and initialisation

Inherits statements can appear at any position in the object
constructor body **and the inheritance and initialisation occur
visibly at that position**. All side effects of the inherits statement
occur when execution flow reaches that statement, and by the start of
the following statement all initialisation in that inheritance chain
has completed and all its methods are defined.

Methods and fields defined directly in an object constructor are
available at the start of the execution of its body, although it may
not be possible to execute them successfully until other
initialisation has completed. Methods and fields defined in a
superobject are available after the "inherits" statement introducing
that superobject, and their initialisation will be complete.

The effect is that, with "inherits" at the top of the object body,
upcalls are safe; with it at the bottom, downcalls to you are safe;
and with care, you can order things so both work. Common programming
styles are likely to favour either upcalls or downcalls, so a dialect
may enforce that inherits statements only appear at either the top or
bottom for its code.
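
The following Python model shows the two placements side by side.  It
is only a sketch: GraceObject, base, child and the UNINITIALISED
marker are invented here, and calling base(self) at a given point in
the body stands in for an inherits statement at that position.

```python
UNINITIALISED = "<uninitialised>"


class GraceObject:
    def __init__(self):
        self.fields = {}
        self.methods = {}

    def request(self, name, *args):
        return self.methods[name](self, *args)


def base(self):
    # Parent constructor.  By the time it returns, its methods are
    # defined and its initialisation (including a downcall) is complete.
    self.methods["greet"] = lambda self: "hello from base"
    self.fields["seen"] = self.request("label")  # downcall to the child


def child(inherits_at_top):
    self = GraceObject()
    # Methods declared here are available from the start of the body,
    # but "label" only gives a useful answer once "name" is initialised.
    self.methods["label"] = (
        lambda self: self.fields.get("name", UNINITIALISED))
    if inherits_at_top:
        base(self)                                     # inherits at the top
        self.fields["upcall"] = self.request("greet")  # upcall: safe
        self.fields["name"] = "child"
    else:
        # Before the inherits statement, "greet" is not yet defined.
        self.fields["greet_known_early"] = "greet" in self.methods
        self.fields["name"] = "child"
        base(self)                                     # inherits at the bottom
    return self
```

With the inherits at the top, the upcall works but base's downcall
sees an uninitialised "name"; at the bottom, the downcall sees
"child", but "greet" is unavailable until the inherits has run.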

Note that this resolves issues with what Kim wants to do in the
current system, but may require some motivation to be clear on what
the point is. I have put the sequence of motivating examples in a
sidebar so they don't stretch this out - http://ecs.vuw.ac.nz/~mwh/positional-motivation.txt

* Inheriting from Modules

Inheriting from modules (or other existing objects) is by shallow
cloning.  Modules can export a public "clone" method which counts as a
method tail-returning an object constructor and so can be inherited from.
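
A small Python analogue of shallow cloning, for what it is worth:
ModuleObject and extended are hypothetical names, and copy.copy plays
the part of the exported "clone" method.  Using the clone itself as
the object under construction is an assumption of this sketch.

```python
import copy


class ModuleObject:
    """Hypothetical stand-in for a Grace module: an existing object."""

    def __init__(self):
        self.registry = []  # mutable state, shared by shallow clones
        self.version = 1

    def describe(self):
        return "module v" + str(self.version)

    def clone(self):
        # Public method the module chooses to export.  The result is a
        # fresh object, so it can stand in for an object constructor.
        return copy.copy(self)


def extended(module):
    self = module.clone()  # models "inherits module.clone"
    self.version = 2       # rebinding a slot affects only the clone
    return self
```

Because the clone is shallow, the new object gets its own slots but
still shares any mutable objects those slots refer to (here, the
registry list).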

    ********     ********     ********     ********

Design Tradeoffs: 

Any design for inheritance must trade off (at least) the following
forces. Which is why it's hard, of course...

* Upcalls 

If you want to make upcalls safely during construction then the
superclass should have been initialised when you make the call. This
means the superclass initialisation must be run before the subclass.

* Downcalls

Conversely, if you want to make downcalls safely during construction
then the subclass must have been initialised. This means the subclass
initialisation must be run before the superclass.

* Upcalls vs Downcalls

These two forces conflict: if you have only one initialisation rule,
it can support one or other but not both.  If you want to allow both,
there must be some way of selecting between the two behaviours.  Scala
for example has a way of getting "early" initialisers - 

(see http://stackoverflow.com/questions/4712468/in-scala-what-is-an-early-initializer/4716273#4716273 )

Allowing "inherits" statements anywhere within a class body gives this
flexibility. Banning "self" in constructor bodies removes this
problem, but means you can't do upcalls or downcalls.  A dialect could
conceivably ban "self" in (non-top-level) object constructors to avoid
most of these problems.  
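
Python is another language where the superclass initialiser can be
called at any point in the subclass constructor, so it gives a quick
way to see both behaviours.  The class names below are made up for
the example.

```python
class Base:
    def __init__(self):
        self.base_ready = True
        self.seen = self.describe()  # downcall into the subclass

    def describe(self):
        return "base"


class UpcallStyle(Base):
    def __init__(self):
        super().__init__()  # superclass first: upcalls are safe
        self.ready_at_entry = self.base_ready  # uses inherited state
        self.name = "up"

    def describe(self):
        # Runs during Base.__init__, before self.name has been assigned.
        return getattr(self, "name", "<uninitialised>")


class DowncallStyle(Base):
    def __init__(self):
        self.name = "down"  # subclass first: downcalls are safe
        super().__init__()

    def describe(self):
        return self.name
```

UpcallStyle can rely on the state set up by Base, but Base's downcall
sees an uninitialised name; DowncallStyle is the other way round.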

* Visible changes to an object's class (method resolution)

If inherits clauses take effect at their position in an object
constructor then an object will apparently change "class" during
construction. This is not so different from most "hardhat" style
proposals, which forbid programmers from seeing their potentially
uninitialised super- or sub-objects, whereas in our case the 
uninitialised super- or sub-objects wouldn’t be there yet...

* Procedural code & ordering

The bodies of object constructors are straight-line imperative code
--- and other straight-line imperative code (module bodies,
scripts, the repl) is seen as the same as object constructors.  If
initialisation occurs inside special initialiser blocks (Java's "{}"
initialisers, O'Caml's "initialiser {}" blocks) or in a different
scope (O'Caml's "defs" are initialised by expressions in the
surrounding lexical scope) then procedural code inside objects doesn't
have straightforward semantics.  If all straight-line code is still
seen as object constructors, then there cannot be procedural code at all.

* Imperative objects vs declarative classes

With declarative static classes, inheritance can be calculated
statically before any code executes.  In a system based on dynamic
objects, with only method requests, it is necessary to send those
requests at run time, as method requests can have arbitrary return
values and side effects.  Sending requests twice, or re-ordering them,
also leads to the "procedural code & ordering" problem.

* Resolution 

If everything is resolved by method requests, then to know statically
what is being inherited, it must be possible to find the code that
will eventually respond to the request.  If the request chain starts
at something that may be overridden, resolving this "statically" is
more difficult, and cannot mean "modularly at the time the class is
compiled".  Allowing non-overridable declarations ("let", "final")
can ease this analysis.

* "inherits" statement receives an object

The inherits statement receives (something close to) a normal object,
with a normal protocol. Requests sent to objects in the "inherits"
statement should be normal messages, not reflexive messages (like
"deleteSlots" or "override").

* Direct inheritance vs trait algebra

A full trait algebra (with subtraction, omission, etc) is more complex
than a system that just supports incorporation or inheritance of whole
objects.  A trait algebra seems important for unanticipated reuse, but
straight inheritance seems enough for mostly anticipated reuse
(especially if we can conceptually replace the whole library anyway).

* Static vs Dynamic checks

Dialects can check pretty much anything that can be resolved
statically. We can also mandate dynamic checks to enforce class
compatibility --- e.g. that overriding methods are declared "is
overrides" and/or that overriding methods retain signature
compatibility, and/or that traits are stateless, and/or that
the inherits clauses resolve to methods that tail-return object
constructors or are otherwise "fresh" objects.
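
To give a flavour of one such dynamic check, here is a Python sketch
that rejects a trait which introduces state.  All the names
(GraceObject, InheritanceError, inherits_trait, comparable, counter)
are hypothetical, and a real check would presumably refuse before
mutating the object rather than after.

```python
class GraceObject:
    def __init__(self):
        self.fields = {}
        self.methods = {}


class InheritanceError(Exception):
    pass


def inherits_trait(self, trait):
    """Run a trait constructor against self, rejecting any state it adds."""
    fields_before = set(self.fields)
    trait(self)
    added = set(self.fields) - fields_before
    if added:
        raise InheritanceError(
            "trait declared fields: " + ", ".join(sorted(added)))


def comparable(self):
    # Stateless: contributes methods only.
    self.methods["max"] = lambda self, a, b: a if a >= b else b


def counter(self):
    # Not a stateless trait: it introduces a field.
    self.fields["count"] = 0
    self.methods["count"] = lambda self: self.fields["count"]
```

Inheriting comparable passes the check; inheriting counter through
inherits_trait raises InheritanceError naming the offending field.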

* One versus many

Do we want one "class-like thing" that lets us declare both stateful
classes and stateless traits, or more than one?  
Do we want one "inherits" clause that lets us inherit from either a
class or a stateless trait, or more than one?
This design assumes just one thing is better than many similar things.

* Implicit vs explicit

Should we explicitly declare and/or mark in types: "methods that
tail-return object constructors"?  "stateless traits"?  "stateful
classes"? ... etc.

* Other kinds of inheritance

Libraries could provide additional methods that "count as if they were
tail-returning an object constructor" with various semantics including
cloning, but also delegation, forwarding, proxies, &c.  Objects would
have to choose to export these methods so that they do not need
additional privileged access to self reflectively from outside.

    ********     ********     ********     ********

* Other unresolved issues:

- Evaluation order of types that depend on types defined in a
 superclass.

- "Definitively static" rules out all inheritance chains not rooted in
 an object defined in the local scope of the method or an imported
 module.

- Static binding of names with type parameters.

All of the above may be related to, but are not the same as, Tim's
proposed "let".

- Introduction of method names in a subclass/use of unqualified names
 in the superclass.

- Do repeated field declarations give rise to new storage spaces? Some
 relevant examples are in another sidebar at
 <http://ecs.vuw.ac.nz/~mwh/catfish.txt>.

- Non-local returns and exceptions during object construction can lead
 to leaked partially-initialised objects.

- The Raw and The Cooked. Should we track an object's initialisation
 state explicitly / make it available in the debugger / reflexively?
 Should this also be tracked in types (presumably types normally mean
 a cooked version unless annotated raw?)   Should sending a message
 to a raw object raise an exception (unless done by magic / from
 within the dynamic extent of that object's constructor)?
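
On the item above about non-local returns and exceptions: the leak is
easy to reproduce in an existing language.  The Python below is just
an illustration, with a made-up Widget class and registry; the object
escapes through the registry before its constructor fails.

```python
registry = []


class Widget:
    def __init__(self, size):
        registry.append(self)  # "self" escapes before construction ends
        if size < 0:
            raise ValueError("negative size")  # construction is abandoned
        self.size = size  # never reached for a bad size


try:
    Widget(-1)
except ValueError:
    pass

leaked = registry[0]  # a raw, partially-initialised Widget
```

After the exception, "leaked" is still reachable but has no "size"
slot, which is exactly the kind of raw object the last item asks
about.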