[Grace-core] Minigrace on LLVM

Mon May 30 17:15:00 PDT 2011

A high-level question: why did you choose LLVM as the target?  I have no objection to it in principle, but I am concerned that there is no garbage collector.  Is there one readily available that we can integrate into our system?  Garbage collection is obviously not important at this stage, but it will be once people start using the system.  My concern is that if we (you) spend the effort on targeting a relatively low-level VM, it might not be a good long-term investment.  I don't have any emotional commitment to the JVM (especially because of generics and arrays), but it does include a garbage collector.  I'm certainly open to other options as well, but eventually we do need garbage collection.

Our own implementation is still very high-level (read: very inefficient) to help us work out the language design issues.  I'm very happy that you are getting so far with this, but I don't want you to go to lots of extra effort if you'll have to switch later (unless you think it will be an easy port).

Thanks for all of your good work on this!

Kim

On May 28, 2011, at 3:12 AM, Michael Homer wrote:

> Hi,
> Following up on the Minigrace-on-Parrot post of a few weeks ago, I've
> been working on a native compiler, written in Grace and targeting LLVM
> bitcode. LLVM (<http://llvm.org/>) provides standard optimisations and
> cross-platform code generation, so the compiler should be able to work
> reasonably well on a fairly broad range of platforms.
> 
> Minigrace is getting progressively closer to Grace, and is an untyped
> implementation of most of the current specification with a few missing
> features. The compiler is capable of compiling itself, to native code
> if desired. Details follow, or skip to the bottom for download and
> build instructions.
> 
> The main limitations in the language at this point are a lack of
> tuples, types, return statements, method annotations, subclassing, and
> non-decimal numeric literals, and much of the library is missing (of
> course). Classes are supported as syntactic sugar for object literals.
> All operators have equal precedence and are right-associative, and
> all numbers are Float64s.
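To make the consequence of "equal precedence and right-associative" concrete, here is a minimal Python sketch (not Minigrace code, and not how the compiler does it). It assumes a flat, already-tokenised expression and plain float arithmetic, whereas Minigrace dispatches operators as methods. The name eval_right_assoc is made up for illustration.

```python
import operator

# Every operator shares one precedence level; this table only maps symbols to functions.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_right_assoc(tokens):
    """Evaluate a flat list like [2, "*", 3, "+", 4], grouping to the right.

    With equal precedence and right associativity, a op1 b op2 c
    is read as a op1 (b op2 c).
    """
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected operand (operator operand)*")
    # All numbers are treated as floats, mirroring the Float64-only numbers.
    result = float(tokens[-1])
    # Walk from the right: each step computes  left op (everything to its right).
    for i in range(len(tokens) - 2, 0, -2):
        left = float(tokens[i - 1])
        result = OPS[tokens[i]](left, result)
    return result
```

Under this reading, 2 * 3 + 4 groups as 2 * (3 + 4) and 10 - 4 - 3 groups as 10 - (4 - 3), so results differ from conventional arithmetic.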
> 
> It does support objects, methods, var and const fields, mixfix
> methods, string escapes, iterators, (statically-linked) modules,
> operator overloading, Unicode identifiers, and blocks. It supports a
> native "Array" type, which is really a vector and has a literal syntax
> [...], Unicode strings, and "Octets", a binary data type. Indexes
> currently start from zero, and can be used both with .at and with
> postcircumfix [index]. The postcircumfix version for String has a
> legacy behaviour and is not the same as .at currently.
> 
> The control structures if-then-else, for-do, and while-do are
> builtins. If and for, at least, can be implemented as multi-part
> methods using blocks:
>  method myif(cond) mythen (block) {
>      cond.ifTrue(block)
>  }
>  myif (true) mythen {
>      print("Success.")
>  }
>  method myfor(coll) do (block) {
>      var it := coll.iter()
>      while {it.havemore()} do {
>          block.apply(it.next())
>      }
>  }
>  myfor ([1,2,3]) do { i->
>      print(i)
>  }
> The builtin if-then-else does not support an elseif branch yet, and
> for-do does not support multiple parameters to the block (because it's
> special-cased; generic blocks do).
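For readers less familiar with the iterator protocol that myfor relies on, the same shape can be sketched in Python. This is only an analogy under stated assumptions: GraceList and ListIter are hypothetical stand-ins, and only the method names iter, havemore and next are taken from the example above.

```python
class ListIter:
    """Hypothetical iterator exposing the havemore/next protocol used by myfor."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def havemore(self):
        # True while there are elements that have not been handed out yet.
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


class GraceList:
    """Hypothetical collection whose iter() returns a fresh iterator."""

    def __init__(self, items):
        self._items = list(items)

    def iter(self):
        return ListIter(self._items)


def myfor_do(coll, block):
    """Mirror of 'myfor(coll) do (block)': apply block to each element in order."""
    it = coll.iter()
    while it.havemore():
        block(it.next())
```

Calling myfor_do(GraceList([1, 2, 3]), print) would print each element in turn, just as the Grace version passes each element to the block.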
> 
> var declarations inside an object create accessor methods, while const
> creates just a reader. These methods are the only way to access their
> values, and work with the o.foo := syntax. Local variables are real
> and must be declared with var before use, and local consts are
> currently identical to vars. A block or method defined within their
> scope can act as a closure over that variable if it uses the outer
> variable and does not define another of the same name. Nested closures
> are not reliable yet, however.
> 
> Operators are defined in an object or class declaration in the same
> manner as other methods:
>  class foo { alist->
>    var list := alist
>    method ++(other) {
>        foo.new(self.list ++ other.list)
>    }
>  }
> class declarations are currently syntactic sugar for the corresponding
> const X := object { method new() { object {...} } } declaration. A
> method can be used as an infix operator if its name consists entirely
> of the operator characters described below, and apart from the syntax
> it is dispatched in the normal way.
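The desugaring described above (a class is an object with a new method that returns a fresh object, and ++ is just a method with a symbolic name) can be sketched in Python. This is an illustrative analogy only. GraceObject, request and make_foo_class are hypothetical names, and only the reader for the list field is modelled.

```python
class GraceObject:
    """A bag of named methods, standing in for a Grace object literal."""

    def __init__(self, methods):
        self._methods = dict(methods)

    def request(self, name, *args):
        # Operators and ordinary methods go through the same lookup.
        return self._methods[name](*args)


def make_foo_class():
    """Sketch of 'class foo { alist -> ... }' as an object with a 'new' factory."""

    def new(alist):
        lst = list(alist)

        def concat(other):
            # Corresponds to: foo.new(self.list ++ other.list)
            return foo.request("new", lst + other.request("list"))

        # The instance is itself just an object literal with two methods.
        return GraceObject({"list": lambda: lst, "++": concat})

    foo = GraceObject({"new": new})
    return foo
```

For example, with foo = make_foo_class(), requesting "++" on foo.request("new", [1, 2]) with foo.request("new", [3]) as the argument yields an object whose list is the concatenation of both.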
> 
> All Minigrace programs must be in UTF-8, and the compiler will reject
> them if not. Identifiers consist of characters from Unicode categories
> "Letter" and "Number", and underscore. Operators can be defined using
> the characters -, &, |, :, %, *, /, and all characters in the Unicode
> category "Symbols, Mathematical". No control characters except
> linefeed and carriage return are permitted, and no characters from
> Unicode category "Separator" other than ASCII space and U+2028 LINE
> SEPARATOR can appear anywhere in the program. Inside a string literal,
> \uXXXX represents the BMP character whose codepoint in hex is XXXX.
> There isn't an escape for characters beyond U+FFFF yet.
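The character rules above can be checked mechanically against the Unicode general categories. The following Python sketch uses the standard unicodedata module. It assumes "Letter" and "Number" mean the L* and N* categories, "Symbols, Mathematical" means Sm, "control characters" means Cc, and "Separator" means Z*. The function names are made up, and this is not the compiler's own lexer.

```python
import re
import unicodedata

# ASCII operator characters listed explicitly in the description above.
OPERATOR_ASCII = set("-&|:%*/")


def is_identifier_char(ch):
    """Letters (L*), numbers (N*) and underscore may appear in identifiers."""
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def is_operator_char(ch):
    """Listed ASCII symbols plus anything in category Sm (which includes '+')."""
    return ch in OPERATOR_ASCII or unicodedata.category(ch) == "Sm"


def is_permitted_char(ch):
    """Reject control characters other than LF/CR and separators other than
    ASCII space and U+2028 LINE SEPARATOR."""
    cat = unicodedata.category(ch)
    if cat == "Cc":
        return ch in "\n\r"
    if cat[0] == "Z":
        return ch in (" ", "\u2028")
    return True


def decode_u_escapes(text):
    """Replace each \\uXXXX (four hex digits) with the BMP character it names.

    Simplification: other escapes, including an escaped backslash, are ignored.
    """
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
```

Note that under these assumptions a tab is rejected (it is a control character) and a no-break space is rejected (it is a separator), while U+2028 is accepted.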
> 
> The compiler can either read its program on standard input and write
> to standard output or take a filename on the command line. With a
> filename and no options the compiler will generate a .ll textual
> bitcode file of the corresponding name. If --make is given the
> compiler will process any import statements it finds, ensure that the
> modules are compiled, and link them together into modulename.bc, which
> can be run with `lli` or further processed somehow. --run acts as
> --make but then runs the file with lli itself. --native will use
> llvm-ld to generate native code in modulename. Native code is slightly
> faster at runtime for big programs, but can take a while to produce,
> especially if the Unicode module is linked in. Bitcode is mostly
> unproblematic, but I suggest a native build at least of the compiler
> itself. The compiler is also fairly unforgiving of its input
> sometimes: one manifestation of a syntax error is non-termination
> with continuously increasing memory allocation, so keep an eye out.
> --verbose will report on standard error how far the compiler has
> got.
> 
> The compiler requires LLVM to interpret or compile the bitcode it
> generates. The bitcode generated is architecture-independent, but
> currently needs to link against a C library handling some of the
> standard library and memory allocation and a C module containing the
> Unicode Character Database, both of which are built with LLVM's C
> compiler `clang` for each architecture. I have native binaries for
> Linux-i686, Linux-x86_64, and NetBSD-i386, and bitcode for other
> architectures. It's likely that the clang-compiled bitcode files
> (gracelib.o and unicode.gco) work on other architectures with the same
> bitwidth, but I'm not sure if that's universal.
> 
> Download from <http://homepages.ecs.vuw.ac.nz/~mwh/minigrace/dist/20110528/>
> according to your architecture.
> 
> To compile the system from source:
>  clang -emit-llvm -c gracelib.c
>  clang -emit-llvm -c -o unicode.gco unicode.c
> To link the components together (start here if the native executable
> doesn't work for you, or if you:
>  llvm-link -o minigrace.bc gracelib.o minigrace.ll unicode.gco
> At this point, `lli minigrace.bc` will work, but to produce a native executable:
>  llvm-ld -o minigrace -native minigrace.bc
> Once built, the compiler can recompile itself:
>  ./minigrace --make --native compiler.gc
> 
> There is also a git repository and makefile capable of building the
> system up from scratch starting with a version of the compiler that
> runs on both itself and the Parrot implementation, if anybody wants to
> do that. It all ought to run anywhere, although the --make mode
> currently makes some POSIXy assumptions and may need GNU `[` on top.
> 
> The best and worst existing example of the language is the compiler
> itself; it uses most of the features, although not all of the most
> recent ones, but much of the code is still working around limitations
> in the Parrot version or earlier versions of itself, and so may not
> make a lot of sense in places given the alternatives available. I
> suggest just trying to write things and seeing how they go, and I'm
> interested in reports of things that don't work but should. Hopefully
> it can provide a platform for experimentation and discussion, anyway.
> It'd be nice to build up a conformance test suite as well. It should
> be possible for the subset of the language that matches Grace exactly.
> -Michael