[Grace-core] Minigrace on LLVM
Michael Homer
mwh at ecs.vuw.ac.nz
Sat May 28 03:12:18 PDT 2011
Hi,
Following up on the Minigrace-on-Parrot post of a few weeks ago, I've
been working on a native compiler, written in Grace and targeting LLVM
bitcode. LLVM (<http://llvm.org/>) provides standard optimisations and
cross-platform code generation, so the compiler should be able to work
reasonably well on a fairly broad range of platforms.
Minigrace is getting progressively closer to Grace, and is an untyped
implementation of most of the current specification with a few missing
features. The compiler is capable of compiling itself, to native code
if desired. Details follow, or skip to the bottom for download and
build instructions.
The main limitations in the language at this point are a lack of
tuples, types, return statements, method annotations, subclassing, and
non-decimal numeric literals, and much of the library is missing (of
course). Classes are supported as syntactic sugar for object literals.
All operators have equal precedence and are right-associative, and all
numbers are Float64s.
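To make the precedence rule concrete, here is a small untested sketch.
It assumes that // starts a comment, that + and * are defined on
numbers, and that parentheses can be used for grouping; none of these
is spelled out above. With equal precedence and right-associativity,
a op b op c groups as a op (b op c), which differs from conventional
arithmetic whenever a multiplication comes first:

```grace
// All operators share one precedence level and group to the right.
print(1 + 2 * 3)      // groups as 1 + (2 * 3)
print(2 * 3 + 1)      // groups as 2 * (3 + 1), not (2 * 3) + 1
print((2 * 3) + 1)    // explicit parentheses give the conventional grouping
```

Parenthesising any expression that mixes operators avoids depending
on the current rule.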
It does support objects, methods, var and const fields, mixfix
methods, string escapes, iterators, (statically-linked) modules,
operator overloading, Unicode identifiers, and blocks. It supports a
native "Array" type, which is really a vector and has a literal syntax
[...], Unicode strings, and "Octets", a binary data type. Indexes
currently start from zero, and can be used both with .at and with
postcircumfix [index]. The postcircumfix version for String has a
legacy behaviour and is not the same as .at currently.
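As an illustration of the two indexing forms, the following untested
sketch uses only the Array literal, .at, and postcircumfix syntax
described above (the // comments are an assumption). It sticks to an
Array because the two forms currently differ for String:

```grace
var a := [10, 20, 30]
print(a.at(0))    // first element: indexes start from zero
print(a[0])       // postcircumfix form of the same lookup
print(a[2])       // third and last element
```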
The control structures if-then-else, for-do, and while-do are
builtins. If and for, at least, can be implemented as multi-part
methods using blocks:
    method myif(cond) mythen (block) {
        cond.ifTrue(block)
    }
    myif (true) mythen {
        print("Success.")
    }

    method myfor(coll) do (block) {
        var it := coll.iter()
        while {it.havemore()} do {
            block.apply(it.next())
        }
    }
    myfor ([1,2,3]) do { i->
        print(i)
    }
The builtin if-then-else does not support an elseif branch yet, and
for-do does not support multiple parameters to the block (because it's
special-cased; generic blocks do).
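Until elseif is available, a chain of conditions can be written by
nesting a second if inside the else block. This is an untested sketch:
it assumes the builtin is written if (cond) then {...} else {...},
that < and == are defined on numbers, and that // starts a comment.

```grace
var n := 0
if (n < 0) then {
    print("negative")
} else {
    // Stands in for the missing elseif branch.
    if (n == 0) then {
        print("zero")
    } else {
        print("positive")
    }
}
```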
var declarations inside an object create reader and writer accessor
methods, while const creates just a reader. These methods are the only
way to access the fields' values, and they work with the o.foo :=
syntax. Local variables are real variables rather than accessor
methods, and must be declared with var before use; local consts are
currently identical to vars. A block or method defined within a local
variable's scope can act as a closure over it, provided it uses the
outer variable and does not define another of the same name. Nested
closures are not reliable yet, however.
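The following untested sketch shows both halves of this: field access
through the generated accessors, and a single (non-nested) block
closing over a local variable. The names counter, count, step, total,
and add are made up for the example, and // comments are assumed.

```grace
const counter := object {
    var count := 0     // gets a reader and a writer
    const step := 2    // gets a reader only
}
// Reads go through the readers; the assignment goes through the writer.
counter.count := counter.count + counter.step
print(counter.count)

var total := 0
// The block uses the outer total and declares no variable of that name,
// so it acts as a closure over it.
const add := { n -> total := total + n }
add.apply(5)
add.apply(7)
print(total)
```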
Operators are defined in an object or class declaration in the same
manner as other methods:
    class foo { alist->
        var list := alist
        method ++(other) {
            foo.new(self.list ++ other.list)
        }
    }
Class declarations are currently syntactic sugar for the corresponding
const X := object { method new() { object {...} } } declaration. A
method can be used as an infix operator if its name consists entirely
of the operator characters described below; apart from the syntax, it
is dispatched in the normal way.
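Written out by hand, the foo class above would look roughly like the
following. This is an untested sketch of the desugaring, not the
compiler's actual output: it assumes the class parameter alist becomes
the parameter of new, and that // starts a comment. The names a, b,
and c are only for the usage lines.

```grace
// Hand-written equivalent of: class foo { alist-> ... }
const foo := object {
    method new(alist) {
        object {
            var list := alist
            method ++(other) {
                foo.new(self.list ++ other.list)
            }
        }
    }
}

const a := foo.new([1, 2])
const b := foo.new([3])
const c := a ++ b    // ordinary dispatch of the method ++ on a
```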
All Minigrace programs must be in UTF-8, and the compiler will reject
them if not. Identifiers consist of characters from Unicode categories
"Letter" and "Number", and underscore. Operators can be defined using
the characters -, &, |, :, %, *, /, and all characters in the Unicode
category "Symbols, Mathematical". No control characters except
linefeed and carriage return are permitted, and no characters from
Unicode category "Separator" other than ASCII space and U+2028 LINE
SEPARATOR can appear anywhere in the program. Inside a string literal,
\uXXXX represents the BMP character whose codepoint in hex is XXXX.
There isn't an escape for characters beyond U+FFFF yet.
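A short untested sketch of these lexical rules, assuming the file is
saved as UTF-8 and that // starts a comment. The identifier café, the
object v, and the choice of U+2295 CIRCLED PLUS as an operator name
are all made up for illustration; U+2295 is in the mathematical
symbols category, so it should qualify as an operator character.

```grace
var café := "\u2295"    // non-ASCII letter in an identifier; the escape is U+2295
print(café)

const v := object {
    var x := 3
    // Operator named with a single mathematical symbol.
    method ⊕(other) {
        self.x + other.x
    }
}
print(v ⊕ v)
```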
The compiler can either read its program on standard input and write
to standard output or take a filename on the command line. With a
filename and no options the compiler will generate a .ll textual
bitcode file of the corresponding name. If --make is given the
compiler will process any import statements it finds, ensure that the
modules are compiled, and link them together into modulename.bc, which
can be run with `lli` or processed further. --run acts as
--make but then runs the file with lli itself. --native will use
llvm-ld to generate a native executable named modulename. Native code is slightly
faster at runtime for big programs, but can take a while to produce,
especially if the Unicode module is linked in. Bitcode is mostly
unproblematic, but I suggest a native build at least of the compiler
itself. The compiler is also fairly unforgiving of its input
sometimes: one manifestation of a syntax error is non-termination and
continuously-increasing memory allocation, so keep an eye out.
--verbose will report the compiler's progress on standard error.
The compiler requires LLVM to interpret or compile the bitcode it
generates. The bitcode generated is architecture-independent, but
currently needs to link against a C library handling some of the
standard library and memory allocation and a C module containing the
Unicode Character Database, both of which are built with LLVM's C
compiler `clang` for each architecture. I have native binaries for
Linux-i686, Linux-x86_64, and NetBSD-i386, and bitcode for other
architectures. It's likely that the clang-compiled bitcode files
(gracelib.o and unicode.gco) work on other architectures with the same
bitwidth, but I'm not sure if that's universal.
Download from <http://homepages.ecs.vuw.ac.nz/~mwh/minigrace/dist/20110528/>
according to your architecture.
To compile the C support code from source:
clang -emit-llvm -c gracelib.c
clang -emit-llvm -c -o unicode.gco unicode.c
To link the components together (start here if the native executable
doesn't work for you, or if you [...]):
llvm-link -o minigrace.bc gracelib.o minigrace.ll unicode.gco
At this point, `lli minigrace.bc` will work, but to produce a native executable:
llvm-ld -o minigrace -native minigrace.bc
Once built, the compiler can recompile itself:
./minigrace --make --native compiler.gc
There is also a git repository and makefile capable of building the
system up from scratch starting with a version of the compiler that
runs on both itself and the Parrot implementation, if anybody wants to
do that. It all ought to run anywhere, although the --make mode
currently makes some POSIXy assumptions and may need GNU `[` on top.
The best and worst existing example of the language is the compiler
itself; it uses most of the features, although not all of the most
recent ones, but much of the code is still working around limitations
in the Parrot version or earlier versions of itself, and so may not
make a lot of sense in places given the alternatives available. I
suggest just trying to write things and seeing how they go, and I'm
interested in reports of things that don't work but should. Hopefully
it can provide a platform for experimentation and discussion, anyway.
It'd be nice to build up a conformance test suite as well. It should
be possible for the subset of the language that matches Grace exactly.
-Michael