Compiler Tower (Guile Reference Manual)

9.4.1 Compiler Tower

Guile’s compiler is quite simple; or, to put it more accurately, its compilers are. Guile defines a tower of languages, starting at Scheme and progressively simplifying down to languages that resemble the VM instruction set (see Instruction Set).

Each language knows how to compile to the next, so each step is simple and understandable. Furthermore, this set of languages is not hardcoded into Guile, so it is possible for the user to add new high-level languages, new passes, or even different compilation targets.

Languages are registered in the (system base language) module:

(use-modules (system base language))

They are registered with the define-language form.

Scheme Syntax: define-language [#:name] [#:title] [#:reader] [#:printer] [#:parser=#f] [#:compilers='()] [#:decompilers='()] [#:evaluator=#f] [#:joiner=#f] [#:for-humans?=#t] [#:make-default-environment=make-fresh-user-module] [#:lowerer=#f] [#:analyzer=#f] [#:compiler-chooser=#f]

Define a language.

This syntax defines a <language> object, bound to name in the current environment. In addition, the language will be added to the global language set. For example, this is the language definition for Scheme:

(define-language scheme
  #:title       "Scheme"
  #:reader      (lambda (port env) ...)
  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write
  #:make-default-environment (lambda () ...))
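
As a minimal sketch of adding a new language on top of the tower, the following defines a toy language whose terms are Scheme forms written back to front. The names backwards and compile-backwards are hypothetical, invented for this illustration. The sketch assumes that a compiler procedure takes an expression, an environment, and an options list, and returns three values (the compiled expression, the environment, and a continuation environment); this follows the shape of Guile’s own language specifications, but check it against your installed version.

```scheme
(use-modules (system base language)
             (system base compile))

;; Hypothetical compiler from the toy language to Scheme: reverse the
;; top-level form, so that (2 1 +) becomes (+ 1 2).
;; Assumed signature: (exp env opts) -> (values exp env continuation-env).
(define (compile-backwards exp env opts)
  (values (if (list? exp) (reverse exp) exp)
          env
          env))

;; Hypothetical toy language.  It only needs to know how to reach
;; Scheme; the rest of the path is supplied by the existing tower.
(define-language backwards
  #:title     "Backwards Scheme"
  #:reader    (lambda (port env) (read port))
  #:compilers `((scheme . ,compile-backwards))
  #:printer   write)

;; Compile a term of the toy language all the way down to a value.
(define result (compile '(2 1 +) #:from backwards #:to 'value))
;; result → 3
```

Because each language only names its immediate successor, the toy language does not need to know anything about Tree-IL, CPS, or bytecode.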

The interesting thing about having languages defined this way is that they present a uniform interface to the read-eval-print loop. This allows the user to change the current language of the REPL:

scheme@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language!  To switch back, type `,L scheme'.
tree-il@(guile-user)> ,L scheme
Happy hacking with Scheme!  To switch back, type `,L tree-il'.
scheme@(guile-user)>

Languages can be looked up by name, as they were above.

Scheme Procedure: lookup-language name

Looks up a language named name, autoloading it if necessary.

Languages are autoloaded by looking for a variable named name in a module named (language name spec).

The language object will be returned, or #f if there does not exist a language with that name.
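
For illustration, a short sketch of looking up languages and inspecting the resulting objects. The accessors language-name, language-title, and language-compilers are exported by (system base language) but are not described in this section, so treat their use here as an assumption about the installed version.

```scheme
(use-modules (system base language))

;; Look up two languages by name.  Each is autoloaded from its
;; (language NAME spec) module if it has not been loaded yet.
(define scheme-language  (lookup-language 'scheme))
(define tree-il-language (lookup-language 'tree-il))

;; Accessors on the returned <language> objects.
(define scheme-name   (language-name scheme-language))    ; the symbol scheme
(define scheme-title  (language-title scheme-language))   ; "Scheme"
(define tree-il-title (language-title tree-il-language))  ; "Tree Intermediate Language"

;; The compilers field is an alist keyed by target language name.
;; For the Scheme definition shown earlier, the keys include tree-il.
(define scheme-targets (map car (language-compilers scheme-language)))
```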

When Guile compiles Scheme to bytecode, it asks the Scheme language to choose a compiler to the next language on the path from Scheme to bytecode. Repeating this step recursively assembles the full transformation from a flexible chain of compilers. Each link is obtained by invoking the language’s compiler chooser or, if there is none, by consulting the language’s compilers field.

A language can specify an analyzer, which is run before a term of that language is lowered and compiled. This is where compiler warnings are issued.

If a language specifies a lowerer, that procedure is called on expressions before compilation. This is where optimizations and canonicalizations go.

Finally, a language’s compiler translates a lowered term from one language to the next one in the chain.
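
The order of these steps can be sketched with a toy model. This is not Guile’s implementation and does not use its data structures: the names toy-tower, toy-compute-compiler, toy-warnings, and warn! are all hypothetical, and the procedures take a single argument rather than the environments and options that real passes receive. Each toy language records its successor, an analyzer, a lowerer, and a compiler, and the chain is built recursively.

```scheme
;; Toy model only; none of these names exist in Guile.
(define toy-warnings '())

(define (warn! message)
  (set! toy-warnings (cons message toy-warnings)))

;; Each entry is (NAME NEXT ANALYZE LOWER PASS).
(define toy-tower
  (list
   (list 'surface 'middle
         ;; Analyzer: inspects the term and may warn; its result is ignored.
         (lambda (term) (when (memq 'unused term) (warn! "unused variable")))
         ;; Lowerer: canonicalizes the term before it is compiled.
         (lambda (term) (delq 'unused term))
         ;; Compiler: translates the lowered term to the next language.
         (lambda (term) (cons 'middle-term term)))
   (list 'middle 'bottom
         (lambda (term) #t)
         (lambda (term) term)
         (lambda (term) (cons 'bottom-term term)))))

;; Build a one-argument procedure that compiles from FROM to TO by
;; chaining one link at a time.
(define (toy-compute-compiler from to)
  (if (eq? from to)
      (lambda (term) term)              ; already at the target
      (let ((entry (assq from toy-tower)))
        (unless entry
          (error "no way to compile" from "to" to))
        (let* ((next    (list-ref entry 1))
               (analyze (list-ref entry 2))
               (lower   (list-ref entry 3))
               (pass    (list-ref entry 4))
               (tail    (toy-compute-compiler next to)))
          (lambda (term)
            (analyze term)              ; warnings are issued first
            (tail (pass (lower term))))))))

(define compiled
  ((toy-compute-compiler 'surface 'bottom) '(a unused b)))
;; compiled     → (bottom-term middle-term a b)
;; toy-warnings → ("unused variable")
```

Note that the analyzer sees the term before it is lowered, which is why the warning is issued even though the lowerer later removes the offending element.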

There is a notion of a “current language”, which is maintained in the current-language parameter, defined in the core (guile) module. This language is normally Scheme, and may be rebound by the user. The run-time compilation interfaces (see Read/Load/Eval/Compile) also allow you to choose other source and target languages.
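
As a brief sketch, the source and target languages can be given to compile with the #:from and #:to keywords. The variable names below are illustrative only, and the first call assumes that the current language is Scheme, as it normally is.

```scheme
(use-modules (system base compile))

;; With no keywords, compile takes the source language from
;; current-language and targets value.
(define three (compile '(+ 1 2)))

;; Naming both ends of the path explicitly gives the same result.
(define also-three (compile '(+ 1 2) #:from 'scheme #:to 'value))

;; Stopping partway down the tower yields a term of that language
;; rather than a result; here, a Tree-IL term.
(define tree-il-term (compile '(+ 1 2) #:from 'scheme #:to 'tree-il))

;; three and also-three are both 3.
```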

The normal tower of languages when compiling Scheme goes like this:

1. Scheme
2. Tree Intermediate Language (Tree-IL)
3. Continuation-Passing Style (CPS)
4. Bytecode

As discussed before (see Object File Format), bytecode is in ELF format, ready to be serialized to disk. But when compiling Scheme at run time, you want a Scheme value: for example, a compiled procedure. For this reason, so as not to break the abstraction, Guile defines a fake language at the bottom of the tower:

Value

Compiling to value loads the bytecode into a procedure, turning cold bytes into warm code.

Perhaps this strangeness can be explained by example: compile-file defaults to compiling to bytecode, because it produces object code that has to live in the barren world outside the Guile runtime; but compile defaults to compiling to value, as its product re-enters the Guile world.
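
The difference between the two targets can be seen from within a running Guile. This is a sketch with illustrative variable names; it assumes a Guile version in which a term of the bytecode language is represented as a bytevector.

```scheme
(use-modules (system base compile)
             (rnrs bytevectors))

(define source '(lambda (x) (* x x)))

;; Targeting bytecode stops one step short of the bottom of the tower:
;; the result is inert data that could be written to disk.
(define image (compile source #:to 'bytecode))
(define image-is-bytes? (bytevector? image))   ; #t

;; Targeting value, the default for compile, loads the bytecode and
;; returns a live procedure.
(define square (compile source #:to 'value))
(define forty-nine (square 7))                 ; 49
```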

Indeed, the process of compilation can circulate through these different worlds indefinitely, as shown by the following quine:

((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))