Piggybacking on the Scopes compiler

11 October 2020 (updated: 25 September 2023)

This week a friend was playing around with fe, a very minimalist lisp implementation written in C. While too simple for my tastes, it gave me the itch to touch on a project I'd been thinking about for a while: writing a language implementation that targets the Scopes compiler. I've never implemented anything quite like this before, so it was fun to figure it out as I went, and hopefully you'll enjoy the process too.

You can see the end result at the github repo, or keep reading to see where this little adventure took me.

the plan

It was an explicit objective to use as much of the host infrastructure as possible. Since the whole thing started with fe and Scopes already uses S-expressions, I went for a lisp, which enabled me to reuse the parser and a few primitives. The initial idea was to implement a small subset of Scheme, but I quickly forgot about that and winged it.

The intention wasn't to write a VM / interpreter. Instead the code is compiled as a scopes program, and macros and type magic are used to bend it into something else. I guess this makes it a weird JIT compiled thing? I'm honestly not sure.

a clean slate

I start by loading the program with a filename passed as a CLI argument.

load-module (parse-module-name filename) filename
    scope = (Scope)

At this point we pass an empty environment in the scope parameter. If you omit this third argument, the global scope is used, but by passing an empty scope our loaded module doesn't have access to anything, even built-ins. From here I can start adding features one by one.

It's time to set up our environment:

let rl-primitives =
    do
        let print
        locals;

A let binding without a value binds the symbol from the upper scope, so we're making the original print function available to our module. A call to locals returns the current scope as an object. After replacing the scope argument, we can write something in our test.lisp file:

(print "hello world")

Which does what you expect. Because I'm using the original parser, it's still possible to write the same expression without the parenthesis, and I can't use ; as a comment starter (# does it instead). Multi-line strings are also implemented at the parser level.

where it gets interesting

At first glance it seems like I'm pretty much done at this point. Just add a bunch more primitives, alias car and cdr, write "language designer" on my CV. What an underwhelming project! Fortunately, this wasn't the end.

Consider the following program:

(fn applyf (f a b c)
    (f a b c))
(applyf print 1 2 3)

Passing functions around is a lisp staple. It's particularly useful for recursion! However, when you try this you get:

error: constant value of type Closure expected, got Parameter

This is because in Scopes, a Closure is a template of a function, and is only typed on instantiation (use). For example, by calling f 1 2 3 code is generated for a function of arguments (i32 i32 i32) (the return type is inferred from what the function does). But for this typing to occur at compile-time our Closure must be constant, and function arguments never are.

In a normal program there are two ways to fix this:

Make applyf an inline (a type of Closure defined with the inline keyword). Constants passed to inlines remain constant in the function body, since inline basically copy-pastes the code on call site.
assign a type to the function explicitly.

I didn't go with the first option because I didn't want my hypothetical user to have to think about inlines, so they aren't made available. Instead I could then make every function an inline under the hood, but that's problematic because inline recursion happens at compile time. The limit for inline recursion is low by default (64 times), and it happens by just expanding over and over. Other than that, it just doesn't work with non constant conditions.

The second option is more interesting. Naturally I didn't want The User to have to specify types for functions every time, so I had to work it out automatically.

a new type system

After a few a-ha moments, I started to see a potential solution. A lot of the friction on this is because Scopes is a typed language, and I was trying to implement an untyped language. So I made a concession, and started to think of ways to pass dynamic values around.

At first I experimented with using the Value type that Scopes uses to box things at compile time, but eventually arrived at a dead end. So I made my own dynamic type instead:

enum RLValue
    Bool   : bool
    Number : f64
    String : string
    Symbol : Symbol
    List   : list
    Nil    = none

This kind of enum is also called a sum type or a tagged union and is basically a union that stores an index to identify which of the possible types is the value contained. You can then unbox this value by dispatch via index. Notice my values can't contain a function yet! Some work is still necessary to allow for that.

I also implemented a few helpers to box / unbox, return the type of the value or a textual representation of what's contained.

redefining functions

Now we need a way to both encode our functions, and define a calling convention. Every function in our language will have the same type signature as far as the compiler is concerned, which eliminates the function passing problem; that is because since we already know what kind of arguments are going in, we can convert them to function pointers in advance.

let rlfnptrT =
    pointer
        raises
            function (uniqueof RLValue -1) i32 (viewof (mutable@ RLValue) 2)
            Error

typedef RLFunction : rlfnptrT

This looks a bit intimidating, but it's simple. You can ignore the uniqueof/viewof stuff, those are annotations for the borrow checker (necessary because the Enum type is unique). Our functions take an i32 and a mutable pointer to RLValue as arguments, returning another RLValue. This is what a function using this convention could look like:

fn my-function (argc args)
    raising Error
    assert (argc == 1)
    let arg = (args @ 0)
    print arg
    RLValue.Nil;

# convert to a function pointer, then cast to RLFunction
let my-rlfunction = 
    bitcast
        # again, the return type is inferred.
        static-typify my-function i32 (viewof (mutable@ RLValue))
        RLFunction

Still, it is quite verbose, and to pass arguments in one has to first convert them to RLValue. For this, I'll use a powerful tool: the __call metamethod. As it turns out, one can add the ability to be "called" to anything, or modify the way anything callable is called. The plan here is to make a macro that takes a series of arguments, generates code to box and put them into an array then calls the real function with the count and a pointer. Feel free to read the full implementation of the rl-call macro on Github. If you read the code you'll notice alloca-array is used to get an array pointer. This way we dodge managing memory, since it'll just get cleaned up when the frame collapses^[1].

Then, we set our macro as the call behaviour:

typedef+ RLFunction
    let __call = (box-pointer rl-call)

With that, we can call (my-rlfunction "hello world") without worrying about boxing on callsite.

Finally, I implement a sugar (syntax macro) so that user code can define their own functions without ever knowing what's going on. Again I'll just link the complete implementation.

It is used exactly like the original fn, but generates code to unroll the array into name bindings (according to the argument list^[2]) and boxes the return value. An arity check is performed by comparing the argument list with the argc at runtime. This is the 'safest' approach, but one could simply insert Nils if argc is less than expected, and bind extra args to some sort of hidden varargs parameter. Maybe I will in a later iteration.

To allow for recursion, I pass this-function (a value that is bound to every Scopes Closure to represent itself) through the same function pointer conversion^[3], and bind that to the name provided.

Now the RLValue type can be completed. However, because RLFunction depends on RLValue, I use a proxy type instead:

typedef _RLFunction : voidstar

enum RLValue
    # other types skipped for brevity
    Function : _RLFunction

inline __call (self args...)
    dispatch self
    case Function (f)
        f args...
    default
        hide-traceback;
        error (.. "cannot call value of type " ('type self))

Then later we have it forward to the real type:

typedef+ _RLFunction
    inline __call (self args...)
        (bitcast self RLFunction) args...

In truth the proxy type is not explicitly necessary - I could have defined RLFunction inside RLValue. However I found this to be less messy, and since we control all the boxing it isn't an issue.

the other stuff

After all that, we can finally do recursion:

(fn recurse (s)
    (print s)
    (recurse s))

(recurse "hello world")

It works so well in fact that it will run until the stack overflows and our program crashes. So it's time to add some other important features, like an if form:

sugar _if (args...)
    sugar-match args...
    case (condition tclause fclause)
        qq
            [embed]
                [if] ([rlvalue-unbox-as] [condition] [RLValue.Bool])
                    [box-value]
                        [tclause]
                else
                    [box-value]
                        [fclause]
    default
        error "incorrect if syntax"

At first I wanted to do it by using the fn sugar and just wrapping, but it's important that the else clause only be evaluated if the condition is false, so it has to be special. Similarly I've defined a let form that is more familiar to lisp programmers.

Other than that, a lot of the other primitives are for example operators wrapped in a RLFunction. They're very easily generalized like this:

inline gen-binary-op (f)
    _fn (a b)
        f (rlvalue-unbox-as a RLValue.Number) (rlvalue-unbox-as b RLValue.Number)

Again it is quite limited because of the strict arity checking, but eg. adding multiple things at once could be implemented with chaining.

Perhaps ironically the last thing I added were the list manipulation functions. I use the default list type that comes with the runtime, but because it uses its own untyped container to hold values inside the list I had to do some work to convert it back and forth.

With these final touches I concluded what I wanted to achieve with this silly experiment. You can see what it can do in the test file.

conclusion

This mini-project took me roughly two days to complete. While the result was pretty uninteresting in the grand scheme of things (I'd hesitate to even call it a language), I am pretty satisfied with the experience. I learned quite a few things about Scopes internals, and some ideas of what to explore next. For example, I'm confident one could add a GC to this setup without too much trouble, or replace the parser instead and use a completely different syntax while leaving semantics intact. Macros could also be implemented by translating them into sugars and adding a compiler stage after their definition.

The only RLValue types that would have to be managed dynamically are List and String, but I used the respective types from Scopes, which are managed by the runtime. This was to keep the scope of the project small. ↩︎
At this point whenever I mention argument list I'm talking about the definition of the function interface, and not the RLValue array. It is something determined during syntax expansion. ↩︎
I still can't believe you can just do that! ↩︎