Solving Advent of Code With a Java DSL For Creating Classes at Runtime

Back in December, I saw a tweet that piqued my interest.

https://twitter.com/hundredrabbits/status/1466441362354573314

I don't have an esolang, but I did have the seeds of a Java DSL for building classes at runtime. That DSL builds on top of ASM to let you write any code you want in what looks like a high level language. I'd been letting the idea languish for a bit, but after seeing the tweet, I got back to work. Unfortunately, December is a busy time, so you're getting this post in the second half of January.

My idea to write a compiler/DSL1 started with needle, my project to compile regexes to Java bytecode. In that project, each regular expression is compiled to a class implementing its particular matching logic.

I'd been using ASM in that project, alongside some very small uses of ByteBuddy. ASM is a low level library that largely mirrors Java bytecode. It does handle fiddly details, such as computing jump locations and stack maps. ByteBuddy adds several conveniences for reducing boilerplate (things like delegating methods), but doesn't substantially simplify generating algorithmic code.2

Writing to those APIs worked, but wasn't pleasant--I was able to make progress, but found debugging slow and difficult. Simple changes would cause errors either at code generation time or verification time, and debugging those errors was slow. I relied heavily on asmifier and a web front-end for it.

Let's implement the recursive fibonacci using ASM:

public void defineFibonacci(ClassWriter classWriter) {
    var mv = cw.visitMethod(ACC_PUBLIC, "fibonacci", "(I)I", null, null);
    mv.visitCode();
    mv.visitVarInsn(ILOAD, 1);
    Label l0 = new Label();
    mv.visitJumpInsn(IFEQ, l0);
    mv.visitVarInsn(ILOAD, 1);
    mv.visitInsn(ICONST_1);
    Label l1 = new Label();
    mv.visitJumpInsn(IF_ICMPNE, l1);
    mv.visitLabel(l0);
    mv.visitInsn(ICONST_1);
    mv.visitInsn(IRETURN);
    mv.visitLabel(l1);
    mv.visitVarInsn(ALOAD, 0);
    mv.visitVarInsn(ILOAD, 1);
    mv.visitInsn(ICONST_1);
    mv.visitInsn(ISUB);
    mv.visitMethodInsn(INVOKEVIRTUAL, "MainClass", "fibonacci", "(I)I", false);
    mv.visitVarInsn(ALOAD, 0);
    mv.visitVarInsn(ILOAD, 1);
    mv.visitInsn(ICONST_2);
    mv.visitInsn(ISUB);
    mv.visitMethodInsn(INVOKEVIRTUAL, "MainClass", "fibonacci", "(I)I", false);
    mv.visitInsn(IADD);
    mv.visitInsn(IRETURN);
    mv.visitMaxs(-1, -1);
    mv.visitEnd();
}

This shows the verbosity, but not the other difficulties (managing the stack and positions of local variables).

A side note about this and all the other examples in this blog post: they're pointless. The idea of my compiler/DSL, or any code-generation tool, is to generate code that you wouldn't/couldn't write yourself--but code generation is pointless overhead for these examples. In the case of regular expressions, code generation can be both convenient and essential: if you accept the regular expression at runtime, you can't code it ahead of time. But even where you could, in principle, write the code by hand, hand-coding a matcher would be tedious and error-prone. Better to use a library.

Improving on ASM

My first attempt to improve the situation built a layer over ASM, which reified blocks, methods, and stack operations into their own objects. This helped with a point where I was particularly stuck, but didn't fundamentally change things--even with the changes, I found developing rather painful. Sometimes the added elements were useful for debugging, but it was still slow going, and I'm not sure if they're worth their weight.

Here's what fibonacci looks like in that first (nameless) abstraction I created over ASM:

public static Class<?> compilerMark1Fibonacci() {
    ClassBuilder cb = new ClassBuilder("fibonacci1", "java/lang/Object", new String[]{});
    cb.addEmptyConstructor();
    var vars = new GenericVars("x");
    var method = cb.mkMethod("fibonacci", List.of("I"), "I", vars);
    var initialBlock = method.addBlock();
    var return1Block = method.addBlock();

    initialBlock.readVar(vars, "x", "I").jump(return1Block, IFEQ);
    initialBlock.readVar(vars, "x", "I").push(1).jump(return1Block, IF_ICMPEQ);
    initialBlock.readThis().readVar(vars, "x", "I").push(1).operate(ISUB).
            call("fibonacci", "fibonacci1", "(I)I");
    initialBlock.readThis().readVar(vars, "x", "I").push(2).operate(ISUB).
            call("fibonacci", "fibonacci1", "(I)I");
    initialBlock.operate(IADD).addReturn(IRETURN);

    return1Block.push(1).addReturn(IRETURN);
    return new ClassCompiler(cb).generateClass();
}

The Compiler

The DSL is another layer on top of the class/method/block classes I created on top of ASM. To get a feel for it, let's see fiboacci:

public static Method fibonacci() {
    var method = new Method("fibonacci", List.of("I"), "I", new GenericVars("n"));
    method.cond(eq(read("n"), 0)).withBody(List.of(
            returnValue(1)));
    method.cond(eq(read("n"), 1)).withBody(List.of(
            returnValue(1)));
    method.returnValue(plus(
            call("fibonacci", Builtin.I, thisRef(), sub(read("x"), 1)),
            call("fibonacci", Builtin.I, thisRef(), sub(read("x"), 2))));
    return method;
}

@Test
public void testFibonacci() throws Exception {
    var builder = new ClassBuilder("FibonacciClass", "java/lang/Object", new String[]{});
    builder.addMethod(fibonacci());
    builder.addMethod(builder.emptyConstructor());
    var cls = new ClassCompiler(builder);
    Class<?> compiled = cls.generateClass();
    var instance = compiled.getConstructors()[0].newInstance();
    var output = compiled.getMethod(TEST_METHOD, List.of(int.class).toArray(new Class[0])).invoke(instance, 5);
    assertEquals(8, output);
}

The code is available on GitHub, alongside several more examples (though this is the most complicated).

I grabbed the "cond" name from scheme, but that's about it. The DSL hides these details, but each statement in the Java code corresponds to one top level element in the method, which are stored as a list of CodeElement internally. Each top level element is a node in the AST, and has children which are either Expression or Statement objects.

There's local variable type inference, but methods require explicit type signatures, and calls to methods require specifying the output type. There are while loops (spelled loop), with break and continue spelled escape and skip, but variables have function level scope.3 The "enhanced" for loop will probably have to appear eventually.

The API is typed (an Assignment is a Statement, not an Expression), but somewhat loosely. If you use a numeric literal as the condition for an if-statement, the API will let you, though you won't be able to generate a loadable Java class.4

Advent of Code Problem 7

With that done, here's the code to solve advent of code problem 7:


private Method mkSolveMethod() {
     var vars = new GenericVars("crabPositions", "min", "index", "guess", "total", "difference");
     var method = new Method("solve", List.of(), CompilerUtil.descriptor(Integer.class), vars);
     method.set("crabPositions", callStatic("com/justinblank/classcompiler/aocexamples/Problem7",
             "readInput", ArrayType.of(Builtin.I)));
     method.set("min", Integer.MAX_VALUE);
     method.set("guess", 0);

     method.loop(null,
             List.of(
                 set("index", 0),
                 set("total", 0),
                 loop(lt(read("index"), arrayLength(read("crabPositions"))),
                     List.of(
                             set("difference", callStatic(Math.class, "abs", Builtin.I,
                                 sub(read("guess"), arrayRead(read("crabPositions"), read("index"))))),
                             set("total", plus(read("total"), read("difference"))),
                             set("index", plus(read("index"), 1))
                 )),
                 set("guess", plus(read("guess"), literal(1))),
                 cond(lt(read("total"), read("min"))).withBody(List.of(
                     set("min", read("total"))
                 )),
                 cond(gt(read("total"), read("min"))).withBody(List.of(
                         returnValue(callStatic(CompilerUtil.internalName(Integer.class), "valueOf",
                                 ReferenceType.of(Integer.class), read("min"))))
                 ))
             );
     method.returnValue(callStatic(CompilerUtil.internalName(Integer.class), "valueOf", ReferenceType.of(Integer.class), read("min")));
     return method;
 }

The code implements a naive double loop. For each location the crabs could meet, it sums the distance for each crab, and chooses the location with the minimum distance.

Status and Future Plans

At the moment, this project is extremely raw. There's lots of duplication in the code, owing to the way the project evolved, and the API needs more thought, but those are the small problems. More importantly, I've solved problems as I went, and haven't tested language features systematically (for instance, I have yet to write a program using floating point). There's no way to throw or catch an exception. I don't know if the type inference algorithm is sound, I don't know how it will play with generics. I'll keep working on those issues, and if things seem promising, try to get it in a state where I can reincorporate it into needle.

Still, I'm optimistic there's something here, whether or not it looks exactly like what I've done so far. Runtime code generation is difficult, and I don't think it has to be.

Footnotes


  1. Is it a compiler? A DSL? Meh.

  2. As far as I know. I asked around for any projects that would do what I wanted, but couldn't find any.

  3. This is 100% because it's easier to implement, but hot-take: function level scope is fine.

  4. Talking about runtime vs. compile-time is awkward here. You will get the error at runtime in the sense that your code will compile. But that will be a compilation error, since you will try to compile a Java class, and fail.