Speeding Up Java Date Formatting With Code Generation

I previously introduced mako, my library for writing high level code and compiling it to JVM bytecode. That introduction used the recursive fiboacci algorithm, which is convenient, but doesn't represent a real use case. A bit later, I realized that implementing datetime formatting would be a simple but interesting use case. The result is a new library, TemporalFormats, that supports formatting dates and times as strings.

I said simple, but that was about 6 weeks and 85 hours of work ago. 1 For one thing, didn't realize how many more features mako needed, for another, I thought date formats were simpler than they were.

Regardless, date formatting is appealing because it's ubiquitous and mako should be able to speed it up. Bytecode generation is one of the main metaprogramming tools on the JVM. Frameworks like Spring or Hibernate use it extensively for proxies, interceptors, aspect oriented programming, etc. My own current interest is using code generation for performance. Whenever code builds a little interpreter, like a regex engine, it's possible that generating classes to do that could perform better.

The Java 8 DateTimeFormatter class is effectively a mini interpreter that dispatches to various other objects to handle pieces of the date format. The core operation comes from DateTimeFormatterBuilder$CompositePrinterParser#format.

@Override
public boolean format(DateTimePrintContext context, StringBuilder buf) {
    int length = buf.length();
    if (optional) {
        context.startOptional();
    }
    try {
        for (DateTimePrinterParser pp : printerParsers) {
            if (pp.format(context, buf) == false) {
                buf.setLength(length);  // reset buffer
                return true;
            }
        }
    } finally {
        if (optional) {
            context.endOptional();
        }
    }
    return true;
}

printerParsers has type DateTimePrinterParser[], and CompositePrinterParser is itself an instance of a DateTimePrinterParser. So the call is recursive, and the call will end up resolving to many different subclasses' methods at runtime (this is known as a megamorphic call-site, and it changes the way the JIT compiler optimizes code).2 That combination makes it plausible that we can improve on the DateTimeFormatter class with code generation. If I were a better scientist, I'd have analyzed how the compiler handles the DateTimeFormatter, but I didn't even read that code until well after I'd started this project. Luckily, I guessed right.

Implementation

The library defines a TemporalFormatter interface, with one method, TemporalFormatter#format(TemporalAccessor) (no parsing yet). TemporalFormatter plus a handful of helper methods are defined in the temporalformatter-core module. That separation lets users distribute generated TemporalFormatter implementations, without including the libraries for runtime code generation.3

Instances of TemporalFormatter are created by the TemporalFormatterCreator class. The TemporalFormatterCreator operates on a List<FormatSpecifier> (or a string like ~"yyyy-MM-dd"~ that's parsed into a List<FormatSpecifier> instances). For each FormatSpecifier the TemporalFormatCreator emits code into the generated format method.

Here's the decompiled code of a generated TemporalFormatter instance (the ISOOffsetDateTime class):4


public String format(TemporalAccessor var1) {
    boolean var10000 = true;
    StringBuilder var2 = new StringBuilder();
    TemporalFormatterComponents.formatyyyy(var1, var2);
    var2.append(DASH);
    TemporalFormatterComponents.formatMM(var1, var2);
    var2.append(DASH);
    TemporalFormatterComponents.formatdd(var1, var2);
    var2.append(T);
    TemporalFormatterComponents.formatHH(var1, var2);
    var2.append(COLON);
    TemporalFormatterComponents.formatmm(var1, var2);
    var2.append(COLON);
    TemporalFormatterComponents.formatss(var1, var2);
    var2.append(DOT);
    TemporalFormatterComponents.formatSSS(var1, var2);
    this.appendOffset(var2, var1);
    return var2.toString();
}

// this method could also be moved to TemporalFormatterComponents,
// and I probably will do that in the future
public StringBuilder appendOffset(StringBuilder var1, TemporalAccessor var2) {
    int var3 = Math.toIntExact((long)var2.get(ChronoField.OFFSET_SECONDS));
    if (var3 == 0) {
        return var1.append(ISOOffsetDateTime.Z);
    } else {
        if (var3 < 0) {
            var1.append(ISOOffsetDateTime.DASH);
        }

        if (var3 > 0) {
            var1.append(ISOOffsetDateTime.PLUS);
        }

        int var4 = Math.abs(var3 / 3600 % 100);
        var1.append(var4 / 10);
        var1.append(var4 % 10);
        var1.append(ISOOffsetDateTime.COLON);
        int var5 = Math.abs(var3 / 60 % 60);
        var1.append(var5 / 10);
        var1.append(var5 % 10);
        return var1;
    }
}

The code in TemporalFormatterComponents is distributed as part of the library, not generated code. It contains reusable pieces of formatting code that can be called from generated classes.


// there are about a half dozen methods like these in this class
public static StringBuilder formatyyyy(TemporalAccessor time, StringBuilder sb) {
    int year = time.get(ChronoField.YEAR) % 10000;
    if (year < 1000) {
        sb.append('0');
        if (year < 100) {
            sb.append('0');
            if (year < 10) {
                sb.append('0');
            }
        }
    }
    return sb.append(year);
}

The distinction between generated code and library code is flexible. The first iteration of TemporalFormats only used code generation, until I realized I should limit code generation to only the parts that had to be dynamic. Mako aims to be easier than ASM, but it takes perhaps twice as much code to do something by generating methods as it does to write the equivalent Java code.

Benchmarks

I wrote a handful of JMH benchmarks to see how the TemporalFormatter performed relative to the DateTimeFormatter.

The DateFormatBenchmark class contains tests comparing a standard library DateTimeFormatter to a handwritten implementation and a TemporalFormatter. Anything I do in the TemporalFormatter can be done by hand, so we should expect similar performance between the two implementations.5

In DateFormatBenchmark the date used is constant during the benchmark (the benchmark initializes a ZonedDateTime with the system default time zone at startup time). Depending on what that date is, certain branches of the formatting code will never be hit, which affects the decisions the JIT compiler makes. So in DateFormatBenchmarkRolling, each iteration updates the ZonedDateTime by 111111111111229 nanoseconds (just over 1.1 second, prime).

In MultiDateFormatBenchmarkRolling, we follow the same pattern of rolling dates forward, but also include multiple DateTimeFormatter or TemporalFormatter instances in the test.

Benchmarks were run on an i7-7500U CPU @ 2.70GHz laptop, with openjdk 11.0.14.1 2022-02-08, on Fedora 34.

Benchmark Time
DateFormatBenchmark.testISOLocalDateDateTimeFormatter 124.872 ± 7.826 ns/op
DateFormatBenchmark.testISOLocalDateHandwritten 55.840 ± 4.378 ns/op
DateFormatBenchmark.testISOLocalDateTemporalFormatter 57.924 ± 4.189 ns/op
DateFormatBenchmark.testISOLocalDateTimeDateTimeFormatter 401.890 ± 14.730 ns/op
DateFormatBenchmark.testISOLocalDateTimeFormatHandWritten 118.367 ± 6.137 ns/op
DateFormatBenchmark.testISOLocalDateTimeTemporalFormatter 184.885 ± 31.351 ns/op
DateFormatBenchmark.testISOLocalTimeDateTimeFormatter 278.307 ± 13.701 ns/op
DateFormatBenchmark.testISOLocalTimeTemporalFormatter 108.594 ± 11.853 ns/op
DateFormatBenchmarkRolling.doNothing 20.332 ± 0.956 ns/op
DateFormatBenchmarkRolling.testISOLocalDateDateTimeFormatter 160.465 ± 8.276 ns/op
DateFormatBenchmarkRolling.testISOLocalDateHandwritten 100.228 ± 5.212 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTemporalFormatter 86.873 ± 4.858 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeDateTimeFormatter 483.456 ± 28.815 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeFormatHandWritten 160.186 ± 8.638 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeTemporalFormatter 245.035 ± 6.944 ns/op
DateFormatBenchmarkRolling.testISOLocalTimeDateTimeFormatter 329.288 ± 22.904 ns/op
DateFormatBenchmarkRolling.testISOLocalTimeTemporalFormatter 170.674 ± 24.618 ns/op
MultiDateFormatBenchmarkRolling.doNothing 20.417 ± 1.384 ns/op
MultiDateFormatBenchmarkRolling.testDateTimeFormatters 944.729 ± 62.230 ns/op
MultiDateFormatBenchmarkRolling.testTemporalFormatters 510.145 ± 27.717 ns/op

In addition, I tried running the same benchmarks under Graal. I personally haven't worked with Graal before, but Graal includes a number of new optimizations, and handles megamorphic calls a bit differently. I ran the same set of benchmarks, but using Graal Enterprise Edition version 22.1.0 for Java 11.

Benchmark Average Time
DateFormatBenchmark.testISOLocalDateDateTimeFormatter 64.105 ± 5.134 ns/op
DateFormatBenchmark.testISOLocalDateHandwritten 28.055 ± 2.166 ns/op
DateFormatBenchmark.testISOLocalDateTemporalFormatter 22.466 ± 1.270 ns/op
DateFormatBenchmark.testISOLocalDateTimeDateTimeFormatter 358.479 ± 28.265 ns/op
DateFormatBenchmark.testISOLocalDateTimeFormatHandWritten 53.167 ± 2.736 ns/op
DateFormatBenchmark.testISOLocalDateTimeTemporalFormatter 96.734 ± 10.700 ns/op
DateFormatBenchmark.testISOLocalTimeDateTimeFormatter 217.649 ± 14.642 ns/op
DateFormatBenchmark.testISOLocalTimeTemporalFormatter 72.224 ± 3.375 ns/op
DateFormatBenchmarkRolling.doNothing 20.482 ± 0.698 ns/op
DateFormatBenchmarkRolling.testISOLocalDateDateTimeFormatter 112.081 ± 8.518 ns/op
DateFormatBenchmarkRolling.testISOLocalDateHandwritten 65.407 ± 4.636 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTemporalFormatter 45.951 ± 3.057 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeDateTimeFormatter 414.813 ± 28.934 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeFormatHandWritten 122.953 ± 8.911 ns/op
DateFormatBenchmarkRolling.testISOLocalDateTimeTemporalFormatter 148.564 ± 7.588 ns/op
DateFormatBenchmarkRolling.testISOLocalTimeDateTimeFormatter 262.539 ± 10.207 ns/op
DateFormatBenchmarkRolling.testISOLocalTimeTemporalFormatter 109.813 ± 10.053 ns/op
MultiDateFormatBenchmarkRolling.doNothing 20.508 ± 0.887 ns/op
MultiDateFormatBenchmarkRolling.testDateTimeFormatters 814.970 ± 45.643 ns/op
MultiDateFormatBenchmarkRolling.testTemporalFormatters 377.463 ± 28.145 ns/op

Graal performs much better than Hotspot overall, but the TemporalFormatter still provided major speedups. Both compilers show that the TemporalFormatter mostly formats a date in less than half the time it takes the DateTimeFormatter to do so.

There are some obvious questions from the benchmarks. In particular, I'll want to look at the cases where the TemporalFormatters lag the handwritten formatters, to see if I can improve them. In general, I haven't put nearly enough effort or analysis into this project to claim that the generated formatters are close to optimal.

Status, Usability and Limitations

temporalformatter-compiler, and temporalformatter-core jar are available on maven, while I'm still fiddling with figuring out how to publish the generated jar for standard-formats. The TemporalFormatterCreator is functional, and passes a range of tests (suggestions for improvement welcome), as do the standardformat instances. Conveniently, we can use the standard library as an oracle--the goal is to produce the same output every time.

There are some format strings the standard library supports that this library doesn't (see tracking issue). Anyone using the library would still be advised to do a lot of testing before prod use, and for the time being, I'd recommend precompiling and testing a known set of good temporal formats, rather than using the full dynamic behavior.

I also might investigate specialized methods for formatting LocalDateTime, ZonedDateTime, and OffsetDateTime. Currently, the code generates repeated calls to TemporalAccessor#get(TemporalField) to extract components of the Date (year, month, day, etc). This requires some indirection and computation to ensure the TemporalAccessor supports the given field and extract the value. Working with the more specific classes would let the generated code use methods like OffsetDateTime#getYear that do simple field lookups without any additional logic. There are no interfaces that cover the getYear method for all these objects, meaning we'd generate more code, but it might well be worth it.

Thanks to Jonathan Lange, Alpha Chen, flippac and casually-blue for feedback.

Footnotes


  1. About half the time was spent making improvements to mako, about half on date formats--though much of the latter was a cycle of making a change, finding the compiler wouldn't handle it properly, then working on the compiler. Another large chunk was figuring out the convoluted maven publication cycle, especially how to handle the temporalformatter library, where one module depends on a jar of generated code from another module. In the end, I decided to hack that, and don't know how to do it "properly".

  2. It may always be megamorphic--this gets into inliner behavior I don't know.

  3. Using mako this way, mako generates the bytecode for a class, and writes it to a file, just as if we'd written source code and processed it with the Java compiler. So there is code generation, but it's done at build time.

  4. There is an unused boolean variable--it's a bug to be squashed later.

  5. The implementations drifted a tiny bit recently--it's tedious keeping them exactly in sync, and doesn't provide much value.