Speeding Up Java Date Formatting With Code Generation
I previously introduced mako, my library for writing high-level code and compiling it to JVM bytecode. That introduction used the recursive Fibonacci algorithm, which is convenient, but doesn't represent a real use case. A bit later, I realized that implementing datetime formatting would be a simple but interesting use case. The result is a new library, TemporalFormats, that supports formatting dates and times as strings.
I said simple, but that was about 6 weeks and 85 hours of work ago.1 For one thing, I didn't realize how many more features mako needed; for another, I thought date formats were simpler than they are.
Regardless, date formatting is appealing because it's ubiquitous and mako should be able to speed it up. Bytecode generation is one of the main metaprogramming tools on the JVM. Frameworks like Spring or Hibernate use it extensively for proxies, interceptors, aspect-oriented programming, etc. My own current interest is using code generation for performance. Whenever code builds a little interpreter, like a regex engine, it's possible that generating classes specialized to the input could perform better.
The Java 8 DateTimeFormatter class is effectively a mini interpreter that dispatches to various other objects to handle pieces of the date format. The core operation comes from DateTimeFormatterBuilder$CompositePrinterParser#format:
@Override
public boolean format(DateTimePrintContext context, StringBuilder buf) {
    int length = buf.length();
    if (optional) {
        context.startOptional();
    }
    try {
        for (DateTimePrinterParser pp : printerParsers) {
            if (pp.format(context, buf) == false) {
                buf.setLength(length); // reset buffer
                return true;
            }
        }
    } finally {
        if (optional) {
            context.endOptional();
        }
    }
    return true;
}
printerParsers has type DateTimePrinterParser[], and CompositePrinterParser is itself a DateTimePrinterParser. So the call is recursive, and it will end up resolving to many different subclasses' methods at runtime (this is known as a megamorphic call site, and it changes the way the JIT compiler optimizes code).2 That combination makes it plausible that we can improve on the DateTimeFormatter class with code generation. If I were a better scientist, I'd have analyzed how the compiler handles the DateTimeFormatter, but I didn't even read that code until well after I'd started this project. Luckily, I guessed right.
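To make the contrast concrete, here is a minimal sketch of the two shapes. It is not JDK or TemporalFormats code, and every name in it (Piece, InterpretedFormat, StraightLineFormat, DispatchDemo) is hypothetical. The first formatter loops over an array of pieces and makes an interface call per piece, much like CompositePrinterParser; the second is the kind of flat method a code generator could emit for one fixed pattern.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical names throughout: an illustration of the two shapes, not library code.
interface Piece {
    void format(TemporalAccessor time, StringBuilder buf);
}

final class InterpretedFormat {
    private final Piece[] pieces;

    InterpretedFormat(Piece... pieces) {
        this.pieces = pieces;
    }

    // One loop shared by every pattern: the pp.format call site sees many Piece types.
    String format(TemporalAccessor time) {
        StringBuilder buf = new StringBuilder();
        for (Piece pp : pieces) {
            pp.format(time, buf);
        }
        return buf.toString();
    }

    // Zero-pads a field to the given width; assumes a non-negative value.
    static Piece padded(ChronoField field, int width) {
        return (time, buf) -> Padding.append(buf, time.get(field), width);
    }

    static Piece literal(char c) {
        return (time, buf) -> buf.append(c);
    }
}

final class StraightLineFormat {
    // What a generator could emit for "yyyy-MM-dd": no loop, no interface dispatch.
    static String format(TemporalAccessor time) {
        StringBuilder buf = new StringBuilder();
        Padding.append(buf, time.get(ChronoField.YEAR), 4);
        buf.append('-');
        Padding.append(buf, time.get(ChronoField.MONTH_OF_YEAR), 2);
        buf.append('-');
        Padding.append(buf, time.get(ChronoField.DAY_OF_MONTH), 2);
        return buf.toString();
    }
}

final class Padding {
    static void append(StringBuilder buf, int value, int width) {
        String digits = Integer.toString(value);
        for (int i = digits.length(); i < width; i++) {
            buf.append('0');
        }
        buf.append(digits);
    }
}

final class DispatchDemo {
    public static void main(String[] args) {
        InterpretedFormat interpreted = new InterpretedFormat(
                InterpretedFormat.padded(ChronoField.YEAR, 4),
                InterpretedFormat.literal('-'),
                InterpretedFormat.padded(ChronoField.MONTH_OF_YEAR, 2),
                InterpretedFormat.literal('-'),
                InterpretedFormat.padded(ChronoField.DAY_OF_MONTH, 2));
        LocalDate date = LocalDate.of(2022, 3, 7);
        System.out.println(interpreted.format(date));
        System.out.println(StraightLineFormat.format(date));
    }
}
```

Both versions produce the same string; the difference is only in how much the JIT has to see through to reach the per-field work.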
Implementation
The library defines a TemporalFormatter interface with one method, TemporalFormatter#format(TemporalAccessor) (no parsing yet). TemporalFormatter and a handful of helper methods are defined in the temporalformatter-core module. That separation lets users distribute generated TemporalFormatter implementations without including the libraries for runtime code generation.3
Instances of TemporalFormatter are created by the TemporalFormatterCreator class. The TemporalFormatterCreator operates on a List<FormatSpecifier> (or a string like "yyyy-MM-dd" that's parsed into a List<FormatSpecifier>). For each FormatSpecifier, the TemporalFormatterCreator emits code into the generated format method.
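As a rough illustration of that pipeline, the sketch below splits a pattern string into pieces and turns each piece into one line of output. It is only an assumption about the general shape: PatternSketch, tokenize, and emit are hypothetical names, plain strings stand in for FormatSpecifier, and the "emitted" lines are Java-like text rather than the bytecode mako actually produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the library's FormatSpecifier or TemporalFormatterCreator.
final class PatternSketch {

    // Split a pattern into runs of one repeated letter ("yyyy") and single
    // literal characters ("-"). Quoted literals such as 'T' are not handled.
    static List<String> tokenize(String pattern) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            int end = i + 1;
            if (Character.isLetter(c)) {
                while (end < pattern.length() && pattern.charAt(end) == c) {
                    end++;
                }
            }
            tokens.add(pattern.substring(i, end));
            i = end;
        }
        return tokens;
    }

    // Turn each token into one line of Java-like text, standing in for the
    // instructions a real generator would emit into the format method.
    static List<String> emit(List<String> tokens) {
        List<String> lines = new ArrayList<>();
        for (String token : tokens) {
            if (Character.isLetter(token.charAt(0))) {
                lines.add("TemporalFormatterComponents.format" + token + "(time, sb);");
            } else {
                lines.add("sb.append('" + token + "');");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        emit(tokenize("yyyy-MM-dd")).forEach(System.out::println);
    }
}
```

The point of the design is that the loop over specifiers runs once, at generation time, instead of on every call to format.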
Here's the decompiled code of a generated TemporalFormatter instance (the ISOOffsetDateTime class):4
public String format(TemporalAccessor var1) {
    boolean var10000 = true;
    StringBuilder var2 = new StringBuilder();
    TemporalFormatterComponents.formatyyyy(var1, var2);
    var2.append(DASH);
    TemporalFormatterComponents.formatMM(var1, var2);
    var2.append(DASH);
    TemporalFormatterComponents.formatdd(var1, var2);
    var2.append(T);
    TemporalFormatterComponents.formatHH(var1, var2);
    var2.append(COLON);
    TemporalFormatterComponents.formatmm(var1, var2);
    var2.append(COLON);
    TemporalFormatterComponents.formatss(var1, var2);
    var2.append(DOT);
    TemporalFormatterComponents.formatSSS(var1, var2);
    this.appendOffset(var2, var1);
    return var2.toString();
}

// this method could also be moved to TemporalFormatterComponents,
// and I probably will do that in the future
public StringBuilder appendOffset(StringBuilder var1, TemporalAccessor var2) {
    int var3 = Math.toIntExact((long)var2.get(ChronoField.OFFSET_SECONDS));
    if (var3 == 0) {
        return var1.append(ISOOffsetDateTime.Z);
    } else {
        if (var3 < 0) {
            var1.append(ISOOffsetDateTime.DASH);
        }
        if (var3 > 0) {
            var1.append(ISOOffsetDateTime.PLUS);
        }
        int var4 = Math.abs(var3 / 3600 % 100);
        var1.append(var4 / 10);
        var1.append(var4 % 10);
        var1.append(ISOOffsetDateTime.COLON);
        int var5 = Math.abs(var3 / 60 % 60);
        var1.append(var5 / 10);
        var1.append(var5 % 10);
        return var1;
    }
}
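The offset arithmetic is easier to follow with a concrete value. The sketch below copies the same integer math into a standalone method (OffsetSketch and formatOffset are hypothetical names, and plain characters replace the generated class's constants) and runs it on an offset of +05:30, which is 19800 seconds: 19800 / 3600 truncates to 5 hours, and 19800 / 60 % 60 leaves 30 minutes.

```java
import java.time.ZoneOffset;

// Hypothetical standalone copy of the arithmetic in the decompiled appendOffset.
final class OffsetSketch {

    static String formatOffset(int totalSeconds) {
        if (totalSeconds == 0) {
            return "Z";
        }
        StringBuilder sb = new StringBuilder();
        sb.append(totalSeconds < 0 ? '-' : '+');
        // Integer division truncates toward zero, so abs() recovers the magnitude.
        int hours = Math.abs(totalSeconds / 3600 % 100);
        sb.append(hours / 10).append(hours % 10);
        sb.append(':');
        int minutes = Math.abs(totalSeconds / 60 % 60);
        sb.append(minutes / 10).append(minutes % 10);
        return sb.toString();
    }

    public static void main(String[] args) {
        ZoneOffset offset = ZoneOffset.ofHoursMinutes(5, 30);
        System.out.println(formatOffset(offset.getTotalSeconds())); // prints +05:30
    }
}
```

Like the generated method, this ignores any seconds component of the offset.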
The code in TemporalFormatterComponents is distributed as part of the library; it is not generated. It contains reusable pieces of formatting code that can be called from generated classes.
// there are about a half dozen methods like these in this class
public static StringBuilder formatyyyy(TemporalAccessor time, StringBuilder sb) {
    int year = time.get(ChronoField.YEAR) % 10000;
    if (year < 1000) {
        sb.append('0');
        if (year < 100) {
            sb.append('0');
            if (year < 10) {
                sb.append('0');
            }
        }
    }
    return sb.append(year);
}
The distinction between generated code and library code is flexible. The first iteration of TemporalFormats only used code generation, until I realized I should limit code generation to only the parts that had to be dynamic. Mako aims to be easier than ASM, but it takes perhaps twice as much code to do something by generating methods as it does to write the equivalent Java code.
Benchmarks
I wrote a handful of JMH benchmarks to see how the TemporalFormatter performed relative to the DateTimeFormatter.
The DateFormatBenchmark class contains tests comparing a standard library DateTimeFormatter to a handwritten implementation and a TemporalFormatter. Anything I do in the TemporalFormatter can be done by hand, so we should expect similar performance between the two implementations.5
In DateFormatBenchmark the date used is constant during the benchmark (the benchmark initializes a ZonedDateTime with the system default time zone at startup time). Depending on what that date is, certain branches of the formatting code will never be hit, which affects the decisions the JIT compiler makes. So in DateFormatBenchmarkRolling, each iteration advances the ZonedDateTime by 111111111111229 nanoseconds (a prime number of nanoseconds, roughly 31 hours).
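The rolling update can be sketched in plain Java as follows. This is not the benchmark's actual source: it leaves out the JMH annotations, RollingClock is a hypothetical name, and only the step value is taken from the text above.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical stand-in for the benchmark's per-iteration state.
final class RollingClock {
    // Step size quoted in the text, in nanoseconds.
    static final long STEP_NANOS = 111111111111229L;

    private ZonedDateTime current;

    RollingClock(ZonedDateTime start) {
        this.current = start;
    }

    // Advance the date by one step and return the new value to format.
    ZonedDateTime next() {
        current = current.plusNanos(STEP_NANOS);
        return current;
    }

    public static void main(String[] args) {
        RollingClock clock = new RollingClock(
                ZonedDateTime.of(2022, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC));
        for (int i = 0; i < 3; i++) {
            System.out.println(clock.next());
        }
    }
}
```

Because the step is not a round number of any calendar unit, successive values change in every field from nanoseconds up to the day, so different padding branches get exercised.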
In MultiDateFormatBenchmarkRolling, we follow the same pattern of rolling dates forward, but also include multiple DateTimeFormatter or TemporalFormatter instances in the test.
Benchmarks were run on a laptop with an i7-7500U CPU @ 2.70GHz, with openjdk 11.0.14.1 2022-02-08, on Fedora 34.
Benchmark | Average Time |
---|---|
DateFormatBenchmark.testISOLocalDateDateTimeFormatter | 124.872 ± 7.826 ns/op |
DateFormatBenchmark.testISOLocalDateHandwritten | 55.840 ± 4.378 ns/op |
DateFormatBenchmark.testISOLocalDateTemporalFormatter | 57.924 ± 4.189 ns/op |
DateFormatBenchmark.testISOLocalDateTimeDateTimeFormatter | 401.890 ± 14.730 ns/op |
DateFormatBenchmark.testISOLocalDateTimeFormatHandWritten | 118.367 ± 6.137 ns/op |
DateFormatBenchmark.testISOLocalDateTimeTemporalFormatter | 184.885 ± 31.351 ns/op |
DateFormatBenchmark.testISOLocalTimeDateTimeFormatter | 278.307 ± 13.701 ns/op |
DateFormatBenchmark.testISOLocalTimeTemporalFormatter | 108.594 ± 11.853 ns/op |
DateFormatBenchmarkRolling.doNothing | 20.332 ± 0.956 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateDateTimeFormatter | 160.465 ± 8.276 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateHandwritten | 100.228 ± 5.212 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTemporalFormatter | 86.873 ± 4.858 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeDateTimeFormatter | 483.456 ± 28.815 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeFormatHandWritten | 160.186 ± 8.638 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeTemporalFormatter | 245.035 ± 6.944 ns/op |
DateFormatBenchmarkRolling.testISOLocalTimeDateTimeFormatter | 329.288 ± 22.904 ns/op |
DateFormatBenchmarkRolling.testISOLocalTimeTemporalFormatter | 170.674 ± 24.618 ns/op |
MultiDateFormatBenchmarkRolling.doNothing | 20.417 ± 1.384 ns/op |
MultiDateFormatBenchmarkRolling.testDateTimeFormatters | 944.729 ± 62.230 ns/op |
MultiDateFormatBenchmarkRolling.testTemporalFormatters | 510.145 ± 27.717 ns/op |
In addition, I tried running the same benchmarks under Graal. I personally haven't worked with Graal before, but Graal includes a number of new optimizations, and handles megamorphic calls a bit differently. I ran the same set of benchmarks, but using Graal Enterprise Edition version 22.1.0 for Java 11.
Benchmark | Average Time |
---|---|
DateFormatBenchmark.testISOLocalDateDateTimeFormatter | 64.105 ± 5.134 ns/op |
DateFormatBenchmark.testISOLocalDateHandwritten | 28.055 ± 2.166 ns/op |
DateFormatBenchmark.testISOLocalDateTemporalFormatter | 22.466 ± 1.270 ns/op |
DateFormatBenchmark.testISOLocalDateTimeDateTimeFormatter | 358.479 ± 28.265 ns/op |
DateFormatBenchmark.testISOLocalDateTimeFormatHandWritten | 53.167 ± 2.736 ns/op |
DateFormatBenchmark.testISOLocalDateTimeTemporalFormatter | 96.734 ± 10.700 ns/op |
DateFormatBenchmark.testISOLocalTimeDateTimeFormatter | 217.649 ± 14.642 ns/op |
DateFormatBenchmark.testISOLocalTimeTemporalFormatter | 72.224 ± 3.375 ns/op |
DateFormatBenchmarkRolling.doNothing | 20.482 ± 0.698 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateDateTimeFormatter | 112.081 ± 8.518 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateHandwritten | 65.407 ± 4.636 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTemporalFormatter | 45.951 ± 3.057 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeDateTimeFormatter | 414.813 ± 28.934 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeFormatHandWritten | 122.953 ± 8.911 ns/op |
DateFormatBenchmarkRolling.testISOLocalDateTimeTemporalFormatter | 148.564 ± 7.588 ns/op |
DateFormatBenchmarkRolling.testISOLocalTimeDateTimeFormatter | 262.539 ± 10.207 ns/op |
DateFormatBenchmarkRolling.testISOLocalTimeTemporalFormatter | 109.813 ± 10.053 ns/op |
MultiDateFormatBenchmarkRolling.doNothing | 20.508 ± 0.887 ns/op |
MultiDateFormatBenchmarkRolling.testDateTimeFormatters | 814.970 ± 45.643 ns/op |
MultiDateFormatBenchmarkRolling.testTemporalFormatters | 377.463 ± 28.145 ns/op |
Graal performs much better than HotSpot overall, but the TemporalFormatter still provided major speedups. Under both compilers, the TemporalFormatter formats a date in roughly half the time the DateTimeFormatter takes, or less: about 39-54% of the DateTimeFormatter's time on HotSpot, and about 27-46% on Graal.
The benchmarks raise some obvious questions. In particular, I'll want to look at the cases where the TemporalFormatters lag the handwritten formatters, to see if I can improve them. In general, I haven't put nearly enough effort or analysis into this project to claim that the generated formatters are close to optimal.
Status, Usability and Limitations
The temporalformatter-compiler and temporalformatter-core jars are available on Maven, while I'm still figuring out how to publish the generated jar for standard-formats. The TemporalFormatterCreator is functional and passes a range of tests (suggestions for improvement welcome), as do the standardformat instances. Conveniently, we can use the standard library as an oracle: the goal is to produce the same output every time.
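A minimal sketch of that oracle idea, assuming a handwritten stand-in for a generated formatter: walk through a range of dates and report the first one where the candidate and DateTimeFormatter.ISO_LOCAL_DATE disagree. OracleCheck, candidate, and firstMismatch are hypothetical names, not part of the library's test suite.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical oracle test: the standard library defines the expected output.
final class OracleCheck {

    // Stand-in for a generated formatter; valid for years 0 through 9999.
    static String candidate(TemporalAccessor time) {
        StringBuilder sb = new StringBuilder();
        pad(sb, time.get(ChronoField.YEAR), 4);
        sb.append('-');
        pad(sb, time.get(ChronoField.MONTH_OF_YEAR), 2);
        sb.append('-');
        pad(sb, time.get(ChronoField.DAY_OF_MONTH), 2);
        return sb.toString();
    }

    private static void pad(StringBuilder sb, int value, int width) {
        String digits = Integer.toString(value);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        sb.append(digits);
    }

    // Returns the first date where the candidate disagrees with the oracle,
    // or null if all `days` consecutive dates match.
    static LocalDate firstMismatch(LocalDate start, int days) {
        LocalDate date = start;
        for (int i = 0; i < days; i++) {
            String expected = DateTimeFormatter.ISO_LOCAL_DATE.format(date);
            if (!candidate(date).equals(expected)) {
                return date;
            }
            date = date.plusDays(1);
        }
        return null;
    }

    public static void main(String[] args) {
        LocalDate mismatch = firstMismatch(LocalDate.of(1999, 12, 1), 10_000);
        System.out.println(mismatch == null ? "all dates match" : "mismatch at " + mismatch);
    }
}
```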
There are some format strings the standard library supports that this library doesn't (see tracking issue). Anyone using the library would still be advised to do a lot of testing before prod use, and for the time being, I'd recommend precompiling and testing a known set of good temporal formats, rather than using the full dynamic behavior.
I also might investigate specialized methods for formatting LocalDateTime, ZonedDateTime, and OffsetDateTime. Currently, the generated code makes repeated calls to TemporalAccessor#get(TemporalField) to extract components of the date (year, month, day, etc.). Each call requires some indirection and computation to check that the TemporalAccessor supports the given field and to extract the value. Working with the more specific classes would let the generated code use methods like OffsetDateTime#getYear that do simple field lookups without any additional logic. No interface covers the getYear method for all these classes, meaning we'd generate more code, but it might well be worth it.
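The two access paths look like this side by side. This is only a sketch of the idea, with hypothetical names (YearAccess and its methods); it shows that both paths return the same value and says nothing about their relative speed.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

// Hypothetical sketch of generic versus type-specific field access.
final class YearAccess {

    // Works for any TemporalAccessor that supports YEAR, via the field lookup.
    static int yearGeneric(TemporalAccessor time) {
        return time.get(ChronoField.YEAR);
    }

    // Only works for OffsetDateTime, but calls the direct getter.
    static int yearSpecialized(OffsetDateTime time) {
        return time.getYear();
    }

    // A specialized formatter could pick the direct path when the type is known
    // and fall back to the generic one otherwise.
    static int year(TemporalAccessor time) {
        if (time instanceof OffsetDateTime) {
            return yearSpecialized((OffsetDateTime) time);
        }
        return yearGeneric(time);
    }

    public static void main(String[] args) {
        OffsetDateTime odt = OffsetDateTime.of(2022, 5, 1, 12, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(year(odt));
        System.out.println(year(LocalDate.of(2022, 5, 1)));
    }
}
```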
Thanks to Jonathan Lange, Alpha Chen, flippac and casually-blue for feedback.
Footnotes
1. About half the time was spent making improvements to mako, about half on date formats--though much of the latter was a cycle of making a change, finding the compiler wouldn't handle it properly, then working on the compiler. Another large chunk was figuring out the convoluted maven publication cycle, especially how to handle the temporalformatter library, where one module depends on a jar of generated code from another module. In the end, I decided to hack that, and don't know how to do it "properly".
2. It may always be megamorphic--this gets into inliner behavior I don't know.
3. Using mako this way, mako generates the bytecode for a class, and writes it to a file, just as if we'd written source code and processed it with the Java compiler. So there is code generation, but it's done at build time.
4. There is an unused boolean variable--it's a bug to be squashed later.
5. The implementations drifted a tiny bit recently--it's tedious keeping them exactly in sync, and doesn't provide much value.