Simple Binary Formats and Terrible Hacks
Last weekend my friend Marc and I went to TerribleHack III and made Dayder, a neat little website for finding spurious correlations in lots of time series data sets. I did the ingestion of our initial data set, causes of death over time, as well as the JS/HTML front end. Marc wrote the correlation-finding web server in Rust and also did the final prettying up of the CSS. I’m quite proud of how well it turned out given that it was made in 12 hours.
The coolest part of Dayder is how fast it is. All the DOM and JS Canvas rendering code is custom built for rendering hundreds of graphs in milliseconds. Marc and I also designed a simple custom binary format, which we called btsf, for storing time series data compactly. It is a key reason our app can quickly send tons of time series data sets to the client and store them compactly on the server. All 6591 time series fit in less than 1 megabyte, so they can all be sent to the client for instantaneous filtering.
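To give a feel for what a simple binary format for time series can look like, here is a minimal sketch in Python. This is not the actual btsf layout, which isn't described here. The magic bytes, field widths, and the `encode`/`decode` names are all assumptions made up for illustration: a 4-byte magic, a series count, and then for each series a length-prefixed name followed by (time, value) pairs.

```python
import struct

MAGIC = b"TSF0"  # hypothetical magic bytes, NOT the real btsf header


def encode(series):
    """Pack a list of (name, [(t, value), ...]) tuples into bytes.

    Assumed layout (little-endian):
      4 bytes  magic
      u32      number of series
      per series:
        u16    name length in bytes, then the UTF-8 name
        u32    number of points
        per point: i32 time, f32 value (8 bytes)
    """
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(series))
    for name, points in series:
        raw_name = name.encode("utf-8")
        out += struct.pack("<H", len(raw_name)) + raw_name
        out += struct.pack("<I", len(points))
        for t, value in points:
            out += struct.pack("<if", t, value)
    return bytes(out)


def decode(data):
    """Inverse of encode: parse bytes back into (name, points) tuples."""
    if data[:4] != MAGIC:
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<I", data, 4)
    offset = 8
    series = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", data, offset)
        offset += 2
        name = data[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (n,) = struct.unpack_from("<I", data, offset)
        offset += 4
        # Each point is a fixed 8 bytes, so we can index straight to it.
        points = [struct.unpack_from("<if", data, offset + 8 * i) for i in range(n)]
        offset += 8 * n
        series.append((name, points))
    return series
```

With this made-up layout every point costs a fixed 8 bytes and there are no field names or delimiters to parse, which is the kind of property that keeps a format like this small and quick to load.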
The following week I gave a short talk at a UWaterloo CS Club event about simple binary formats and how they can make your project faster, easier and cooler.
Now that I’ve used simple binary formats for both Rate With Science and Dayder, I’m a big fan. Outside a hackathon, though, where I have time to learn libraries and no incentive to design new formats for fun, I would probably go with something established like Cap’n Proto or Thrift instead of a custom format.