Today, we’re excited to share initial results from some big performance initiatives we've been working on, including 90% reductions in frontend page load time for large projects, up to 10x improvements to execution speeds for projects without Python code, and more!
These are meaningful improvements, but we know there’s still a lot of work to be done, and we have entire internal teams dedicated to making Hex faster 🫡.
😴 Lazy Dataframes
Certain types of projects now execute 5-10x faster by skipping unnecessary pandas DataFrame creation.
We’ve rolled out a complete overhaul of the fundamental data interchange format that cells in Hex use to store and manipulate data.
Instead of materializing slow, inefficient pandas DataFrames into memory to power our no-code cells, our new lazy loading architecture stores data as Arrow files in S3 and provides a lightweight “Lazy Dataframe” stub for cells to reference and pass around. When cells need to run operations on a dataset, they issue SQL queries directly against the remote Arrow files using DuckDB.
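Conceptually, the stub works something like this rough sketch (hypothetical names and paths, not our actual code): it carries a pointer to the remote file, lets DuckDB do all the scanning, and only builds a pandas DataFrame when a Python cell explicitly asks for one.

```python
import duckdb
import pyarrow.dataset as ds

class LazyDataframe:
    """Lightweight stub: holds a pointer to remote data, not the data itself."""

    def __init__(self, uri: str):
        # e.g. "s3://bucket/project/cell_output.arrow" (illustrative path)
        self.uri = uri

    def sql(self, query: str):
        """Run SQL against the remote Arrow file; nothing hits the Python kernel."""
        con = duckdb.connect()
        # Register the remote file as a scannable dataset. DuckDB reads only
        # the columns and rows the query actually needs.
        con.register("data", ds.dataset(self.uri, format="arrow"))
        return con.execute(query).arrow()

    def to_pandas(self):
        """Materialize a real pandas DataFrame only when a Python cell needs one."""
        return self.sql("SELECT * FROM data").to_pandas()

# No-code cells pass the stub around; only to_pandas() pays the pandas cost.
orders = LazyDataframe("s3://bucket/project/orders.arrow")
summary = orders.sql(
    "SELECT region, SUM(amount) AS total FROM data GROUP BY region ORDER BY total DESC"
)
```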
This means data is never sent to the Python kernel unless it’s required by a Python cell. No slow, unnecessary dataframes are created, and most importantly: everything still works exactly the same way with no noticeable changes for users.
Well, other than speed! This new architecture has massive performance benefits, specifically for large projects that operate on large datasets without Python. Across three test projects, we saw average runtimes go from:
- Specific performance test project: 12s → 1s
- Real internal project: 15-20s → 3-5s
- A critical customer report: >2min → 30s
The under-the-hood details are pretty neat here! Read all about them on the blog: Optimizing Multi-Modal Analysis by Lazy Loading Dataframes
🫥 Frontend Virtualization Improvements
Big projects (100+ cells) load and render up to 90% faster.
This video shows a project with 1408 cells (!) scrolling quickly and behaving normally. This wouldn't have been possible with our previous system!
We’ve made significant updates to the virtualization engine that powers large projects, keeping the UI snappy even as projects grow to hundreds (or thousands) of cells.
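The core windowing idea behind virtualization is simple, and here’s a framework-agnostic sketch of it (in Python for illustration only; the real engine lives in our frontend code): figure out which cells intersect the viewport and only mount those, plus a small overscan buffer.

```python
from bisect import bisect_right
from itertools import accumulate

def visible_cells(cell_heights, scroll_top, viewport_height, overscan=3):
    """Return the (first, last) indices of cells worth rendering."""
    # Prefix sums give each cell's top offset, computed once per layout pass.
    offsets = [0, *accumulate(cell_heights)]
    first = bisect_right(offsets, scroll_top) - 1
    last = bisect_right(offsets, scroll_top + viewport_height) - 1
    # Overscan keeps a few off-screen cells mounted so fast scrolling
    # doesn't flash blank space.
    return max(0, first - overscan), min(len(cell_heights) - 1, last + overscan)

# 1408 cells in the project, but only a handful are mounted at any position.
heights = [120] * 1408
print(visible_cells(heights, scroll_top=60_000, viewport_height=900))
```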
Testing against a particularly large Hex project, our updated virtualization method resulted in:
- 90% reduction in initial notebook render time
- 33% reduction in UI interaction lag
- 14% reduction in render lag during scrolling
- 10% reduction in overall memory usage
🍿 Fast Kernel Startup
Fresh project runs are up to 10s quicker, thanks to major optimizations to kernel startup time.
We’ve made many improvements to the way we initialize Python kernels for projects, bringing p95 kernel startup overhead from 12s down towards 1s.
Many of these fixes were small optimizations to already fairly quick workloads. For example, we reduced the time needed to locate a particular kernel in our network from 200ms → 10ms. We also found some big wins, though, saving up to 8s on every kernel initialization by optimizing the way we spin up each kernel’s initial code.
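As a purely generic illustration of what the small wins can look like (this is not our actual code, and the names are made up): a lookup that costs a network round trip every time can often be memoized behind a short-lived cache.

```python
import time

_cache: dict[str, tuple[float, str]] = {}
CACHE_TTL_SECONDS = 30.0

def find_kernel_via_network(kernel_id: str) -> str:
    """Stand-in for a slow service-discovery round trip (~200ms)."""
    time.sleep(0.2)  # simulate network latency
    return f"10.0.0.42:{hash(kernel_id) % 1000 + 9000}"

def locate_kernel(kernel_id: str) -> str:
    """Return the kernel's address, consulting the cache first."""
    now = time.monotonic()
    hit = _cache.get(kernel_id)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # fast path: no network round trip
    addr = find_kernel_via_network(kernel_id)
    _cache[kernel_id] = (now, addr)
    return addr
```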
We’ll keep looking for improvements big and small here!
🧭 Faster Explores
Exploring from charts is 5-10x quicker, unburdened by the weight of ...the ancestors?
When you Explore off a chart, we used to obtain and execute all of the “ancestor cells” of that chart in order to create the data needed to power the Explore. This could make exploring really slow, depending on how a particular project was structured.
We can now instantly fetch just the immediate dataset required to power an Explore, rather than deriving it on the fly from all the ancestor cells.
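Schematically, the change looks something like this (hypothetical names, not our actual code):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Cell:
    name: str
    compute: Callable                # produces this cell's output from its inputs
    parents: list["Cell"] = field(default_factory=list)

def explore_old(chart: Cell, cache: dict) -> object:
    """Old path: recursively execute every ancestor just to rebuild one dataset."""
    if chart.name not in cache:
        inputs = [explore_old(p, cache) for p in chart.parents]
        cache[chart.name] = chart.compute(*inputs)  # pays for every slow ancestor
    return cache[chart.name]

# New path: the chart's input dataset is already materialized (e.g. as one of
# the Arrow files described above), so Explore fetches it directly.
dataset_store = {"sales_by_region": [("EMEA", 42), ("AMER", 57)]}

def explore_new(dataset_ref: str):
    return dataset_store[dataset_ref]  # one fetch, zero ancestor executions
```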
The impact varies depending on how slow or numerous those upstream cells are, but it can be quite significant. On a real internal project I used for testing, my Explore load time went from 30+ seconds to around 5 seconds.
🏎️ Advanced Compute Profiles (beta)
We’ve done a lot of work to make our code run faster, but unfortunately we can’t make your code run faster. Oh wait, no, we totally can... if you’re willing to pay for it!
We have two new heavy-duty compute profiles available in closed beta, with usage-based billing:
- 2XL: 8 CPUs, 64 GB memory
- 4XL: 16 CPUs, 128 GB memory
These larger profiles will let you work with larger datasets and parallelize heavy workloads. We will be experimenting with other profiles in the future, including GPUs.
If you’re interested in joining the beta, please fill out this form. And if you are a Hex user, but still have to run a workload in another environment because of compute requirements, please let us know! We want to meet all your compute needs.
📊 Run Stats (beta)
Detailed execution data is now available for projects, so you can easily identify bottlenecks and slow queries.
You’ll find Run Stats in the Run dropdown menu, where you’d go to stop or restart a kernel. It gives you detailed execution timing for every cell and the entire project, as well as for some of the internal steps I’ve mentioned above, like kernel initialization.
This should make it a lot easier to pluck out slow queries, long Python blocks, or anything else that seems out of sorts.
Pro-tip: Combine this with the graph viewer to get a really detailed sense of a project’s execution flow and performance!
🚀 Onwards
We’re really excited about the headway we’ve made so far, but our quest is not even close to over. Perf work is like whack-a-mole, or maybe an ultramarathon. Perhaps one of those bootcamps where you have to crawl in the mud under barbed wire while someone hollers at you.
Either way, we’re committed to continuing to improve Hex’s performance on all fronts, from tiny kernel initialization gains to major execution overhauls and frontend rendering improvements.
Please continue to report places where things feel sluggish, because it’s our best data for making improvements. Everything you just read about is based directly on example projects sent in by users!
Until next time 🖖.