Analyzing Data 180,000x Faster with Rust
Posted on 21 Oct 2023 04:00 +0000

#rust on 21 Oct 2023 07:34 +0000

Referring to lazy Python as “baseline” is a joke. on 21 Oct 2023 08:52 +0000

But isn’t it kind of obvious that if you are able to achieve a 180k-times improvement, then the baseline was probably not very impressive to begin with? Still, that doesn’t take away from the fact that the optimizations were impressive, and that it was interesting to read about them. on 21 Oct 2023 09:07 +0000

I think your last sentence has one negation too many.

If it was interesting to read about it, then the criticism did not take away that the optimizations were impressive. on 21 Oct 2023 09:16 +0000

Fixed it… I come from a language culture where we like our negations :) Also, I'm not a native English speaker, so combine the two and you are in for a ride! on 21 Oct 2023 12:59 +0000

“With Rust” is the joke, as if you couldn't do it otherwise. Maybe C would be only 179,999x faster, or Fortran 180,001x (numbers made up). Python could probably be made 60,000x faster as well. on 21 Oct 2023 18:16 +0000

Yet it’s a fine baseline. The actual speedup from switching to Rust was 8x; the rest was all about changing data structures, using SIMD, parallelism and batching. on 22 Oct 2023 08:05 +0000

I think it’s a great baseline. Within an academic context, Python (and perhaps MATLAB) is extremely common for data analysis. I doubt many would transition code to other languages unless strictly needed, as was the case in the article. Showing how to “simply” speed up code, as the article does, is a great way to gain speed even if you don’t analyze timing and just replicate the steps from this article.

Having done stuff myself as part of research, and having people I know go from developer jobs to research jobs, I can safely say scientists generally do not make good code. Regardless of language. An article like this gives good steps to take from start to end, and would be a valuable tool in a possible transition to better code. on 22 Oct 2023 23:09 +0000

I can absolutely confirm. I work in a specialized industry where we have a team of PhDs in R&D who write quick and dirty code in Fortran. That’s what they know, and it’s what they’re most productive with.

Our production code is in Python. We took one of their solutions and made it way faster, and the main improvement was to restructure the code to not need everything in memory at once. Basically, they were processing data in 4D space, but only needed 3D worth of data at a time, and most of the time was being spent in memory allocation. So we drastically dropped memory usage and memory allocation by simply changing how they iterated, which also meant we could parallelize it, further increasing performance.

They’re paid to come up with the algorithms, I’m paid to make it run fast in production. We looked into Rust, but for this project, Python got us well within a reasonable running time and Rust would’ve required retraining a lot of our team. It’s still on the table if we need more performance. on 22 Oct 2023 17:06 +0000
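
The restructuring described above (iterating so that only one 3D volume of a 4D dataset is in memory at a time, then parallelizing over the independent steps) can be sketched in Python roughly as follows. This is an illustration only, not the commenter's code: `make_volume`, `volume_sum`, and the array shapes are hypothetical stand-ins, and a thread pool is used just to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Volume = List[List[List[float]]]  # one 3D volume, indexed [x][y][z]


def make_volume(t: int, nx: int, ny: int, nz: int) -> Volume:
    """Hypothetical stand-in for loading or computing the 3D data at step t."""
    return [[[float(t + x + y + z) for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


def volume_sum(vol: Volume) -> float:
    """Hypothetical per-volume computation: reduce one 3D volume to a number."""
    return sum(v for plane in vol for row in plane for v in row)


def total_eager(nt: int, nx: int, ny: int, nz: int) -> float:
    # Builds the whole 4D dataset first, so memory grows with nt * nx * ny * nz.
    data = [make_volume(t, nx, ny, nz) for t in range(nt)]
    return sum(volume_sum(vol) for vol in data)


def process_step(args: Tuple[int, int, int, int]) -> float:
    t, nx, ny, nz = args
    # Only this step's 3D volume exists here; it is released on return.
    return volume_sum(make_volume(t, nx, ny, nz))


def total_streaming(nt: int, nx: int, ny: int, nz: int, workers: int = 4) -> float:
    # Each step is independent, so the steps can be mapped over a pool.
    steps = ((t, nx, ny, nz) for t in range(nt))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_step, steps))


if __name__ == "__main__":
    print(total_eager(3, 2, 2, 2), total_streaming(3, 2, 2, 2))
```

Both functions return the same total, but the streaming version keeps at most `workers` volumes alive at once. For CPU-bound pure-Python work, `ProcessPoolExecutor` would be the usual substitute for the thread pool, since threads are limited by the GIL.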

Yeah, this one really had me scratching my head:

Note: there are lots of ways we could make the Python code faster, but the point of this post isn’t to compare highly-optimized Python to highly-optimized Rust. The point is to compare “standard-Jupyter-notebook” Python to highly-optimized Rust. on 21 Oct 2023 07:47 +0000

Move infinitely faster with teleportation!!!

Walking baseline on 21 Oct 2023 18:05 +0000

Their actual speedup for switching languages was 8x. The rest was all about using better data structures and parallelism. on 22 Oct 2023 19:10 +0000

It would be interesting to see how fast Polars (a dataframe library written in Rust) would be, as it can be used from Python.