What are you working on this week? (June 30, 2024)
from secana@programming.dev to rust@programming.dev on 30 Jun 17:34

Hi rustaceans! What are you working on this week? Did you discover something new that you want to share?



G0ldenSp00n@lemmy.jacaranda.club on 30 Jun 19:05

Been working through Andrej Karpathy’s ML lectures in Rust. The backprop one went pretty well, but I had to learn how to do type indirection and interior mutability because of the backprop graph structure. I’m now on the makemore lecture, but I’m having a lot of trouble building the bi-gram model in Burn (the Rust-native ML framework), because directly incrementing the tensor values seems to be insanely slow. His example that takes about 10 seconds to run in Python takes two and a half minutes in Rust with Burn, so I’m trying to figure out how to optimize or speed that up.
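One possible workaround, sketched here without any Burn calls: accumulate the bi-gram counts in a plain `Vec<u32>` on the host and only build a tensor from the finished table afterwards. This is an assumption about where the time goes, not a measured fix, and the names `bigram_counts` and `vocab` are made up for the illustration.

```rust
/// Count adjacent-character bi-grams into a flat `vocab x vocab` table.
/// Each word is wrapped in '.' as a start/end marker, as in the makemore lecture.
/// The table is row-major: entry (first, second) lives at `first * vocab.len() + second`.
fn bigram_counts(words: &[&str], vocab: &[char]) -> Vec<u32> {
    let v = vocab.len();
    // Linear lookup is fine for a tiny alphabet; a HashMap would also work.
    let stoi = |c: char| {
        vocab
            .iter()
            .position(|&x| x == c)
            .expect("character not in vocabulary")
    };
    let mut counts = vec![0u32; v * v];
    for w in words {
        let chars: Vec<char> = std::iter::once('.')
            .chain(w.chars())
            .chain(std::iter::once('.'))
            .collect();
        // Every pair of neighbouring characters is one bi-gram.
        for pair in chars.windows(2) {
            counts[stoi(pair[0]) * v + stoi(pair[1])] += 1;
        }
    }
    counts
}

fn main() {
    let vocab: Vec<char> = ".abcdefghijklmnopqrstuvwxyz".chars().collect();
    let counts = bigram_counts(&["emma", "ava"], &vocab);
    let v = vocab.len();
    // Index 0 is '.', index 1 is 'a'.
    println!("count('.', 'a') = {}", counts[0 * v + 1]);
    println!("count('a', '.') = {}", counts[1 * v + 0]);
}
```

The finished `counts` table could then be handed to the framework in one go, instead of incrementing individual tensor elements inside the loop.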

secana@programming.dev on 30 Jun 20:01

How is the overall ML story with Rust? Is it usable in comparison to Python?

G0ldenSp00n@lemmy.jacaranda.club on 30 Jun 21:12

So far I have only really scratch-built backprop, and I had severe performance problems with Burn when trying to do something it probably wasn’t built to do. Once I get further into makemore I should have a better idea.

secana@programming.dev on 01 Jul 06:27

I’m not an ML pro and have never used Python or Rust for it, but I know that our ML team uses Python extensively. My gut feeling is that Python stays the king in the ML field, but the underlying libraries are going to move from C++ to Rust in the future, or at least they will if Rust gets stronger math/statistics libraries. If you get something cool running with Rust and ML, I’m interested to read about it.

secana@programming.dev on 30 Jun 20:02

I ported the frontend for kellnr.io from Vuex to Pinia, which makes the state-handling code in the frontend much cleaner.

onlinepersona@programming.dev on 30 Jun 21:53

Every post, I see you commenting on your progress. Respectable! Do you work on kellnr full-time?

Anti Commercial-AI license

secana@programming.dev on 01 Jul 06:24

Unfortunately not. But I try to work on it a few hours every week in my spare time. I think that having an easy and free crate registry is crucial for the adoption of Rust in the commercial space. Companies don’t want to share their code publicly on crates.io. My full-time job is in the IT security sector. My hope is that by pushing Rust as a safe language, we can close some fundamental design flaws that languages like C/C++ introduced and make the software landscape more secure.

onlinepersona@programming.dev on 01 Jul 07:38

With all the work you put in, I hope you’ll be able to find sponsors for the project. It is an admirable goal 👍

Anti Commercial-AI license

tuna@discuss.tchncs.de on 01 Jul 09:36

More progress on the Finite Projective Plane (incidence matrix) generation from last week. There already exists an algorithm to generate boards of order p+1 where p is prime. It is stateless, so with CUDA we can generate huge boards in seconds since all you need is the x, y position and board size. 258x258 under 3s!

However, p+1 isn’t the only sequence. From our observations, the Fermat numbers also seem to generate valid boards, using our “naïve” algorithm.

Unfortunately 3x3, 5x5, and 17x17 might not contain all the nuggets of generality to find a nice algorithm like the p+1, so we’re gonna generate the next up: 257x257. We’ve been improving the naïve algorithm since it is too slow. (The resulting image would be 65793x65793)

  • Rather than allocating the 2d boolean grid, we represent where the true elements would be using row and column indexes. This is okay because of the constraint which limits how many true elements can be in a row/column
    • benefit 1: less memory usage, “O(2n)” vs O(n²) (for 257x257: 129MiB vs 4GiB)
    • benefit 2: faster column-major lookups (flamegraph spent a lot of time sitting in iterators)
    • overall speedup: about 2.7x
  • Speed up index lookup with binary search
    • The index lists are sorted by nature. To exhaustively check that a dot is valid, the algorithm checks n² spots in 2 lists of size n, which is slightly more expensive than the grid because of the 2 index lists. Rather than slice::contains, use slice::binary_search(…).is_ok()
    • overall speedup: about 2.1x
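For anyone curious what the index representation could look like, here is a minimal sketch, not the actual project code. The `SparseBoard` type, its field names, and the two size helpers are made up for illustration. The size estimate assumes 4-byte indexes, one byte per `bool` in the dense grid, and 257 true cells per row and column of the 65793x65793 board, which lands on the 129MiB vs 4GiB figures above.

```rust
/// Sparse incidence structure: for each row, the sorted column indexes of its
/// `true` cells, and for each column, the sorted row indexes.
struct SparseBoard {
    rows: Vec<Vec<u32>>,
    cols: Vec<Vec<u32>>,
}

impl SparseBoard {
    fn new(side: usize) -> Self {
        Self {
            rows: vec![Vec::new(); side],
            cols: vec![Vec::new(); side],
        }
    }

    /// Mark (r, c) as true, keeping both index lists sorted and duplicate-free.
    fn set(&mut self, r: u32, c: u32) {
        if let Err(pos) = self.rows[r as usize].binary_search(&c) {
            self.rows[r as usize].insert(pos, c);
        }
        if let Err(pos) = self.cols[c as usize].binary_search(&r) {
            self.cols[c as usize].insert(pos, r);
        }
    }

    /// Membership test via binary search instead of a linear `contains`.
    fn get(&self, r: u32, c: u32) -> bool {
        self.rows[r as usize].binary_search(&c).is_ok()
    }
}

/// Dense grid size in bytes, assuming one byte per cell.
fn dense_bytes(side: u64) -> u64 {
    side * side
}

/// Size of the row list plus the column list, `per_line` indexes each.
fn sparse_bytes(side: u64, per_line: u64, index_bytes: u64) -> u64 {
    2 * side * per_line * index_bytes
}

fn main() {
    // Fano plane (order 2) as a tiny test board: 7 lines with 3 points each.
    let lines: [[u32; 3]; 7] = [
        [0, 1, 2], [0, 3, 4], [0, 5, 6], [1, 3, 5],
        [1, 4, 6], [2, 3, 6], [2, 4, 5],
    ];
    let mut board = SparseBoard::new(7);
    for (line, points) in lines.iter().enumerate() {
        for &p in points {
            board.set(line as u32, p);
        }
    }
    println!("(3, 5) set: {}", board.get(3, 5));
    println!("lines through point 0: {:?}", board.cols[0]);

    let mib = 1024.0 * 1024.0;
    println!("dense:  {:.0} MiB", dense_bytes(65793) as f64 / mib);
    println!("sparse: {:.0} MiB", sparse_bytes(65793, 257, 4) as f64 / mib);
}
```

Keeping a column list next to the row list doubles the index storage, but it means a column-major lookup is also just one binary search.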

Next steps:

  • Assume a square grid and exploit its diagonal symmetry to treat row lookups as column lookups
  • Use multi threading to gain a partial speedup
    • Essentially if row 1 is 50% completed, row 2 can be up to 50% completed.
    • I think you get different speeds depending on whether the threads and symmetry folds are both row/column-major or one is row-major and the other is column-major. My gut says both need to be aligned because there’s less waiting involved.

tuna@discuss.tchncs.de on 01 Jul 18:30

Another optimization:

  • The first index of each index array can be filled by a function. For 257x257 that would remove 8,487,169 checks out of… 2,164,392,321. Not much, but it’s basically a free optimization, so might as well!