Heapless (data structures for embedded) v0.9.1 has been released!
(blog.rust-embedded.org)
from sgued@programming.dev to rust@programming.dev on 21 Aug 06:57
https://programming.dev/post/36059237
This release includes some pretty nice improvements to the usability of the crate.
If you want to know how the View types I talk about in the release post are built, take a look at my post from back when I contributed them:
There is some irony in heapless::BinaryHeap. Cool library though.
alloca?
Not really. Heapless uses compile-time-sized backing buffers to implement Vec, String, etc. with a fixed maximum capacity. You would typically use heapless with a statically allocated variable, but it is possible to use it on the stack too.
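To make that concrete, here is a minimal sketch of the fixed-capacity idea using a simplified Option-based buffer (the real heapless::Vec stores uninitialised memory and is considerably more careful; this is just the shape of the API):

```rust
// Minimal sketch of a fixed-capacity, heap-free vector, illustrating the
// idea behind heapless::Vec. Not the crate's actual implementation.
struct FixedVec<T, const N: usize> {
    buf: [Option<T>; N], // compile-time-sized backing buffer
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        Self { buf: std::array::from_fn(|_| None), len: 0 }
    }

    // push hands the value back instead of reallocating when full.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.buf[self.len] = Some(value);
        self.len += 1;
        Ok(())
    }
}

fn main() {
    let mut v: FixedVec<u32, 2> = FixedVec::new();
    assert!(v.push(1).is_ok());
    assert!(v.push(2).is_ok());
    assert_eq!(v.push(3), Err(3)); // buffer full: no heap fallback
    println!("len = {}", v.len);
}
```

Because the capacity is a const generic parameter, the whole structure can live in a `static` or on the stack with a size known at compile time.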
Alloca is different and allocates a dynamically sized block on the stack. Rust doesn’t really support alloca, but there is a crate for it that works by calling through a helper function in C: lib.rs/crates/alloca
Awww. why doesn’t rust do alloca actually?
Alloca for buffers is generally a bad idea. The only reasonable usage I have seen is in the Swift ABI: see this blog post on the topic by a Rust compiler developer: faultlore.com/blah/swift-abi/ (and even there it can only be used in some cases).
First of all: alloca is fun.
Second of all: sub is my favourite allocator
Third of all: doesn’t llvm literally have an alloca primitive for stack allocation? Like it should handle it, no?
Fourth of all: second point is literally a skill issue idk, especially if your compiler is already proving bounds anyway.
Fifth of all: oh damn, that swift thing looks cool!
Sixth of all: don’t take any of this too seriously
Seventh of all: but like actually tho, the nontrivial alloca case not being easily optimizable does not mean that we can’t optimize the bounded case, no?
Eighth of all: ig rust doesn’t really do that kind of low-level access.
Ninth of all: this reminds me of those dynamic stack array warcrime gcc extensions, like shit’s craazy, you just size a stack array with an int from stdin.
Tenth level: I suppose the main benefit of this over a stack-allocator interface is compile-time bounds checks.
Eleventh stage: and maybe it doesn’t quite fit the allocator interface?
12th verse: I’m sane, I promise
Hmm…
As to LLVM and alloca, it doesn’t optimise or even work well in practice. Some basic cases work, others are less well tested. There are lots of “shoulds” that “don’t” in practice in LLVM.
I have not looked at alloca in LLVM myself but from what I have heard from those who are experts on this, it is quite brittle.
See docs.rs/bumpalo/latest/bumpalo/ (and bump allocators in general).
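For illustration, here is a minimal bump-allocator sketch over a fixed byte buffer. It shows the general pattern, not bumpalo's actual API (which hands out typed, lifetime-checked references rather than raw byte slices):

```rust
// Minimal sketch of a bump allocator: allocation is just advancing an
// offset into a pre-sized buffer, and freeing is all-or-nothing.
struct Bump {
    buf: Vec<u8>,
    offset: usize,
}

impl Bump {
    fn with_capacity(cap: usize) -> Self {
        Self { buf: vec![0; cap], offset: 0 }
    }

    // Allocation is a pointer bump; there is no per-allocation free.
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if size > self.buf.len() - self.offset {
            return None; // arena exhausted
        }
        let start = self.offset;
        self.offset += size;
        Some(&mut self.buf[start..start + size])
    }

    // Everything is released at once, which is what makes bumping fast.
    fn reset(&mut self) {
        self.offset = 0;
    }
}

fn main() {
    let mut bump = Bump::with_capacity(16);
    assert!(bump.alloc(8).is_some());
    assert!(bump.alloc(8).is_some());
    assert!(bump.alloc(1).is_none()); // arena exhausted
    bump.reset();
    assert!(bump.alloc(16).is_some()); // whole arena available again
}
```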
In general proving bounds for stack growth is very difficult. With recursion it is undecidable. This follows directly from Rice’s Theorem. (This is my favourite theorem, it is nice to know that something is impossible rather than a skill issue.)
(Of course you could have a static analyser that instead of yes/no returns yes/no/don’t know, and then assigns “don’t know” to either of the other classes depending on whether you care more about false positives or false negatives. This is how the Rust borrow checker works: forbid if it can’t prove the code is safe, accepting that there will be safe code it doesn’t allow.)
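A tiny example of why such bounds are out of reach in general: recursion depth, and hence stack usage, can depend on a runtime value, so no compiler can bound it for all inputs (the function here is arbitrary, just for illustration):

```rust
// This function's recursion depth is proportional to its input, so its
// stack usage is data-dependent and cannot be bounded at compile time
// in general.
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    // The number of stack frames consumed depends on a runtime value.
    let n: u64 = std::env::args().count() as u64 + 10;
    println!("recursed {} frames deep", depth(n));
}
```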
I always felt “for embedded” was selling this crate short, or rather pushing away potential users who aren’t aware that some data structures here are useful in non-embedded scenarios.
I for example use the admittedly simple Deque (with its is_full() method) in one of my projects for pushing/popping tasks to/from a fixed-size thread queue, which one could argue is a very unembedded-like use-case 😉. The limitation of needing compile-time knowledge of the max size of the Deque is pretty rough though.

In this case, it seems like a feature.
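That fixed-size task-queue pattern can be sketched with std types, using a manual capacity check to mimic a fixed-size deque's is_full()/failing-push behaviour (MAX_TASKS, the Task type, and the method names are illustrative assumptions, not the commenter's actual code):

```rust
use std::collections::VecDeque;

// Sketch of a fixed-size task queue: pushes are rejected when full
// instead of growing the buffer, like a heapless deque would.
const MAX_TASKS: usize = 4;
type Task = String;

struct TaskQueue {
    inner: VecDeque<Task>,
}

impl TaskQueue {
    fn new() -> Self {
        Self { inner: VecDeque::with_capacity(MAX_TASKS) }
    }

    fn is_full(&self) -> bool {
        self.inner.len() == MAX_TASKS
    }

    // Hand the task back instead of allocating when the queue is full.
    fn push(&mut self, task: Task) -> Result<(), Task> {
        if self.is_full() {
            return Err(task);
        }
        self.inner.push_back(task);
        Ok(())
    }

    fn pop(&mut self) -> Option<Task> {
        self.inner.pop_front()
    }
}

fn main() {
    let mut q = TaskQueue::new();
    for i in 0..MAX_TASKS {
        assert!(q.push(format!("task-{i}")).is_ok());
    }
    assert!(q.is_full());
    assert!(q.push("overflow".to_string()).is_err());
    assert_eq!(q.pop().as_deref(), Some("task-0"));
}
```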
It does make me wonder why not use a bounded channel instead (assuming these tasks are shared between threads, maybe because it’s multi-consumer?) but a deque is more flexible if that flexibility is needed.
Personally, I can think of a use for this myself. I have a project where I’m queuing audio to play in the background, and using this kind of deque for the queue would work here as well (though I use a bounded channel currently).
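For comparison, std's sync_channel is one way to get the bounded channel mentioned here: the capacity is fixed at construction, and try_send fails once it is full (the capacity and payloads below are illustrative):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // A bounded channel with room for 2 items (capacity is illustrative).
    let (tx, rx) = sync_channel::<&str>(2);
    tx.send("task-1").unwrap();
    tx.send("task-2").unwrap();
    // try_send fails instead of blocking once the channel is full.
    assert!(tx.try_send("task-3").is_err());

    let consumer = thread::spawn(move || {
        let mut count = 0;
        // recv() returns Err once all senders are dropped and the
        // channel is drained.
        while let Ok(_task) = rx.recv() {
            count += 1;
        }
        count
    });
    drop(tx); // close the channel so the consumer's recv() loop ends
    assert_eq!(consumer.join().unwrap(), 2);
}
```

Unlike a deque, though, a channel only gives you FIFO push/pop from opposite ends, which is the flexibility trade-off the comment above alludes to.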
There are also a lot of times when I’ve wanted a stack-only data structure to avoid allocations, where these could theoretically come in.
Also worth noting are the smallvec and compact_str crates. These are useful when your data is small most of the time and you want it stored inline, but you are OK with falling back to heap allocation for the occasional outlier.
I have used both inside structs to optimise cache usage. For these uses I tend to use a rather short SmallVec.
And smallvec in particular is also useful on the stack in a larger variant in hot functions, where you might have a Vec that almost always holds fewer than (say) 32 elements but the program should gracefully handle the occasional long case. I found this offered a small speed-up in some batch data-processing code I wrote.
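A rough sketch of the small-buffer optimisation these crates implement: keep up to N elements inline, and spill to a heap Vec only for outliers (illustrative only; the real crates use uninitialised inline storage and a more compact representation):

```rust
// Sketch of the small-buffer optimisation behind smallvec/compact_str:
// up to N elements live inline (on the stack or inside a struct), and
// only outliers pay for a heap allocation.
enum SmallVecSketch<const N: usize> {
    Inline([i32; N], usize), // inline buffer + current length
    Spilled(Vec<i32>),
}

impl<const N: usize> SmallVecSketch<N> {
    fn new() -> Self {
        SmallVecSketch::Inline([0; N], 0)
    }

    fn push(&mut self, value: i32) {
        match self {
            SmallVecSketch::Inline(buf, len) if *len < N => {
                buf[*len] = value;
                *len += 1;
            }
            SmallVecSketch::Inline(buf, len) => {
                // Outlier: move the inline contents to the heap, then push.
                let mut v = buf[..*len].to_vec();
                v.push(value);
                *self = SmallVecSketch::Spilled(v);
            }
            SmallVecSketch::Spilled(v) => v.push(value),
        }
    }

    fn spilled(&self) -> bool {
        matches!(self, SmallVecSketch::Spilled(_))
    }
}

fn main() {
    let mut v: SmallVecSketch<4> = SmallVecSketch::new();
    for i in 0..4 {
        v.push(i);
    }
    assert!(!v.spilled()); // still inline: no allocation yet
    v.push(4);
    assert!(v.spilled()); // the outlier forced a heap allocation
}
```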