Rust Optimization: `Vec::into_iter().collect()`
from arendjr@programming.dev to rust@programming.dev on 10 Apr 2024 09:00
https://programming.dev/post/12598716

I just had a random thought: a common pattern in Rust is to things such as:

let vec_a: Vec<String> = /* ... */;
let vec_b: Vec<String> = vec_a.into_iter().filter(some_filter).collect();

Usually, we need to be aware of the fact that Iterator::collect() allocates for the container we are collecting into. But in the snippet above, we’ve consumed a container of the same type. And since Rust has full ownership of the vector, in theory the memory allocated by vec_a could be reused to store the collected results of vec_b, meaning everything could be done in-place and no additional allocation is necessary.

It’s a highly specific optimization though, so I wonder if such a thing has been implemented in the Rust compiler. Anybody who has an idea about this?

#rust

threaded - newest

Schmeckinger@feddit.de on 10 Apr 2024 09:18 next collapse

I don’t know and don’t think so, but what you are doing is better done with retain anyways.

arendjr@programming.dev on 10 Apr 2024 09:22 collapse

I mean, the actual operation is just an example, of course. Feel free to make it a .map() operation instead. The strings couldn’t be reused then, but the vector’s allocation still could… in theory.

myotherself@lemmings.world on 10 Apr 2024 09:31 collapse

map() can still be used with Vec::iter_mut(), filter_map() can be replaced with Vec::retain_mut().

arendjr@programming.dev on 10 Apr 2024 09:45 collapse

Yeah, that’s helpful if I would be currently optimizing a hot loop now. But I was really just using it as an example. Also, retain_mut() doesn’t compose as well.

I’d much rather write:

let vec_a: Vec<String> = /* ... */;
let vec_b: Vec<String> = vec_a
    .into_iter()
    .filter(some_filter)
    .map(some_map_fn)
    .collect();

Over:

let mut vec_a: Vec<String> = /* ... */;
vec_a.retain_mut(|x| if some_filter(x) {
    *x = some_map_fn(*x); // Yikes, cannot move out of reference.
    true
} else {
    false
});

And it would be nice if that would be optimized the same. After all, the point of Rust’s iterators is to provide zero-cost abstractions. In my opinion, functions like retain_mut() represent a leakiness to that abstraction, because the alternative turns out to not be zero cost.

taladar@sh.itjust.works on 10 Apr 2024 10:25 next collapse

blog.polybdenum.com/…/identifying-the-collect-vec… might be relevant to your question.

along with the related github.com/rust-lang/rust/issues/120091

arendjr@programming.dev on 10 Apr 2024 11:00 collapse

Thanks! That’s very much what I was looking for!

porgamrer@programming.dev on 10 Apr 2024 11:37 next collapse

Is it really fair to say retain doesn’t compose as well just because it requires reference-based update instead of move-based? I also think using move semantics for in-place updates makes it harder to optimise things like a single field being updated on a large struct.

It also seems harsh to say iterators aren’t a zero-cost abstraction if they miss an optimisation that falls outside what the API promises. It’s natural to expect collect to allocate, no?

But I’m only writing this because I wonder if I haven’t understood your point fully.

(Side note: I think you could implement the API you want on top of retain_mut by using std::mem::replace with a default value, but you’d be hoping that the compiler optimises away all the replace calls when it inlines and sees the code can’t panic. Idk if that would actually work.)

arendjr@programming.dev on 10 Apr 2024 13:04 next collapse

The composability doesn’t have much to do with whether it’s a reference or a move, it’s because it bypasses usage of the Iterator methods. Iterators chains can consist of filter, map and other operations, which allows various functions and/or closures to be composed together. Whereas with retain_mut() there isn’t really a chain and functions you may otherwise use in an iterator chain become harder to (re)use.

arendjr@programming.dev on 10 Apr 2024 13:09 collapse

It also seems harsh to say iterators aren’t a zero-cost abstraction if they miss an optimisation that falls outside what the API promises. It’s natural to expect collect to allocate, no?

You’re right, I wouldn’t say iterators aren’t a zero-cost abstraction. But most abstractions are also leaky – it’s just the extent in which the leakiness is exposed that makes them more or less effective. As such, saying to just use retain_mut instead of the iterator approach highlights the shortcoming of the abstraction. But if the same results could be achieved while still using the same iterator, that would make that abstraction more useful and consistent. And that’s great, because that means we need to worry less when using iterators :)

myotherself@lemmings.world on 10 Apr 2024 11:53 collapse

To be fair, these alternatives are also limited to the case where the item type stays the same.

myotherself@lemmings.world on 10 Apr 2024 09:20 next collapse

I think the better solution would be to use Vec::retain().

Gobbel2000@programming.dev on 10 Apr 2024 10:27 next collapse

This blog post goes into some specifics of Rust reusing Vec allocations and some of the consequences. I think it’s really worth a read to better understand Vecs. From what I understand, it is possible that Rust will reuse the allocation of vec_a in your case, but it ultimately is quite complicated.

arendjr@programming.dev on 10 Apr 2024 10:59 collapse

That was a super interesting and informative read! Exactly what I was hoping to find when I posted this, thanks!

zemja@programming.dev on 10 Apr 2024 15:47 next collapse

I thought this was the point of vec.drain(…).collect().

Vorpal@programming.dev on 11 Apr 21:27 collapse

The standard library does have some specialisation internally for certain iterators and collection combinations. Not sure if it will optimise that one specifically, but Vec::into_iter().collect::<Vec>() is optimised (it may look silly, but it comes up with functions returning impl Iterator