Append Vec to BinaryHeap?

Neutron3529 · August 22, 2021, 6:30pm

AFAIK, there only exists a function that append BinaryHeap to a BinaryHeap, but its implememtation is just append a Vec to BinaryHeap. If someone really want to append a vec to the BinaryHeap, the conversion from Vec to BinaryHeap cost some time and it is a totally waste since the source code is

    #[stable(feature = "binary_heap_append", since = "1.11.0")]
    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }

        let start = self.data.len();

        self.data.append(&mut other.data);//regard `&mut other.data` as a `Vec`, rather than a `BinaryHeap`.

        self.rebuild_tail(start);// assume self.data[0..start] is a vaild heap.
    }

things might be better changing the function to

    #[stable(feature = "binary_heap_append", since = "1.11.0")]
    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }
        self.append_vec(&mut other.data)
    }
    pub fn append_vec(&mut self, vec: &mut Vec<T>){ // not tested, maybe something simillar like what I've shown below.
    // T is defined in `impl <T:Ord> for BinaryHeap`, no need to re-define it.
        let start = self.data.len();
        self.data.append(vec);
        self.rebuild_tail(start);
    }

Is it worth a PR or just a Pre-RFC?

If you find it could be an acceptable choice, please help me submit the PR and leave a message here. Due to some reason, I can hardly connect to github. I would be appreciate that you help me submit this possible PR.

Anyway, thank you for reading:)

steffahn · August 22, 2021, 7:07pm

You’ll usually use the .extend(…) method to append from a different collection. In order to do it with a &mut Vec<T> and keep it’s allocation and capacity intact, you can use Vec::drain. I.e. your proposed heap.append_vec(&mut vector) would behave similar to or the same as heap.extend(vector.drain(..)).

use std::collections::BinaryHeap;

pub fn append_binary_heap_with_vec<T: Ord>(heap: &mut BinaryHeap<T>, vector: &mut Vec<T>) {
    heap.extend(vector.drain(..));
}

(in the playground)

SkiFire13 · August 22, 2021, 8:16pm

However looks like extending a BinaryHeap with a Drain falls back to just pushing each element, which may be much slower than rebuilding the heap if the Drain contained a lot of elements.

steffahn · August 22, 2021, 8:59pm

Good catch. Maybe we should simply change

    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let iterator = iter.into_iter();
        let (lower, _) = iterator.size_hint();

        self.reserve(lower);

        iterator.for_each(move |elem| self.push(elem));
    }

to look more like

    pub fn append(&mut self, other: &mut Self) {
        if self.len() < other.len() {
            swap(self, other);
        }

        let start = self.data.len();

        self.data.append(&mut other.data);

        self.rebuild_tail(start);
    }

i.e. something like:

reserve lower
for_each: push to the underlying vec
finally, rebuild_tail

Regarding panic-safety, we don’t need leave the heap in a valid state (w.r.t. the ordering) in case of a panic.

Actually… we could just use extend on the Vector, right?

    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let start = self.data.len();

        self.data.extend(iter);

        self.rebuild_tail(start);
    }

CAD97 · August 23, 2021, 12:59am

There's also the possibility of "just" specializing to do the efficient thing here.

Neutron3529 · August 23, 2021, 4:18am

That could be a really cool choice

..Since I finally figure out that, for std::Vec, .extend seems as fast as .copy_from_slice.

This question could be mark as solved if a PR is submitted to github.

But... I just found another problem: why we couldn't operate private fields or methods?

The first solution for me is, try unsafe{BinaryHeap.data.extend(iter)}, I know operate private field could be very unsafe, but I know what I am doing..

Why couldn't we operate private field even with the unsafe block?

access to private field, at least, is safer than std::mem::transmute, why we have the latter one?

scottmcm · August 23, 2021, 4:59am

BinaryHeap::append already has heuristics for how to do that; updating extend to do the same seems perfectly reasonable. (Though generally-descending vs unknown-order can affect things slightly.)

Because it's important to let libraries change internal implementation details without breaking compilation. It's unacceptable to say that you can't even rename private fields without a major version bump.

SkiFire13 · August 23, 2021, 8:12am

steffahn:

    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let start = self.data.len();

        self.data.extend(iter);

        self.rebuild_tail(start);
    }

At this point specializing would just enable a more limited version of the swap trick used in append (but just for copying less memory, it wouldn't work to reduce the number of comparisons) and I'm not sure how important that may be.

Neutron3529 · August 24, 2021, 6:42am

I found a solution

// RebuildTail is a struct that ensure `rebuild_tail` in `fn extend_desugared`
// is executed
struct RebuildTail<'a, T: Ord>(&'a mut BinaryHeap<T>, usize);
impl<'a, T: Ord> Drop for RebuildTail<'a, T> {
    #[inline]
    fn drop(&mut self) {
        self.0.rebuild_tail(self.1);
    }
}

impl<T: Ord> BinaryHeap<T> {
    fn extend_desugared<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let start=self.data.len();
        let extend=RebuildTail(&mut self, start);
        extend.0.data.extend(iter);
        // `self.rebuild_tail(start)` is done by `drop(extend)`
    }
}

even the iterator panic, the rebuild_tail is executed.

Now I am fighting with github... It is not easy for me to connect the github...

The PR is here

hyeonu · August 24, 2021, 8:16am

But we don't have the problem. The BinaryHeap<T>'s document says:

The behavior resulting from such a logic error is not specified, but will not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

Keeping heap ordering after the panic is not needed nor expected.

SkiFire13 · August 24, 2021, 8:28am

Doesn't that "panic" refer to what the BinaryHeap could do if you modify an element such that its ordering compared to the other elements changes. Although it's true that currently any BinaryHeap's method doesn't keep the heap ordering if the comparison function panics (and this includes the proposed solution)

Neutron3529 · August 24, 2021, 8:35am

The question is not just what belongs to the BinaryHeap.

if self.data.extend(iter) panics (just the iter panics), the heap should keep its order, but in the previous implementation of @steffahn self.rebuild_tail(start); could not be executed.

That is what I want to provide (although I wasn't realized that the compare function could also panic then..)

steffahn · August 24, 2021, 9:21am

In Rust, mutable references are generally not considered unwind safe which means that using them anyway after catching a panic is always a logic error (unless the type in question specifies that it isn’t).

Also, the Ord function panicking with append, or the Ord function or closure panicking in retain (the other usage of rebuild_tail currently) leads to a broken binary heap as-well.

system · November 22, 2021, 9:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
`Vec::append` but by value language design	31	1234	September 19, 2024
Add BinaryHeap::into_iter_sorted() libs	3	877	June 30, 2019
Should I add as_slice method to BinaryHeap? libs	5	897	July 4, 2021
Add a "Vec.sorted()" function	27	11370	June 12, 2020
Pre-RFC-ish: add Vec::sorted and friends libs	7	1495	March 25, 2019

Append Vec to BinaryHeap?

Related topics