Regex::captures forces an allocation on every call #219

Closed
@BurntSushi

Description

Every call to captures creates a new allocation to store the capture locations. The size of this allocation is proportional to the number of capture groups in the regex.

This is also true for captures_iter, where every iteration results in a new allocation. An iterator could reuse the allocation in theory, but ownership of the captures is transferred to the caller. Even if we could reuse the capture locations, we couldn't give the caller a mutable borrow, since that immediately puts us in the "streaming iterator" conundrum.
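To make the "streaming iterator" problem concrete, here is a minimal sketch that uses only the standard library. `ReusingMatches`, `next_match` and `collect_words` are hypothetical names, and the "matcher" just splits on spaces; it stands in for a regex engine and is not the regex crate's code. The point is that the returned slice borrows from the iterator itself, which `Iterator::next` cannot express.

```rust
/// Hypothetical stand-in for a matcher that reuses one buffer of offsets.
struct ReusingMatches<'t> {
    text: &'t str,
    pos: usize,
    // Reused across calls; holds [start, end] of the current word.
    slots: Vec<usize>,
}

impl<'t> ReusingMatches<'t> {
    fn new(text: &'t str) -> Self {
        ReusingMatches { text, pos: 0, slots: Vec::with_capacity(2) }
    }

    /// A "lending" next: the returned slice borrows from `self`, so the
    /// caller must let go of it before asking for the next match.
    /// `Iterator::next` cannot have this signature, because `Iterator::Item`
    /// cannot name the lifetime of the `&mut self` borrow.
    fn next_match(&mut self) -> Option<&[usize]> {
        let bytes = self.text.as_bytes();
        // Skip separators.
        while self.pos < bytes.len() && bytes[self.pos] == b' ' {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return None;
        }
        let start = self.pos;
        while self.pos < bytes.len() && bytes[self.pos] != b' ' {
            self.pos += 1;
        }
        // Overwrite the same buffer instead of allocating a new one.
        self.slots.clear();
        self.slots.push(start);
        self.slots.push(self.pos);
        Some(self.slots.as_slice())
    }
}

/// Drives the lending method with `while let`; each slice is dropped
/// at the end of the loop body, before the next call.
fn collect_words(text: &str) -> Vec<String> {
    let mut it = ReusingMatches::new(text);
    let mut words = Vec::new();
    while let Some(slots) = it.next_match() {
        words.push(text[slots[0]..slots[1]].to_string());
    }
    words
}
```

This works with a hand-written `while let` loop, but the type cannot implement `Iterator`, so adapters such as `map` or `collect` are unavailable on it.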

The most sensible API I can think of is to:

  1. Permit the caller to build an empty Captures value from a given Regex, so that it has the right size.
  2. Pass a mutable borrow of that Captures value to captures, which lets the caller control the allocation.
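The standard library already uses this caller-owned-buffer pattern: `read_line` (defined on the `BufRead` trait) appends into a `String` supplied by the caller. The sketch below shows that pattern in isolation; `count_nonempty_lines` is a hypothetical helper written for this illustration.

```rust
use std::io::{self, BufRead};

/// Counts non-empty lines while reusing one `String` for every read.
fn count_nonempty_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new(); // allocated once, owned by the caller
    let mut count = 0;
    loop {
        line.clear(); // read_line appends, so reset the buffer first
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of input
        }
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

The buffer's capacity grows to fit the longest line seen and is then reused, which is the same effect the two steps above aim for with capture locations.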

It's not quite clear how to apply this to captures_iter while still implementing Iterator. I suspect we should borrow from the io::BufRead::read_line style methods, e.g.,

impl Regex {
    // Returns empty storage for captures for use with read_captures.
    fn new_captures(&self) -> Captures { ... }

    // On successful match, returns true and sets capture locations.
    // Otherwise returns false.
    fn read_captures(&self, caps: &mut Captures, text: &str) -> bool { ... }
    fn read_captures_iter<'r, 't>(&'r self, text: &'t str) -> ReadCapturesIter<'r, 't> { ... }
}

struct ReadCapturesIter<'r, 't> { ... }

impl<'r, 't> ReadCapturesIter<'r, 't> {
    // On successful match, returns true and sets capture locations.
    // Otherwise returns false.
    fn captures(&mut self, caps: &mut Captures) -> bool { ... }
}

And I think this would work well.
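To see how the proposed signatures might feel at a call site, here is a self-contained toy. Everything in it is an assumption made for illustration: `ToyRegex` is hard-wired to behave roughly like `(\w+)=(\w+)` over ASCII word bytes, `Captures` here is a plain vector of offsets, and `find_at` and `collect_pairs` are hypothetical helpers. None of this is the regex crate's implementation.

```rust
/// Toy stand-in for `Captures`: optional (start, end) offsets per group.
struct Captures {
    slots: Vec<Option<(usize, usize)>>,
}

impl Captures {
    fn get(&self, i: usize) -> Option<(usize, usize)> {
        self.slots.get(i).copied().flatten()
    }
}

fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Toy stand-in for `Regex`, hard-wired to roughly `(\w+)=(\w+)`.
struct ToyRegex;

impl ToyRegex {
    /// Returns empty storage sized for this "regex" (groups 0, 1 and 2).
    fn new_captures(&self) -> Captures {
        Captures { slots: vec![None; 3] }
    }

    /// On a match, returns true and sets capture locations; otherwise false.
    fn read_captures(&self, caps: &mut Captures, text: &str) -> bool {
        self.find_at(caps, text, 0)
    }

    fn read_captures_iter<'r, 't>(&'r self, text: &'t str) -> ReadCapturesIter<'r, 't> {
        ReadCapturesIter { re: self, text, pos: 0 }
    }

    /// Hypothetical helper: searches from byte offset `from`, writing into `caps`.
    fn find_at(&self, caps: &mut Captures, text: &str, from: usize) -> bool {
        let b = text.as_bytes();
        for eq in from..b.len() {
            if b[eq] != b'=' {
                continue;
            }
            // Extend the key leftwards (not past `from`) and the value rightwards.
            let mut ks = eq;
            while ks > from && is_word(b[ks - 1]) {
                ks -= 1;
            }
            let mut ve = eq + 1;
            while ve < b.len() && is_word(b[ve]) {
                ve += 1;
            }
            if ks < eq && ve > eq + 1 {
                caps.slots[0] = Some((ks, ve));
                caps.slots[1] = Some((ks, eq));
                caps.slots[2] = Some((eq + 1, ve));
                return true;
            }
        }
        // No match: clear stale locations without freeing the buffer.
        for s in caps.slots.iter_mut() {
            *s = None;
        }
        false
    }
}

struct ReadCapturesIter<'r, 't> {
    re: &'r ToyRegex,
    text: &'t str,
    pos: usize,
}

impl<'r, 't> ReadCapturesIter<'r, 't> {
    /// On a match, returns true and sets capture locations; otherwise false.
    fn captures(&mut self, caps: &mut Captures) -> bool {
        if !self.re.find_at(caps, self.text, self.pos) {
            return false;
        }
        // Matches are never empty here, so resuming at the end always advances.
        self.pos = caps.get(0).map_or(self.text.len(), |(_, end)| end);
        true
    }
}

/// Collects every key/value pair using a single `Captures` buffer.
fn collect_pairs(text: &str) -> Vec<(String, String)> {
    let re = ToyRegex;
    let mut caps = re.new_captures(); // the only allocation for capture slots
    let mut it = re.read_captures_iter(text);
    let mut out = Vec::new();
    while it.captures(&mut caps) {
        if let (Some((ks, ke)), Some((vs, ve))) = (caps.get(1), caps.get(2)) {
            out.push((text[ks..ke].to_string(), text[vs..ve].to_string()));
        }
    }
    out
}
```

In this sketch the caller pays for one `Captures` allocation up front and the loop reuses it, at the cost of a `while` loop in place of a `for` loop over an `Iterator`.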

Main questions:

  1. Any alternatives?
  2. Do we replace the existing API with the one from above (in 1.0)? Or do we add it? My inclination is to add it, but I really hate expanding the API with more choices.
