mirror of https://github.com/rust-lang/rfcs.git
972 lines
40 KiB
Markdown
972 lines
40 KiB
Markdown
![]() |
- Feature Name: regex-1.0
|
||
|
- Start Date: 2016-05-11
|
||
![]() |
- RFC PR: [rust-lang/rfcs#1620](https://github.com/rust-lang/rfcs/pull/1620)
|
||
|
- Rust Issue: N/A
|
||
![]() |
|
||
|
# Table of contents
|
||
|
|
||
|
* [Summary][summary]
|
||
|
* [Motivation][motivation]
|
||
|
* [Detailed design][design]
|
||
|
* [Syntax][syntax]
|
||
|
* [Evolution][evolution]
|
||
|
* [Concrete syntax][concrete-syntax]
|
||
|
* [Expansion concerns][expansion-concerns]
|
||
|
* [Core API][core-api]
|
||
|
* [RegexBuilder][regexbuilder]
|
||
|
* [Replacer][replacer]
|
||
|
* [quote][quote]
|
||
|
* [RegexSet][regexset]
|
||
|
* [The `bytes` submodule][the-bytes-submodule]
|
||
|
* [Drawbacks][drawbacks]
|
||
|
* [Guaranteed linear time matching][guaranteed-linear-time-matching]
|
||
|
* [Allocation][allocation]
|
||
|
* [Synchronization is implicit][synchronization-is-implicit]
|
||
|
* [The implementation is complex][the-implementation-is-complex]
|
||
|
* [Alternatives][alternatives]
|
||
|
* [Big picture][big-picture]
|
||
|
* [`bytes::Regex`][bytesregex]
|
||
|
* [A regex trait][a-regex-trait]
|
||
|
* [Reuse some types][reuse-some-types]
|
||
|
* [Unresolved questions][unresolved]
|
||
|
* [`regex-syntax`][regex-syntax]
|
||
|
* [`regex-capi`][regex-capi]
|
||
|
* [`regex_macros`][regex_macros]
|
||
|
* [Dependencies][dependencies]
|
||
|
* [Exposing more internals][exposing-more-internals]
|
||
|
* [Breaking changes][breaking-changes]
|
||
|
|
||
|
# Summary
|
||
|
[summary]: #summary
|
||
|
|
||
|
This RFC proposes a 1.0 API for the `regex` crate and therefore a move out of
|
||
|
the `rust-lang-nursery` organization and into the `rust-lang` organization.
|
||
|
Since the API of `regex` has largely remained unchanged since its inception
|
||
|
[2 years ago](https://github.com/rust-lang/rfcs/blob/master/text/0042-regexps.md),
|
||
|
significant emphasis is placed on retaining the existing API. Some minor
|
||
|
breaking changes are proposed.
|
||
|
|
||
|
# Motivation
|
||
|
[motivation]: #motivation
|
||
|
|
||
|
Regular expressions are a widely used tool and most popular programming
|
||
|
languages either have an implementation of regexes in their standard library,
|
||
|
or there exists at least one widely used third party implementation. It
|
||
|
therefore seems reasonable for Rust to do something similar.
|
||
|
|
||
|
The `regex` crate specifically serves many use cases, most of which are somehow
|
||
|
related to searching strings for patterns. Describing regular expressions in
|
||
|
detail is beyond the scope of this RFC, but briefly, these core use cases are
|
||
|
supported in the main API:
|
||
|
|
||
|
1. Testing whether a pattern matches some text.
|
||
|
2. Finding the location of a match of a pattern in some text.
|
||
|
3. Finding the location of a match of a pattern---and locations of all its
|
||
|
capturing groups---in some text.
|
||
|
4. Iterating over successive non-overlapping matches of (2) and (3).
|
||
|
|
||
|
The expected outcome is that the `regex` crate should be the preferred default
|
||
|
choice for matching regular expressions when writing Rust code. This is already
|
||
|
true today; this RFC formalizes it.
|
||
|
|
||
|
# Detailed design
|
||
|
[design]: #detailed-design
|
||
|
|
||
|
## Syntax
|
||
|
[syntax]: #syntax
|
||
|
|
||
|
### Evolution
|
||
|
[evolution]: #evolution
|
||
|
|
||
|
The public API of a `regex` library *includes* the syntax of a regular
|
||
|
expression. A change in the semantics of the syntax can cause otherwise working
|
||
|
programs to break, yet, we'd still like the option to expand the syntax if
|
||
|
necessary. Thus, this RFC proposes:
|
||
|
|
||
|
1. Any change that causes a previously invalid regex pattern to become valid is
|
||
|
*not* a breaking change. For example, the escape sequence `\y` is not a
|
||
|
valid pattern, but could become one in a future release without a major
|
||
|
version bump.
|
||
|
2. Any change that causes a previously valid regex pattern to become invalid
|
||
|
*is* a breaking change.
|
||
|
3. Any change that causes a valid regex pattern to change its matching
|
||
|
semantics *is* a breaking change. (For example, changing `\b` from "word
|
||
|
boundary assertion" to "backspace character.")
|
||
|
|
||
![]() |
Bug fixes and Unicode upgrades are exceptions to both (2) and (3).
|
||
![]() |
|
||
|
Another interesting exception to (2) is that compiling a regex can fail if the
|
||
|
entire compiled object would exceed some pre-defined user configurable size.
|
||
|
In particular, future changes to the compiler could cause certain instructions
|
||
|
to use more memory, or indeed, the representation of the compiled regex could
|
||
|
change completely. This could cause a regex that fit under the size limit to
|
||
|
no longer fit, and therefore fail to compile. These cases are expected to be
|
||
|
extremely rare in practice. Notably, the default size limit is `10MB`.
|
||
|
|
||
|
### Concrete syntax
|
||
|
[concrete-syntax]: #concrete-syntax
|
||
|
|
||
|
The syntax is exhaustively documented in the current public API documentation:
|
||
|
http://doc.rust-lang.org/regex/regex/index.html#syntax
|
||
|
|
||
|
To my knowledge, the evolution as proposed in this RFC has been followed since
|
||
![]() |
`regex` was created. The syntax has largely remained unchanged with few
|
||
![]() |
additions.
|
||
|
|
||
|
### Expansion concerns
|
||
|
[expansion-concerns]: #expansion-concerns
|
||
|
|
||
|
There are a few possible avenues for expansion, and we take measures to make
|
||
|
sure they are possible with respect to API evolution.
|
||
|
|
||
|
* Escape sequences are often blessed with special semantics. For example, `\d`
|
||
|
is a Unicode character class that matches any digit and `\b` is a word
|
||
|
boundary assertion. We may one day like to add more escape sequences with
|
||
|
special semantics. For this reason, any unrecognized escape sequence makes a
|
||
|
pattern invalid.
|
||
|
* If we wanted to expand the syntax with various look-around operators, then it
|
||
|
would be possible since most common syntax is considered an invalid pattern
|
||
|
today. In particular, all of the [syntactic forms listed
|
||
|
here](http://www.regular-expressions.info/refadv.html) are invalid patterns
|
||
|
in `regex`.
|
||
|
* Character class sets are another potentially useful feature that may be worth
|
||
|
adding. Currently, [various forms of set
|
||
|
notation](http://www.regular-expressions.info/refcharclass.html) are treated
|
||
|
as valid patterns, but this RFC proposes making them invalid patterns before
|
||
|
`1.0`.
|
||
|
* Additional named Unicode classes or codepoints may be desirable to add.
|
||
|
Today, any pattern of the form `\p{NAME}` where `NAME` is unrecognized is
|
||
|
considered invalid, which leaves room for expansion.
|
||
|
* If all else fails, we can introduce new flags that enable new features that
|
||
|
conflict with stable syntax. This is possible because using an unrecognized
|
||
|
flag results in an invalid pattern.
|
||
|
|
||
|
## Core API
|
||
|
[core-api]: #core-api
|
||
|
|
||
|
The core API of the `regex` crate is the `Regex` type:
|
||
|
|
||
|
```rust
|
||
|
pub struct Regex(_);
|
||
|
```
|
||
|
|
||
|
It has one primary constructor:
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Creates a new regular expression. If the pattern is invalid or otherwise
|
||
|
/// fails to compile, this returns an error.
|
||
|
pub fn new(pattern: &str) -> Result<Regex, Error>;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
And five core search methods. All searching completes in worst case linear time
|
||
|
with respect to the search text (the size of the regex is taken as a constant).
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Returns true if and only if the text matches this regex.
|
||
|
pub fn is_match(&self, text: &str) -> bool;
|
||
|
|
||
|
/// Returns the leftmost-first match of this regex in the text given. If no
|
||
|
/// match exists, then None is returned.
|
||
|
///
|
||
|
/// The leftmost-first match is defined as the first match that is found
|
||
|
/// by a backtracking search.
|
||
![]() |
pub fn find<'t>(&self, text: &'t str) -> Option<Match<'t>>;
|
||
![]() |
|
||
|
/// Returns an iterator of successive non-overlapping matches of this regex
|
||
|
/// in the text given.
|
||
![]() |
pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> Matches<'r, 't>;
|
||
![]() |
|
||
|
/// Returns the leftmost-first match of this regex in the text given with
|
||
|
/// locations for all capturing groups that participated in the match.
|
||
|
pub fn captures(&self, text: &str) -> Option<Captures>;
|
||
|
|
||
|
/// Returns an iterator of successive non-overlapping matches with capturing
|
||
|
/// group information in the text given.
|
||
![]() |
pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> CaptureMatches<'r, 't>;
|
||
![]() |
}
|
||
|
```
|
||
|
|
||
|
(N.B. The `captures` method can technically replace all uses of `find` and
|
||
|
`is_match`, but is potentially slower. Namely, the API reflects a performance
|
||
|
trade off: the more you ask for, the harder the regex engine has to work.)
|
||
|
|
||
|
There is one additional, but idiosyncratic, search method:
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Returns the end location of a match if one exists in text.
|
||
|
///
|
||
|
/// This may return a location preceding the end of a proper leftmost-first
|
||
|
/// match. In particular, it may return the location at which a match is
|
||
|
/// determined to exist. For example, matching `a+` against `aaaaa` will
|
||
|
/// return `1` while the end of the leftmost-first match is actually `5`.
|
||
|
///
|
||
|
/// This has the same performance characteristics as `is_match`.
|
||
|
pub fn shortest_match(&self, text: &str) -> Option<usize>;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
And two methods for splitting:
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Returns an iterator of substrings of `text` delimited by a match of
|
||
|
/// this regular expression. Each element yielded by the iterator corresponds
|
||
|
/// to text that *isn't* matched by this regex.
|
||
![]() |
pub fn split<'r, 't>(&'r self, text: &'t str) -> Split<'r, 't>;
|
||
![]() |
|
||
|
/// Returns an iterator of at most `limit` substrings of `text` delimited by
|
||
|
/// a match of this regular expression. Each element yielded by the iterator
|
||
|
/// corresponds to text that *isn't* matched by this regex. The remainder of
|
||
|
/// `text` that is not split will be the last element yielded by the
|
||
|
/// iterator.
|
||
![]() |
pub fn splitn<'r, 't>(&'r self, text: &'t str, limit: usize) -> SplitN<'r, 't>;
|
||
![]() |
}
|
||
|
```
|
||
|
|
||
|
And three methods for replacement. Replacement is discussed in more detail in a
|
||
|
subsequent section.
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Replaces matches of this regex in `text` with `rep`. If no matches were
|
||
|
/// found, then the given string is returned unchanged, otherwise a new
|
||
|
/// string is allocated.
|
||
|
///
|
||
|
/// `replace` replaces the first match only. `replace_all` replaces all
|
||
|
/// matches. `replacen` replaces at most `limit` matches.
|
||
|
fn replace<'t, R: Replacer>(&self, text: &'t str, rep: R) -> Cow<'t, str>;
|
||
|
fn replace_all<'t, R: Replacer>(&self, text: &'t str, rep: R) -> Cow<'t, str>;
|
||
|
fn replacen<'t, R: Replacer>(&self, text: &'t str, limit: usize, rep: R) -> Cow<'t, str>;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
And lastly, three simple accessors:
|
||
|
|
||
|
```rust
|
||
|
impl Regex {
|
||
|
/// Returns the original pattern string.
|
||
|
pub fn as_str(&self) -> &str;
|
||
|
|
||
|
/// Returns an iterator over all capturing group in the pattern in the order
|
||
|
/// they were defined (by position of the leftmost parenthesis). The name of
|
||
|
/// the group is yielded if it has a name, otherwise None is yielded.
|
||
![]() |
pub fn capture_names(&self) -> CaptureNames;
|
||
![]() |
|
||
|
/// Returns the total number of capturing groups in the pattern. This
|
||
|
/// includes the implicit capturing group corresponding to the entire
|
||
|
/// pattern.
|
||
|
pub fn captures_len(&self) -> usize;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Finally, `Regex` impls the `Send`, `Sync`, `Display`, `Debug`, `Clone` and
|
||
|
`FromStr` traits from the standard library.
|
||
|
|
||
|
## Error
|
||
|
|
||
|
The `Error` enum is an *extensible* enum, similar to `std::io::Error`,
|
||
|
corresponding to the different ways that regex compilation can fail. In
|
||
|
particular, this means that adding a new variant to this enum is not a breaking
|
||
|
change. (Removing or changing an existing variant is still a breaking change.)
|
||
|
|
||
|
```rust
|
||
|
pub enum Error {
|
||
|
/// A syntax error.
|
||
![]() |
Syntax(SyntaxError),
|
||
![]() |
/// The compiled program exceeded the set size limit.
|
||
|
/// The argument is the size limit imposed.
|
||
|
CompiledTooBig(usize),
|
||
|
/// Hints that destructuring should not be exhaustive.
|
||
|
///
|
||
|
/// This enum may grow additional variants, so this makes sure clients
|
||
|
/// don't count on exhaustive matching. (Otherwise, adding a new variant
|
||
|
/// could break existing code.)
|
||
|
#[doc(hidden)]
|
||
|
__Nonexhaustive,
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Note that the `Syntax` variant could contain the `Error` type from the
|
||
![]() |
`regex-syntax` crate, but this couples `regex-syntax` to the public API
|
||
|
of `regex`. We sidestep this hazard by defining a newtype in `regex` that
|
||
|
internally wraps `regex_syntax::Error`. This also enables us to selectively
|
||
|
expose more information in the future.
|
||
![]() |
|
||
|
## RegexBuilder
|
||
|
[regexbuilder]: #regexbuilder
|
||
|
|
||
|
In most cases, the construction of a regex is done with `Regex::new`. There are
|
||
|
however some options one might want to tweak. This can be done with a
|
||
|
`RegexBuilder`:
|
||
|
|
||
|
```rust
|
||
|
impl RegexBuilder {
|
||
|
/// Creates a new builder from the given pattern.
|
||
|
pub fn new(pattern: &str) -> RegexBuilder;
|
||
|
|
||
|
/// Compiles the pattern and all set options. If successful, a Regex is
|
||
|
/// returned. Otherwise, if compilation failed, an Error is returned.
|
||
|
///
|
||
|
/// N.B. `RegexBuilder::new("...").compile()` is equivalent to
|
||
|
/// `Regex::new("...")`.
|
||
![]() |
pub fn build(&self) -> Result<Regex, Error>;
|
||
![]() |
|
||
|
/// Set the case insensitive flag (i).
|
||
![]() |
pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the multi line flag (m).
|
||
![]() |
pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the dot-matches-any-character flag (s).
|
||
![]() |
pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the swap-greedy flag (U).
|
||
![]() |
pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the ignore whitespace flag (x).
|
||
![]() |
pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the Unicode flag (u).
|
||
![]() |
pub fn unicode(&mut self, yes: bool) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the approximate size limit (in bytes) of the compiled regular
|
||
|
/// expression.
|
||
|
///
|
||
|
/// If compiling a pattern would approximately exceed this size, then
|
||
|
/// compilation will fail.
|
||
![]() |
pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder;
|
||
![]() |
|
||
|
/// Set the approximate size limit (in bytes) of the cache used by the DFA.
|
||
|
///
|
||
|
/// This is a per thread limit. Once the DFA fills the cache, it will be
|
||
|
/// wiped and refilled again. If the cache is wiped too frequently, the
|
||
|
/// DFA will quit and fall back to another matching engine.
|
||
![]() |
pub fn dfa_size_limit(&mut self, limit: usize) -> &mut RegexBuilder;
|
||
![]() |
}
|
||
|
```
|
||
|
|
||
|
## Captures
|
||
|
|
||
|
A `Captures` value stores the locations of all matching capturing groups for
|
||
|
a single match. It provides convenient access to those locations indexed by
|
||
|
either number, or, if available, name.
|
||
|
|
||
|
The first capturing group (index `0`) is always unnamed and always corresponds
|
||
|
to the entire match. Other capturing groups correspond to groups in the
|
||
|
pattern. Capturing groups are indexed by the position of their leftmost
|
||
|
parenthesis in the pattern.
|
||
|
|
||
|
Note that `Captures` is a type constructor with a single parameter: the
|
||
|
lifetime of the text searched by the corresponding regex. In particular, the
|
||
|
lifetime of `Captures` is not tied to the lifetime of a `Regex`.
|
||
|
|
||
|
```rust
|
||
|
impl<'t> Captures<'t> {
|
||
![]() |
/// Returns the match associated with the capture group at index `i`. If
|
||
|
/// `i` does not correspond to a capture group, or if the capture group
|
||
|
/// did not participate in the match, then `None` is returned.
|
||
|
pub fn get(&self, i: usize) -> Option<Match<'t>>;
|
||
![]() |
|
||
![]() |
/// Returns the match for the capture group named `name`. If `name` isn't a
|
||
|
/// valid capture group or didn't match anything, then `None` is returned.
|
||
|
pub fn name(&self, name: &str) -> Option<Match<'t>>;
|
||
![]() |
|
||
|
/// Returns the number of captured groups. This is always at least 1, since
|
||
|
/// the first unnamed capturing group corresponding to the entire match
|
||
|
/// always exists.
|
||
|
pub fn len(&self) -> usize;
|
||
|
|
||
|
/// Expands all instances of $name in the text given to the value of the
|
||
|
/// corresponding named capture group. The expanded string is written to
|
||
|
/// dst.
|
||
|
///
|
||
|
/// The name in $name may be integer corresponding to the index of a capture
|
||
|
/// group or it can be the name of a capture group. If the name isn't a valid
|
||
|
/// capture group, then it is replaced with an empty string.
|
||
|
///
|
||
|
/// The longest possible name is used. e.g., $1a looks up the capture group
|
||
|
/// named 1a and not the capture group at index 1. To exert more precise
|
||
|
/// control over the name, use braces, e.g., ${1}a.
|
||
|
///
|
||
|
/// To write a literal $, use $$.
|
||
|
pub fn expand(&self, replacement: &str, dst: &mut String);
|
||
|
}
|
||
|
```
|
||
|
|
||
|
The `Captures` type impls `Debug`, `Index<usize>` (for numbered capture groups)
|
||
|
and `Index<str>` (for named capture groups). A downside of the `Index` impls is
|
||
|
that the return value is bounded to the lifetime of `Captures` instead of the
|
||
|
lifetime of the actual text searched because of how the `Index` trait is
|
||
|
defined. Callers can work around that limitation if necessary by using an
|
||
![]() |
explicit method such as `get` or `name`.
|
||
![]() |
|
||
|
## Replacer
|
||
|
[replacer]: #replacer
|
||
|
|
||
|
The `Replacer` trait is a helper trait to make the various `replace` methods on
|
||
|
`Regex` more ergonomic. In particular, it makes it possible to use either a
|
||
|
standard string as a replacement, or a closure with more explicit access to a
|
||
|
`Captures` value.
|
||
|
|
||
|
```rust
|
||
|
pub trait Replacer {
|
||
|
/// Appends text to dst to replace the current match.
|
||
|
///
|
||
|
/// The current match is represents by caps, which is guaranteed to have a
|
||
|
/// match at capture group 0.
|
||
|
///
|
||
|
/// For example, a no-op replacement would be
|
||
|
/// dst.extend(caps.at(0).unwrap()).
|
||
|
fn replace_append(&mut self, caps: &Captures, dst: &mut String);
|
||
|
|
||
|
/// Return a fixed unchanging replacement string.
|
||
|
///
|
||
|
/// When doing replacements, if access to Captures is not needed, then
|
||
|
/// it can be beneficial from a performance perspective to avoid finding
|
||
|
/// sub-captures. In general, this is called once for every call to replacen.
|
||
|
fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
|
||
|
None
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Along with this trait, there is also a helper type, `NoExpand` that implements
|
||
|
`Replacer` like so:
|
||
|
|
||
|
```rust
|
||
|
pub struct NoExpand<'t>(pub &'t str);
|
||
|
|
||
|
impl<'t> Replacer for NoExpand<'t> {
|
||
![]() |
fn replace_append(&mut self, _: &Captures, dst: &mut String) {
|
||
|
dst.push_str(self.0);
|
||
![]() |
}
|
||
|
|
||
![]() |
fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
|
||
|
Some(Cow::Borrowed(self.0))
|
||
![]() |
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
This permits callers to use `NoExpand` with the `replace` methods to guarantee
|
||
|
that the replacement string is never searched for `$group` replacement syntax.
|
||
|
|
||
|
We also provide two more implementations of the `Replacer` trait: `&str` and
|
||
|
`FnMut(&Captures) -> String`.
|
||
|
|
||
|
## quote
|
||
|
[quote]: #quote
|
||
|
|
||
|
There is one free function in `regex`:
|
||
|
|
||
|
```rust
|
||
|
/// Escapes all regular expression meta characters in `text`.
|
||
|
///
|
||
|
/// The string returned may be safely used as a literal in a regex.
|
||
|
pub fn quote(text: &str) -> String;
|
||
|
```
|
||
|
|
||
|
## RegexSet
|
||
|
[regexset]: #regexset
|
||
|
|
||
|
A `RegexSet` represents the union of zero or more regular expressions. It is a
|
||
|
specialized machine that can match multiple regular expressions simultaneously.
|
||
|
Conceptually, it is similar to joining multiple regexes as alternates, e.g.,
|
||
|
`re1|re2|...|reN`, with one crucial difference: in a `RegexSet`, multiple
|
||
|
expressions can match. This means that each pattern can be reasoned about
|
||
|
independently. A `RegexSet` is ideal for building simpler lexers or an HTTP
|
||
|
router.
|
||
|
|
||
|
Because of their specialized nature, they can only report which regexes match.
|
||
|
They do not report match locations. In theory, this could be added in the
|
||
|
future, but is difficult.
|
||
|
|
||
|
```rust
|
||
|
pub struct RegexSet(_);
|
||
|
|
||
|
impl RegexSet {
|
||
|
/// Constructs a new RegexSet from the given sequence of patterns.
|
||
|
///
|
||
|
/// The order of the patterns given is used to assign increasing integer
|
||
|
/// ids starting from 0. Namely, matches are reported in terms of these ids.
|
||
|
pub fn new<I, S>(patterns: I) -> Result<RegexSet, Error>
|
||
|
where S: AsRef<str>, I: IntoIterator<Item=S>;
|
||
|
|
||
|
/// Returns the total number of regexes in this set.
|
||
|
pub fn len(&self) -> usize;
|
||
|
|
||
|
/// Returns true if and only if one or more regexes in this set match
|
||
|
/// somewhere in the given text.
|
||
|
pub fn is_match(&self, text: &str) -> bool;
|
||
|
|
||
|
/// Returns the set of regular expressions that match somewhere in the given
|
||
|
/// text.
|
||
|
pub fn matches(&self, text: &str) -> SetMatches;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
`RegexSet` impls the `Debug` and `Clone` traits.
|
||
|
|
||
|
The `SetMatches` type is queryable and implements `IntoIterator`.
|
||
|
|
||
|
```rust
|
||
|
pub struct SetMatches(_);
|
||
|
|
||
|
impl SetMatches {
|
||
|
/// Returns true if this set contains 1 or more matches.
|
||
|
pub fn matched_any(&self) -> bool;
|
||
|
|
||
|
/// Returns true if and only if the regex identified by the given id is in
|
||
|
/// this set of matches.
|
||
|
///
|
||
|
/// This panics if the id given is >= the number of regexes in the set that
|
||
|
/// these matches came from.
|
||
|
pub fn matched(&self, id: usize) -> bool;
|
||
|
|
||
|
/// Returns the total number of regexes in the set that created these
|
||
|
/// matches.
|
||
|
pub fn len(&self) -> usize;
|
||
|
|
||
|
/// Returns an iterator over the ids in the set that correspond to a match.
|
||
|
pub fn iter(&self) -> SetMatchesIter;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
`SetMatches` impls the `Debug` and `Clone` traits.
|
||
|
|
||
|
Note that a builder is not proposed for `RegexSet` in this RFC; however, it is
|
||
|
likely one will be added at some point in a backwards compatible way.
|
||
|
|
||
|
## The `bytes` submodule
|
||
|
[the-bytes-submodule]: #the-bytes-submodule
|
||
|
|
||
|
All of the above APIs have thus far been explicitly for searching `text` where
|
||
|
`text` has type `&str`. While this author believes that suits most use cases,
|
||
|
it should also be possible to search a regex on *arbitrary* bytes, i.e.,
|
||
|
`&[u8]`. One particular use case is quickly searching a file via a memory map.
|
||
|
If regexes could only search `&str`, then one would have to verify it was UTF-8
|
||
|
first, which could be costly. Moreover, if the file isn't valid UTF-8, then you
|
||
|
either can't search it, or you have to allocate a new string and lossily copy
|
||
|
the contents. Neither case is particularly ideal. It would instead be nice to
|
||
|
just search the `&[u8]` directly.
|
||
|
|
||
|
This RFC including a `bytes` submodule in the crate. The API of this submodule
|
||
|
is a clone of the API described so far, except with `&str` replaced by `&[u8]`
|
||
|
for the search text (patterns are still `&str`). The clone includes `Regex`
|
||
|
itself, along with all supporting types and traits such as `Captures`,
|
||
|
`Replacer`, `FindIter`, `RegexSet`, `RegexBuilder` and so on. (This RFC
|
||
|
describes some alternative designs in a subsequent section.)
|
||
|
|
||
|
Since the API is a clone of what has been seen so far, it is not written out
|
||
|
again. Instead, we'll discuss the key differences.
|
||
|
|
||
|
Again, the first difference is that a `bytes::Regex` can search `&[u8]`
|
||
|
while a `Regex` can search `&str`.
|
||
|
|
||
|
The second difference is that a `bytes::Regex` can completely disable Unicode
|
||
|
support and explicitly match arbitrary bytes. The details:
|
||
|
|
||
|
1. The `u` flag can be disabled even when disabling it might cause the regex to
|
||
|
match invalid UTF-8. When the `u` flag is disabled, the regex is said to be in
|
||
|
"ASCII compatible" mode.
|
||
|
2. In ASCII compatible mode, neither Unicode codepoints nor Unicode character
|
||
|
classes are allowed.
|
||
|
3. In ASCII compatible mode, Perl character classes (`\w`, `\d` and `\s`)
|
||
|
revert to their typical ASCII definition. `\w` maps to `[[:word:]]`, `\d` maps
|
||
|
to `[[:digit:]]` and `\s` maps to `[[:space:]]`.
|
||
|
4. In ASCII compatible mode, word boundaries use the ASCII compatible `\w` to
|
||
|
determine whether a byte is a word byte or not.
|
||
|
5. Hexadecimal notation can be used to specify arbitrary bytes instead of
|
||
|
Unicode codepoints. For example, in ASCII compatible mode, `\xFF` matches the
|
||
|
literal byte `\xFF`, while in Unicode mode, `\xFF` is a Unicode codepoint that
|
||
|
matches its UTF-8 encoding of `\xC3\xBF`. Similarly for octal notation.
|
||
|
6. `.` matches any byte except for `\n` instead of any Unicode codepoint. When
|
||
|
the `s` flag is enabled, `.` matches any byte.
|
||
|
|
||
|
An interesting property of the above is that while the Unicode flag is enabled,
|
||
|
a `bytes::Regex` is *guaranteed* to match only valid UTF-8 in a `&[u8]`. Like
|
||
|
`Regex`, the Unicode flag is enabled by default.
|
||
|
|
||
|
N.B. The Unicode flag can also be selectively disabled in a `Regex`, but not in
|
||
|
a way that permits matching invalid UTF-8.
|
||
|
|
||
|
# Drawbacks
|
||
|
[drawbacks]: #drawbacks
|
||
|
|
||
|
## Guaranteed linear time matching
|
||
|
[guaranteed-linear-time-matching]: #guaranteed-linear-time-matching
|
||
|
|
||
|
A significant contract in the API of the `regex` crate is that all searching
|
||
|
has worst case `O(n)` complexity, where `n ~ length(text)`. (The size of the
|
||
|
regular expression is taken as a constant.) This contract imposes significant
|
||
|
restrictions on both the implementation and the set of features exposed in the
|
||
|
pattern language. A full analysis is beyond the scope of this RFC, but here are
|
||
|
the highlights:
|
||
|
|
||
|
1. Unbounded backtracking can't be used to implement matching. Backtracking can
|
||
|
be quite fast in practice (indeed, the current implementation uses bounded
|
||
|
backtracking in some cases), but has worst case exponential time.
|
||
|
2. Permitting backreferences in the pattern language can cause matching to
|
||
|
become NP-complete, which (probably) can't be solved in linear time.
|
||
|
3. Arbitrary look around is probably difficult to fit into a linear time
|
||
|
guarantee *in practice*.
|
||
|
|
||
|
The benefit to the linear time guarantee is just that: no matter what, all
|
||
|
searching completes in linear time with respect to the search text. This is a
|
||
|
valuable guarantee to make, because it means that one can execute arbitrary
|
||
|
regular expressions over arbitrary input and be absolutely sure that it will
|
||
|
finish in some "reasonable" time.
|
||
|
|
||
|
Of course, in practice, constants that are omitted from complexity analysis
|
||
|
*actually matter*. For this reason, the `regex` crate takes a number of steps
|
||
|
to keep constants low. For example, by placing a limit on the size of the
|
||
|
regular expression or choosing an appropriate matching engine when another
|
||
|
might result in higher constant factors.
|
||
|
|
||
|
This particular drawback segregates Rust's regular expression library from most
|
||
|
other regular expression libraries that programmers may be familiar with.
|
||
|
Languages such as Java, Python, Perl, Ruby, PHP and C++ support more flavorful
|
||
|
regexes by default. Go is the only language this author knows of whose standard
|
||
|
regex implementation guarantees linear time matching. Of course, RE2
|
||
|
is also worth mentioning, which is a C++ regex library that guarantees linear
|
||
|
time matching. There are other implementations of regexes that guarantee linear
|
||
|
time matching (TRE, for example), but none of them are particularly popular.
|
||
|
|
||
|
It is also worth noting that since Rust's FFI is zero cost, one can bind to
|
||
|
existing regex implementations that provide more features (bindings for both
|
||
|
PCRE1 and Oniguruma exist today).
|
||
|
|
||
|
## Allocation
|
||
|
[allocation]: #allocation
|
||
|
|
||
|
The `regex` API assumes that the implementation can dynamically allocate
|
||
|
memory. Indeed, the current implementation takes advantage of this. A `regex`
|
||
|
library that has no requirement on dynamic memory allocation would look
|
||
|
significantly different than the one that exists today. Dynamic memory
|
||
|
allocation is utilized pervasively in the parser, compiler and even during
|
||
|
search.
|
||
|
|
||
|
The benefit of permitting dynamic memory allocation is that it makes the
|
||
|
implementation *and* API simpler. This does make use of the `regex` crate in
|
||
|
environments that don't have dynamic memory allocation impossible.
|
||
|
|
||
|
This author isn't aware of any `regex` library that can work without dynamic
|
||
|
memory allocation.
|
||
|
|
||
|
With that said, `regex` may want to grow custom allocator support when the
|
||
|
corresponding traits stabilize.
|
||
|
|
||
|
## Synchronization is implicit
|
||
|
[synchronization-is-implicit]: #synchronization-is-implicit
|
||
|
|
||
|
Every `Regex` value can be safely used from multiple threads simultaneously.
|
||
|
Since a `Regex` has interior mutable state, this implies that it must do some
|
||
|
kind of synchronization in order to be safe.
|
||
|
|
||
|
There are some reasons why we might want to do synchronization
|
||
|
automatically:
|
||
|
|
||
|
1. `Regex` exposes an *immutable API*. That is, from looking at its set of
|
||
|
methods, none of them borrow the `Regex` mutably (or otherwise claim to
|
||
|
mutate the `Regex`). This author claims that since there is no *observable
|
||
|
mutation* of a `Regex`, it *not* being thread safe would violate the
|
||
|
principle of least surprise.
|
||
|
2. Often, a `Regex` should be compiled once and reused repeatedly in multiple
|
||
|
searches. To facilitate this, `lazy_static!` can be used to guarantee that
|
||
|
compilation happens exactly once. `lazy_static!` requires its types to be
|
||
|
`Sync`. A user of `Regex` could work around this by wrapping a `Regex` in a
|
||
|
`Mutex`, but this would make misuse too easy. For example, locking a `Regex`
|
||
|
in one thread would prevent simultaneous searching in another thread.
|
||
|
|
||
|
Synchronization has overhead, although it is extremely small (and dwarfed
|
||
|
by general matching overhead). The author has *ad hoc* benchmarked the
|
||
|
`regex` implementation with GNU Grep, and per match overhead is comparable in
|
||
|
single threaded use. It is this author's opinion, that it is good enough. If
|
||
|
synchronization overhead across multiple threads is too much, callers may elect
|
||
|
to clone the `Regex` so that each thread gets its own copy. Cloning a `Regex`
|
||
|
is no more expensive than what would be done internally automatically, but it
|
||
|
does eliminate contention.
|
||
|
|
||
|
An alternative is to increase the API surface and have types that are
|
||
|
synchronized by default and types that aren't synchronized. This was discussed
|
||
|
at length in
|
||
|
[this
|
||
|
thread](https://users.rust-lang.org/t/help-me-reduce-overhead-of-regex-matching/5220/1).
|
||
|
My conclusion from this thread is that we either expand the surface of the API,
|
||
|
or we break the current API or we keep implicit synchronization as-is. In this
|
||
|
author's opinion, neither expanding the API or breaking the API is worth
|
||
|
avoiding negligible synchronization overhead.
|
||
|
|
||
|
## The implementation is complex
|
||
|
[the-implementation-is-complex]: #the-implementation-is-complex
|
||
|
|
||
|
Regular expression engines have a lot of moving parts and it often requires
|
||
|
quite a bit of context on how the whole library is organized in order to make
|
||
|
significant contributions. Therefore, moving `regex` into `rust-lang` is a
|
||
|
*maintenance hazard*. This author has tried to mitigate this hazard somewhat by
|
||
|
doing the following:
|
||
|
|
||
|
1. Offering to mentor contributions. Significant contributions have thus far
|
||
|
fizzled, but minor contributions---even to complex code like the DFA---have
|
||
|
been successful.
|
||
|
2. Documenting not just the API, but the *internals*. The DFA is, for example,
|
||
|
heavily documented.
|
||
|
3. Wrote a `HACKING.md` guide that gives a sweeping overview of the design.
|
||
|
4. Significant test and benchmark suites.
|
||
|
|
||
|
With that said, there is still a lot more that could be done to mitigate the
|
||
|
maintenance hazard. In this author's opinion, the interaction between the three
|
||
|
parts of the implementation (parsing, compilation, searching) is not documented
|
||
|
clearly enough.
|
||
|
|
||
|
# Alternatives
|
||
|
[alternatives]: #alternatives
|
||
|
|
||
|
## Big picture
|
||
|
[big-picture]: #big-picture
|
||
|
|
||
|
The most important alternative is to decide *not* to bless a particular
|
||
|
implementation of regular expressions. We might want to go this route for any
|
||
|
number of reasons (see: Drawbacks). However, the `regex` crate is already
|
||
|
widely used, which provides at least some evidence that some set of programmers
|
||
|
find it good enough for general purpose regex searching.
|
||
|
|
||
|
The impact of not moving `regex` into `rust-lang` is, plainly, that Rust won't
|
||
|
have an "officially blessed" regex implementation. Many programmers may
|
||
|
appreciate the complexity of a regex implementation, and therefore might insist
|
||
|
that one be officially maintained. However, to be honest, it isn't quite clear
|
||
|
what would happen in practice. This author is speculating.
|
||
|
|
||
|
## `bytes::Regex`
|
||
|
[bytesregex]: #bytesregex
|
||
|
|
||
|
This RFC proposes stabilizing the `bytes` sub-module of the `regex` crate in
|
||
|
its entirety. The `bytes` sub-module is a near clone of the API at the crate
|
||
|
level with one important difference: it searches `&[u8]` instead of `&str`.
|
||
|
This design was motivated by a similar split in `std`, but there are
|
||
|
alternatives.
|
||
|
|
||
|
### A regex trait
|
||
|
[a-regex-trait]: #a-regex-trait
|
||
|
|
||
|
One alternative is designing a trait that looks something like this:
|
||
|
|
||
|
```rust
|
||
|
trait Regex {
|
||
|
type Text: ?Sized;
|
||
|
|
||
|
fn is_match(&self, text: &Self::Text) -> bool;
|
||
![]() |
fn find(&self, text: &Self::Text) -> Option<Match>;
|
||
|
fn find_iter<'r, 't>(&'r self, text: &'t Self::Text) -> Matches<'r, 't, Self::Text>;
|
||
![]() |
// and so on
|
||
|
}
|
||
|
```
|
||
|
|
||
|
However, there are a couple problems with this approach. First and foremost,
|
||
|
the use cases of such a trait aren't exactly clear. It does make writing
|
||
|
generic code that searches either a `&str` or a `&[u8]` possible, but the
|
||
|
semantics of searching `&str` (always valid UTF-8) or `&[u8]` are quite a bit
|
||
|
different with respect to the original `Regex`. Secondly, the trait isn't
|
||
|
obviously implementable by others. For example, some of the methods return
|
||
![]() |
iterator types such as `Matches` that are typically implemented with a
|
||
![]() |
lower level API that isn't exposed. This suggests that a straight-forward
|
||
|
traitification of the current API probably isn't appropriate, and perhaps,
|
||
|
a better trait needs to be more fundamental to regex searching.
|
||
|
|
||
|
Perhaps the strongest reason to not adopt this design for regex `1.0` is that
|
||
|
we don't have any experience with it and there hasn't been any demand for it.
|
||
|
In particular, it could be prototyped in another crate.
|
||
|
|
||
|
### Reuse some types
|
||
|
[reuse-some-types]: #reuse-some-types
|
||
|
|
||
|
In the current proposal, the `bytes` submodule completely duplicates the
|
||
|
top-level API, including all iterator types, `Captures` and even the `Replacer`
|
||
|
trait. We could parameterize many of those types over the type of the text
|
||
|
searched. For example, the proposed `Replacer` trait looks like this:
|
||
|
|
||
|
```rust
|
||
|
trait Replacer {
|
||
|
fn replace_append(&mut self, caps: &Captures, dst: &mut String);
|
||
|
|
||
|
fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
|
||
|
None
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
We might add an associated type like so:
|
||
|
|
||
|
```rust
|
||
|
trait Replacer {
|
||
|
type Text: ToOwned + ?Sized;
|
||
|
|
||
|
fn replace_append(
|
||
|
&mut self,
|
||
|
caps: &Captures<Self::Text>,
|
||
|
dst: &mut <Self::Text as ToOwned>::Owned,
|
||
|
);
|
||
|
|
||
|
fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, Self::Text>> {
|
||
|
None
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
But parameterizing the `Captures` type is a little bit tricky. Namely, methods
|
||
![]() |
like `get` want to slice the text at match offsets, but this can't be done
|
||
![]() |
safely in generic code without introducing another public trait.
|
||
|
|
||
|
The final death knell in this idea is that these two implementations cannot
|
||
|
co-exist:
|
||
|
|
||
|
```rust
|
||
|
impl<F> Replacer for F where F: FnMut(&Captures) -> String {
|
||
|
type Text = str;
|
||
|
|
||
|
fn replace_append(&mut self, caps: &Captures, dst: &mut String) {
|
||
|
dst.push_str(&(*self)(caps));
|
||
|
}
|
||
|
}
|
||
|
|
||
|
impl<F> Replacer for F where F: FnMut(&Captures) -> Vec<u8> {
|
||
|
type Text = [u8];
|
||
|
|
||
|
fn replace_append(&mut self, caps: &Captures, dst: &mut Vec<u8>) {
|
||
|
dst.extend(&(*self)(caps));
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Perhaps there is a path through this using yet more types or more traits, but
|
||
|
without a really strong motivating reason to find it, I'm not convinced it's
|
||
|
worth it. Duplicating all of the types is unfortunate, but it's *simple*.
|
||
|
|
||
|
|
||
|
# Unresolved questions
|
||
|
[unresolved]: #unresolved-questions
|
||
|
|
||
|
The `regex` repository has more than just the `regex` crate.
|
||
|
|
||
|
## `regex-syntax`
|
||
|
[regex-syntax]: #regex-syntax
|
||
|
|
||
|
This crate exposes a regular expression parser and abstract syntax that is
|
||
|
completely divorced from compilation or searching. It is not part of `regex`
|
||
|
proper since it may experience more frequent breaking changes and is far less
|
||
|
frequently used. It is not clear whether this crate will ever see `1.0`, and if
|
||
|
it does, what criteria would be used to judge it suitable for `1.0`.
|
||
|
Nevertheless, it is a useful public API, but it is not part of this RFC.
|
||
|
|
||
|
## `regex-capi`
|
||
|
[regex-capi]: #regex-capi
|
||
|
|
||
|
Recently, `regex-capi` was built to provide a C API to this regex library. It
|
||
|
has been used to build [cgo bindings to this library for
|
||
|
Go](https://github.com/BurntSushi/rure-go). Given its young age, it is not part
|
||
|
of this proposal but will be maintained as a pre-1.0 crate in the same
|
||
|
repository.
|
||
|
|
||
|
## `regex_macros`
|
||
|
[regex_macros]: #regex_macros
|
||
|
|
||
|
The `regex!` compiler plugin is a macro that can compile regular expressions
|
||
|
when your Rust program compiles. Stated differently, `regex!("...")` is
|
||
|
transformed into Rust code that executes a search of the given pattern
|
||
|
directly. It was written two years ago and largely hasn't changed since. When
|
||
|
it was first written, it had two major benefits:
|
||
|
|
||
|
1. If there was a syntax error in your regex, your Rust program would not
|
||
|
compile.
|
||
|
2. It was faster.
|
||
|
|
||
|
Today, (1) can be simulated in practice with the use of a Clippy lint and (2)
|
||
![]() |
is no longer true. In fact, `regex!` is at least one order of magnitude slower
|
||
![]() |
than the standard `Regex` implementation.
|
||
|
|
||
|
The future of `regex_macros` is not clear. In one sense, since it is a
|
||
|
compiler plugin, there hasn't been much interest in developing it further since
|
||
|
its audience is necessarily limited. In another sense, it's not entirely clear
|
||
|
what its implementation path is. It would take considerable work for it to beat
|
||
|
the current `Regex` implementation (if it's even possible). More discussion on
|
||
|
this is out of scope.
|
||
|
|
||
|
## Dependencies
|
||
|
[dependencies]: #dependencies
|
||
|
|
||
|
As of now, `regex` has several dependencies:
|
||
|
|
||
|
* `aho-corasick`
|
||
|
* `memchr`
|
||
|
* `thread_local`
|
||
|
* `regex-syntax`
|
||
|
* `utf8-ranges`
|
||
|
|
||
|
All of them except for `thread_local` were written by this author, and were
|
||
|
primarily motivated for use in the `regex` crate. They were split out because
|
||
|
they seem generally useful.
|
||
|
|
||
|
There may be other things in `regex` (today or in the future) that may also be
|
||
|
helpful to others outside the strict context of `regex`. Is it beneficial to
|
||
|
split such things out and create a longer list of dependencies? Or should we
|
||
|
keep `regex` as tight as possible?
|
||
|
|
||
|
## Exposing more internals
|
||
|
[exposing-more-internals]: #exposing-more-internals
|
||
|
|
||
|
It is conceivable that others might find interest in the regex compiler or more
|
||
|
lower level access to the matching engines. We could do something similar to
|
||
|
`regex-syntax` and expose some internals in a separate crate. However, there
|
||
|
isn't a pressing desire to do this at the moment, and would probably require a
|
||
|
good deal of work.
|
||
|
|
||
|
# Breaking changes
|
||
|
[breaking-changes]: #breaking-changes
|
||
|
|
||
|
This section of the RFC lists all breaking changes between `regex 0.1` and the
|
||
|
API proposed in this RFC.
|
||
|
|
||
![]() |
* `find` and `find_iter` now return values of type `Match` instead of
|
||
|
`(usize, usize)`. The `Match` type has `start` and `end` methods which can
|
||
|
be used to recover the original offsets, as well as an `as_str` method to
|
||
|
get the matched text.
|
||
|
* The `Captures` type no longer has any iterators defined. Instead, callers
|
||
|
should use the `Regex::capture_names` method.
|
||
![]() |
* `bytes::Regex` enables the Unicode flag by default. Previously, it disabled
|
||
|
it by default. The flag can be disabled in the pattern with `(?-u)`.
|
||
|
* The definition of the `Replacer` trait was completely re-worked. Namely, its
|
||
|
API inverts control of allocation so that the caller must provide a `String`
|
||
|
to write to. Previous implementors will need to examine the new API. Moving
|
||
|
to the new API should be straight-forward.
|
||
|
* The `is_empty` method on `Captures` was removed since it always returns
|
||
|
`false` (because every `Captures` has at least one capture group
|
||
|
corresponding to the entire match).
|
||
|
* The `PartialEq` and `Eq` impls on `Regex` were removed. If you need this
|
||
|
functionality, add a newtype around `Regex` and write the corresponding
|
||
|
`PartialEq` and `Eq` impls.
|
||
|
* The lifetime parameters for the `iter` and `iter_named` methods on
|
||
|
`Captures` were fixed. The corresponding iterator types, `SubCaptures` and
|
||
|
`SubCapturesNamed`, grew an additional lifetime parameter.
|
||
|
* The constructor, `Regex::with_size_limit`, was removed. It can be replaced
|
||
|
with use of `RegexBuilder`.
|
||
|
* The `is_match` free function was removed. Instead, compile a `Regex`
|
||
|
explicitly and call the `is_match` method.
|
||
|
* Many iterator types were renamed. (e.g., `RegexSplits` to `SplitsIter`.)
|
||
|
* Replacements now return a `Cow<str>` instead of a `String`. Namely, the
|
||
|
subject text doesn't need to be copied if there are no replacements. Callers
|
||
|
may need to add `into_owned()` calls to convert the `Cow<str>` to a proper
|
||
|
`String`.
|
||
|
* The `Error` type no longer has the `InvalidSet` variant, since the error is
|
||
|
no longer possible. Its `Syntax` variant was also modified to wrap a `String`
|
||
|
instead of a `regex_syntax::Error`. If you need access to specific parse
|
||
|
error information, use the `regex-syntax` crate directly.
|
||
|
* To allow future growth, some character classes may no longer compile to make
|
||
|
room for possibly adding class set notation in the future.
|
||
![]() |
* Various iterator types have been renamed.
|
||
|
* The `RegexBuilder` type now takes an `&mut self` on most methods instead of
|
||
|
`self`. Additionally, the final build step now uses `build()` instead of
|
||
|
`compile()`.
|