rfcs/text/3513-gen-blocks.md

28 KiB

Summary

This RFC reserves the gen keyword in the Rust 2024 edition for generators and adds gen { .. } blocks to the language. Similar to how async blocks produce values that can be awaited with .await, gen blocks produce values that can be iterated over with for.

Motivation

Writing iterators manually can be painful. Many iterators can be written by chaining together iterator combinators, but some need to be written with a manual implementation of Iterator. This can push people to avoid iterators and do worse things such as writing loops that eagerly store values to mutable state. With gen blocks, we can now write a simple for loop and still get a lazy iterator of values.

By way of example, consider these alternate ways of expressing run-length encoding:

// This example uses `gen` blocks, introduced in this RFC.
fn rl_encode<I: IntoIterator<Item = u8>>(
    xs: I,
) -> impl Iterator<Item = u8> {
    gen {
        let mut xs = xs.into_iter();
        let (Some(mut cur), mut n) = (xs.next(), 0) else { return };
        for x in xs {
            if x == cur && n < u8::MAX {
                n += 1;
            } else {
                yield n; yield cur;
                (cur, n) = (x, 0);
            }
        }
        yield n; yield cur;
    }.into_iter()
}

// This example uses a manual implementation of `Iterator`.
fn rl_encode<I: IntoIterator<Item = u8>>(
    xs: I,
) -> impl Iterator<Item = u8> {
    struct RlEncode<I: IntoIterator<Item = u8>> {
        into_xs: Option<I>,
        xs: Option<<I as IntoIterator>::IntoIter>,
        cur: Option<<I as IntoIterator>::Item>,
        n: u8,
        yield_x: Option<<I as IntoIterator>::Item>,
    }
    impl<I: IntoIterator<Item = u8>> Iterator for RlEncode<I> {
        type Item = u8;
        fn next(&mut self) -> Option<Self::Item> {
            let xs = self.xs.get_or_insert_with(|| unsafe {
                self.into_xs.take().unwrap_unchecked().into_iter()
            });
            if let Some(x) = self.yield_x.take() {
                return Some(x);
            }
            loop {
                match (xs.next(), self.cur) {
                    (Some(x), Some(cx))
                        if x == cx && self.n < u8::MAX => self.n += 1,
                    (Some(x), Some(cx)) => {
                        let n_ = self.n;
                        (self.cur, self.n) = (Some(x), 0);
                        self.yield_x = Some(cx);
                        return Some(n_);
                    }
                    (Some(x), None) => {
                        (self.cur, self.n) = (Some(x), 0);
                    }
                    (None, Some(cx)) => {
                        self.cur = None;
                        self.yield_x = Some(cx);
                        return Some(self.n);
                    }
                    (None, None) => return None,
                }
            }
        }
    }
    RlEncode {
        into_xs: Some(xs), xs: None, cur: None, n: 0, yield_x: None,
    }
}

// This example uses `iter::from_fn`.
fn rl_encode<I: IntoIterator<Item = u8>>(
    xs: I,
) -> impl Iterator<Item = u8> {
    let (mut cur, mut n, mut yield_x) = (None, 0, None);
    let (mut into_xs, mut xs) = (Some(xs), None);
    core::iter::from_fn(move || loop {
        let xs = xs.get_or_insert_with(|| unsafe {
            into_xs.take().unwrap_unchecked().into_iter()
        });
        if let Some(x) = yield_x.take() {
            return Some(x);
        }
        match (xs.next(), cur) {
            (Some(x), Some(cx)) if x == cx && n < u8::MAX => n += 1,
            (Some(x), Some(cx)) => {
                let n_ = n;
                (cur, n) = (Some(x), 0);
                yield_x = Some(cx);
                return Some(n_);
            }
            (Some(x), None) => (cur, n) = (Some(x), 0),
            (None, Some(cx)) => {
                cur = None;
                yield_x = Some(cx);
                return Some(n);
            }
            (None, None) => return None,
        }
    })
}

Iterators created with gen blocks return None from next once the gen block has returned (either implicitly at the end of the scope or explicitly with the return statement) and are fused (after next returns None once, it will keep returning None forever).

Guide-level explanation

New keyword

Starting in the 2024 edition, gen is a keyword that cannot be used for naming any items or bindings. This means during the migration to the 2024 edition, all variables, functions, modules, types, etc. named gen must be renamed or be referred to via r#gen.

Return value

gen blocks must diverge or return the unit type. Specifically, the trailing expression must be of the unit or ! type, and any return statements in the block must either be given no argument at all or given an argument of the unit or ! type.

Diverging

For example, a gen block that produces the infinite sequence 0, 1, 0, 1, 0, 1, .. will never return None from next and will only drop its captured state when the iterator is dropped. E.g.:

gen {
    loop {
        yield 0;
        yield 1;
    }
}

If a gen block panics, the behavior is similar to that of return, except that the call to next unwinds instead of returning None.

Error handling

Within gen blocks, the ? operator behaves as follows. When its argument is a value indicating "do not short circuit" (e.g. Option::Some(..), Result::Ok(..), ControlFlow::Continue(..)), that value becomes the result of the expression as usual. When its argument is a value indicating that short-circuiting is desired (e.g. Option::None, Result::Err(..), ControlFlow::Break(..)), the value is first yielded (after being converted by FromResidual::from_residual as usual), then the block returns immediately.

Even when ? is used within a gen block, the block must return a value of type unit or !. That is, it does not return a value of Some(..), Ok(..), or Continue(..) as other such blocks might.

However, note that when ? is used within a gen block, all yield statements will need to be given an argument of a compatible type. For example, if None? is used in an expression, then all yield statements will need to be given arguments of some Option type (or of the ! type) .

Fusing

Iterators produced by gen return None from next repeatedly after having once returned None from next. However, they do not implement FusedIterator, as that is not a language item, but may do so in the future (see the future possibilities).

Holding borrows across yields

Since the Iterator::next method takes &mut self instead of Pin<&mut Self>, we cannot create self-referential gen blocks without taking other steps (see the open questions). Self-referential gen blocks occur when holding a borrow to a local variable across a yield point. E.g.:

gen {
    let xs = vec![1, 2, 3, 4];
    for x in xs.iter() {
        yield x * 2;
    }
    //~^ ERROR borrow may still be in use when `gen` block yields
}

This may in fact be a severe and surprising limitation, and whether we should take the steps necessary to address this before stabilization is left as an open question.

Reference-level explanation

New keyword

In the 2024 edition we reserve gen as a keyword. Rust 2021 will use k#gen to access the same feature. What to do about earlier editions is left as an open question.

Error handling

foo? in gen blocks will stop iteration after the first error as if it desugared to:

match foo.branch() {
    ControlFlow::Break(err) => {
        yield <_ as FromResidual>::from_residual(err);
        return
    },
    ControlFlow::Continue(val) => val,
}

Implementation

This feature is mostly implemented via existing coroutines, though there are some special cases.

We could say that gen blocks are the same as unstable coroutines...

  • ...without arguments,
  • ...with an additional check forbidding holding borrows across yield points,
  • ...with an automatic implementation of a trait allowing the type to be used in for loops (see the open questions),
  • ...that do not panic if invoked again after returning.

Drawbacks

The main drawback is that this adds a language feature for something that can already be written entirely (if more painfully) in user code.

In contrast to full coroutines (currently unstable), gen blocks cannot hold references across yield points (see the open questions, and see from_coroutine which has an Unpin bound on the generator it takes to produce an Iterator).

Reserving the gen keyword will require some adaptation from the ecosystem mostly due to the rand crate which has gen methods on its traits.

Rationale and alternatives

Keyword

We could use iter as the keyword.

Due to unstable coroutines having originally been named "generators" within rustc and nightly Rust, some of the authors connect "generators" with this more powerful control flow construct that can do everything that gen blocks and async blocks can do and more.

There is some appeal in syntactically connecting the Iterator trait with iter blocks, but that would require us to carve out many exceptions for this keyword as iter is widely used for module names and method names, not just in the ecosystem, but also in libstd and libcore. To what degree this might be worse than the situation for the gen keyword we leave as an open question.

Not using the gen keyword now would leave open the possibility of using the gen keyword in the future for a kind of block that might produce types that implement a more powerful Generator trait (perhaps one that takes self by pinned reference) or that implement Coroutine.

Do not do this

Add more combinators

One alternative is to instead add more helper methods to Iterator.

However, it is already difficult for new users of Rust to become familiar with all of the many existing methods on Iterator. Further, some of the new methods we might want would need to be quite generic (similar to array::try_map).

Use crates

We could suggest that people use crates like genawaiter, propane, or iterator_item instead. genawaiter works on stable Rust and provides gen! macro blocks that behave like gen blocks, but it doesn't have compiler support for nice diagnostics or language support for the ? operator. The propane and iterator_item crates use the Coroutine trait from nightly and work mostly like gen would (but therefore require unstable Rust).

Use iter::from_fn

The standard library includes std::iter::from_fn which can be used in some cases, but as we saw in the motivating example, often the improvement over writing out a manual implementation of Iterator is limited.

Have trailing expressions yield one last element

Trailing expressions could have a meaningful value that is yielded before terminating iteration.

However, if we were to do this, we would need to add some other way to immediately terminate iteration without yielding a value. We could do something magical where returning () terminates the iteration, so that this code...

fn foo() -> impl Iterator<Item = i32> {
    gen { 42 }
}

...could be a way to specify std::iter::once(42). However, then logically this code...

fn foo() -> impl Iterator<Item = i32> {
    gen { 42; } // Note the semicolon.
}

...would then not return a value due to the semicolon.

Further, this would make it unclear what the behavior of this...

fn foo() -> impl Iterator<Item = ()> { gen {} }

...should be, as it could reasonably be either std::iter::once(()) or std::iter::empty::<()>().

Note that, under this RFC, because return within gen blocks accepts an argument of type () and yield within gen blocks returns the () type, it is possible to yield one last element concisely with return yield EXPR.

Prior art

CLU, Alphard

The idea of generators that yield their values goes back at least as far as the Alphard language from circa 1975 (see "Alphard: Form and Content", Mary Shaw, 1981). This was later refined into the idea of iterators in the CLU language (see "A History of CLU", Barbara Liskov, 1992 and "CLU Reference Manual", Liskov et al., 1979).

The CLU language opened an iterator context with the iter keyword and produced values with yield statements. E.g.:

odds = iter () yields (int)
  x: int := 1
  while x <= 20 do
    yield x
    x := x + 2
  end
end odds

Icon

In Icon (introduced circa 1977), generators are woven deeply into the language, and any function can return a sequence of values. When done explicitly, the suspend keyword is used. E.g.:

procedure range(i, j)
  while i < j do {
    suspend i
    i +:= 1
  }
  fail
end

Python

In Python, any function that contains a yield statement returns a generator. E.g.:

def odd_dup(xs):
  for x in xs:
    if x % 2 == 1:
      yield x * 2

ECMAScript / JavaScript

In JavaScript, yield can be used within function* generator functions. E.g.:

function* oddDupUntilNegative(xs) {
  for (const x of xs) {
    if (x < 0) {
      return;
    } else if (x % 2 == 1) {
      yield x * 2;
    }
  }
}

These generator functions are general coroutines. yield forms an expression that returns the value passed to next. E.g.:

function* dup(x) {
  while (true) {
    x = yield x * 2;
  }
}

const g = dup(2);
console.assert(g.next().value === 4);
console.assert(g.next(3).value === 6);

Ruby

In Ruby, yield can be used with the Enumerator class to implement an iterator. E.g.:

def odd_dup_until_negative xs
  Enumerator.new do |y|
    xs.each do |x|
      if x < 0
        return
      elsif x % 2 == 1
        y.yield x * 2
      end
    end
  end
end

Ruby also uses yield for a general coroutine mechanism with the Fiber class. E.g.:

def dup
  Fiber.new do |x|
    while true
      x = Fiber.yield x * 2
    end
  end
end

g = dup
4 == (g.resume 2)
6 == (g.resume 3)

Kotlin

In Kotlin, a lazy Sequence can be built using sequence expressions and yield. E.g.:

fun oddDup(xs: Iterable<Int>): Sequence<Int> {
    return sequence {
        for (x in xs) {
            if (x % 2 == 1) {
                yield(x * 2);
            }
        }
    };
}

fun main() {
    for (x in oddDup(listOf(1, 2, 3, 4, 5))) {
        println(x);
    }
}

Swift

In Swift, AsyncStream is used with yield to produce asynchronous generators. E.g.:

import Foundation

let sequence = AsyncStream { k in
    for x in 0..<20 {
        if x % 2 == 1 {
            k.yield(x * 2)
        }
    }
    k.finish()
}

let semaphore = DispatchSemaphore(value: 0)
Task {
    for await elem in sequence {
        print(elem)
    }
    semaphore.signal()
}
semaphore.wait()

Synchronous generators are not yet available in Swift, but may be something they are planning.

C#

In C#, within an iterator, the yield statement is used to either yield the next value or to stop iteration. E.g.:

IEnumerable<int> OddDupUntilNegative(IEnumerable<int> xs)
{
    foreach (int x in xs)
    {
        if (x < 0)
        {
            yield break;
        }
        else if (x % 2 == 1)
        {
            yield return x * 2;
        }
    }
}

Analogously with this RFC and with async blocks in Rust (but unlike async Task in C#), execution of C# iterators does not start until they are iterated.

D

In D, yield is used when constructing a Generator. E.g.:

import std.concurrency;
import std.stdio: writefln;

auto odd_dup(int[] xs) {
    return new Generator!int({
        foreach(x; xs) {
            if (x % 2 == 1) {
                yield(x * 2);
            }
        }
    });
}

void main() {
    auto xs = odd_dup([1, 2, 3, 4, 5]);
    foreach (x; xs) {
        writefln("%d", x);
    }
}

As in Ruby, generators in D are built on top of a more general Fiber class that also uses yield.

Dart

In Dart, there are both synchronous and asynchronous generator functions. Synchronous generator functions return an Iteratable. E.g.:

Iterable<int> oddDup(Iterable<int> xs) sync* {
    for (final x in xs) {
        if (x % 2 == 1) {
            yield x * 2;
        }
    }
}

void main() {
    oddDup(List<int>.generate(20, (x) => x + 1)).forEach(print);
}

Asynchronous generator functions return a Stream object. E.g.:

Stream<int> oddDup(Iterable<int> xs) async* {
    for (final x in xs) {
        if (x % 2 == 1) {
            yield x * 2;
        }
    }
}

void main() {
  oddDup(List<int>.generate(20, (x) => x + 1)).forEach(print);
}

F#

In F#, generators can be expressed with sequence expressions using yield. E.g.:

let oddDup xs = seq {
  for x in xs do
    if x % 2 = 1 then
      yield x * 2 }

for x in oddDup (seq { 1 .. 20 }) do
  printfn "%d" x

Racket

In Racket, generators can be built using generator and yield. E.g.:

#lang racket
(require racket/generator)

(define (odd-dup xs)
  (generator ()
    (for ([x xs])
      (when (odd? x)
        (yield (* 2 x))))))

(define g (odd-dup '(1 2 3 4 5)))
(= (g) 2)
(= (g) 6)
(= (g) 10)

Note that because of the expressive power of call/cc (and continuations in general), generators can be written in Racket (and in other Scheme dialects) as a normal library.

Haskell, Idris, Clean, etc.

In Haskell (and in similar languages such as Idris, Clean, etc.), all functions are lazy unless specially annotated. Consequently, Haskell does not need a special yield operator. Any function can be a generator by recursively building a list of elements that will be lazily returned one at a time. E.g.:

oddDup :: (Integral x) => [x] -> [x]
oddDup [] = []
oddDup (x:xs)
  | odd x = x * 2 : oddDup xs
  | otherwise = oddDup xs

main :: IO ()
main = putStrLn $ show $ take 5 $ oddDup [1..20]

Koka

The Koka language, by contrast, does not lean on laziness. Instead, like Scheme, Koka provides powerful general control flow constructs from which generators, async, coroutines, and other such things fall out naturally. Unlike Scheme, these powerful control flow constructs are typed and are called effect handlers. E.g.:

effect yield<a>
  fun yield(x : a) : ()

fun odd_dup(xs : list<int>) : yield<int> ()
  match xs
    Cons(x,xx) ->
      if x % 2 == 1 then
        yield(x * 2)
      odd_dup(xx)
    Nil -> ()

fun main() : console ()
  with fun yield(i : int)
    println(i.show)
  list(1,20).odd_dup

Note that there is no library being used here and that yield is not a keyword or feature of the language. In Koka, the code above is all that is needed to express generators.

Rust

In Rust, async blocks are built on top of the coroutine transformation. Using a no-op Waker, it's possible to expose this transformation. With that, we can build generators. Without the assistance of macros, the result looks like this:

let odd_dup = |xs| {
    Gen::new(async move |mut y| {
        for x in xs {
            if x % 2 == 1 {
                y.r#yield(x * 2).await;
            }
        }
    })
};

let odd_dup = pin!(odd_dup(1u8..20));
let odd_dup = odd_dup.init();

for (i, x) in odd_dup.enumerate() {
    assert_eq!((i as u8 * 2 + 1) * 2, x);
}

Crates such as genawaiter use this technique.

Unresolved questions

Whether to implement Iterator

There may be benefits to having the type returned by gen blocks not implement Iterator directly. Instead, these blocks would return a type that implements either IntoIterator or a new IntoGenerator trait. Such a design could leave us more appealing options for supporting self-referential gen blocks. We leave this as an open question.

Self-referential gen blocks

We can allow gen blocks to hold borrows across yield points. Should this be part of the initial stabilization?

Here are some options for how we might do this, either before or after stabilization:

  • Add a separate trait for pinned iteration that is also usable with gen and for.
    • Downside: We would have very similar traits for the same thing.
  • Backward compatibly add a way to change the argument type of Iterator::next.
    • Downside: It's unclear whether this is possible.
  • Implement Iterator for Pin<&mut G> instead of for G directly (for some type G that can be produced by gen blocks).
    • Downside: The next method would take a double indirected reference of the form &mut Pin<&mut G> which may present challenges for optimization.

If we were to stabilize gen blocks that could not hold borrows across yield points, this would be a serious usability limitation that users might find surprising. Consequently, whether we should choose to address this before stabilization is an open question.

Keyword

Should we use iter as the keyword since we're producing Iterators?

Alternatively, we could use gen as proposed in this RFC and then later extend its abilities to include those of more powerful generators or coroutines, thereby justifying use of the more general name.

Contextual keyword

Popular crates (like rand) have methods named gen. If we reserve gen as a full keyword, users of Rust 2024 and later editions would need to call these methods as r#gen until these crates update to make some accommodation.

We could instead choose to make gen a contextual keyword and only forbid it in:

  • bindings
  • field names (due to destructuring bindings)
  • enum variants
  • type names

Iterator::size_hint

Should we try to compute a conservative size_hint?

Doing this would reveal information from the body of a generator. But, at least for simple cases, users would likely expect size_hint to not just be the default.

It is backward compatible to later add support for opportunistically implementing size_hint.

Implement other Iterator traits

Might we later want to or be able to implement traits such as DoubleEndedIterator, ExactSizeIterator, etc.?

What to do about Rust 2015 and Rust 2018

In RFC 3101 we reserved prefixed identifiers such as prefix#ident. For this reason, we can make gen blocks available in Rust 2021 using k#gen as was anticipated in the (currently pending) RFC 3098.

Whether and how to make this feature available in Rust 2015 and Rust 2018, however, we leave as an open question.

Future possibilities

yield from (forwarding operator)

Python has the ability to yield from an iterator. Effectively this is syntactic sugar for looping over all elements of the iterator and yielding each individually. There is a wide design space here, but some options are included in the following subsections.

Do nothing, just use loops

Instead of adding special support for this, we could expect that users would write, e.g.:

for x in iter {
    yield x
}

Language support

We could do something like postfix yield, e.g.:

iter.yield

Alternatively, we could use an entirely new keyword.

stdlib macro

We could add a macro to the standard library and prelude. The macro would expand to a for loop + yield. E.g.:

yield_all!(iter)

Complete Coroutine support

We have a Coroutine trait on nightly (previously called Generator) that is more powerful than the Iterator API could possibly be:

  1. resume takes Pin<&mut Self>, allowing self-references across yield points.
  2. yield returns the argument passed to resume.

We could perhaps argue for coroutines to be gen closures while leaving gen blocks as a simpler concept.

There are many open questions here, so we leave this to future work.

async interactions

We could support using await in gen async blocks in a similar way to how we support ? being used within gen blocks. Without a solution for self-referential generators, we'd have the limitation that these blocks could not hold references across await points.

The solution space here is large. This RFC is forward compatible with the solutions we can foresee, so we leave this to later work.

try interactions

We could allow gen try fn foo() -> i32 to mean something akin to gen fn foo() -> Result<i32, E>. Whatever we do here, it should mirror whatever try fn means in the future.

gen fn

This RFC does not introduce gen fn. The syntax design space for this is large and there are open questions around the difference between returning or yielding a type. The options currently known include, e.g.:

fn foo(..) yield .. { .. }
fn foo(..) yields .. { .. }
fn foo(..) => .. { .. }
// Each of the below may instead be combined
// with `yield`, `yields`, or `=>`.
fn* foo(..) -> .. { .. }
gen fn foo(..) -> .. { .. }
gen foo(..) -> .. { .. }
generator fn foo(..) -> .. { .. }

Implement FusedIterator

The iterators produced by gen blocks are fused but do not implement FusedIterator because it is not a language item. We may in the future want for these iterators to implement FusedIterator.