mirror of https://github.com/rust-lang/rfcs.git
union-initialization-and-drop
This commit is contained in:
parent
2d5218a3e2
commit
04c66db03b
|
@ -0,0 +1,340 @@
|
|||
- Feature Name: union_initialization_and_drop
|
||||
- Start Date: 2018-08-03
|
||||
- RFC PR: (leave this empty)
|
||||
- Rust Issue: (leave this empty)
|
||||
|
||||
# Summary
|
||||
[summary]: #summary
|
||||
|
||||
Unions do not allow fields of types that require drop glue, but they may still
|
||||
`impl Drop` themselves. We specify when one may "move" out of a union field and
|
||||
when the union's `drop` is called. To avoid implicit calls of drop, we also
|
||||
restrict the use of `DerefMut` when unions are involved.
|
||||
|
||||
# Motivation
|
||||
[motivation]: #motivation
|
||||
|
||||
Currently, it is unstable to have a non-`Copy` field in the union. The main
|
||||
reason for this is that having fields which need drop glue raises some hard
|
||||
questions about whether to call that drop glue when assigning a union field.
|
||||
Not much progress has been made on stabilizing the unstable union features.
|
||||
This RFC proposes a route forwards that side-steps those hard questions: Do not
|
||||
allow fields with drop glue.
|
||||
|
||||
# Guide-level explanation
|
||||
[guide-level-explanation]: #guide-level-explanation
|
||||
|
||||
## Union Definition
|
||||
|
||||
When defining a union, it is a hard error to use a field type that requires drop glue.
|
||||
Examples:
|
||||
```rust
|
||||
// Accepted
|
||||
union Example1<T> {
|
||||
// `ManuallyDrop<T>` never has drop glue, even if `T` does.
|
||||
f1: ManuallyDrop<T>,
|
||||
// `Cell<i32>` is a fully known type, and does not have drop glue.
|
||||
f2: Cell<i32>,
|
||||
}
|
||||
union Example2<T: Copy> {
|
||||
// `Copy` types never have drop glue.
|
||||
f1: T,
|
||||
}
|
||||
trait Trait3 { type Assoc: Copy; }
|
||||
union Example3<T: Trait3> {
|
||||
// `T::Assoc` is `Copy` and hence cannot have drop glue.
|
||||
f1: T::Assoc,
|
||||
}
|
||||
|
||||
// Rejected
|
||||
union Example4<T> {
|
||||
// `T` might have drop glue.
|
||||
f1: T,
|
||||
}
|
||||
trait Trait5 { type Assoc; }
|
||||
union Example5<T: Trait5> {
|
||||
// `T::Assoc` might have drop glue.
|
||||
f1: T::Assoc,
|
||||
}
|
||||
```
|
||||
|
||||
Ruling out possibly dropping types may seem restrictive, but thanks to
|
||||
`ManuallyDrop` it in fact is not: If the compiler rejects a union definition,
|
||||
you can always wrap field types in `ManuallyDrop` to obtain a working
|
||||
definition. This means you have to manually take care of when to drop the data,
|
||||
but that is already something to be concerned with when working on unions.
|
||||
|
||||
As a consequence, it is quite obvious that writing to a union field will never
|
||||
implicitly call `drop`. Such a write is hence always a safe operation. This
|
||||
removes a whole class of pit falls related to `drop` being called in tricky
|
||||
unsafe code when you might not expect that to happen. (However, see below for
|
||||
some pit falls that remain.)
|
||||
|
||||
## Union initialization and `Drop`
|
||||
|
||||
In two cases, the compiler cares about whether a (field of a) variable is
|
||||
initialized: When deciding whether a move from the field/variable is allowed
|
||||
(for cases where the type is not `Copy`), and when deciding whether or not the
|
||||
variable has to be dropped when it goes out of scope.
|
||||
|
||||
A union just does very simple initialization tracking: There is a single boolean
|
||||
state for the entire union and all of its fields. Nested inner fields are
|
||||
tracked just like they are for structs; however, when the union becomes
|
||||
(un)initialized, then all nested inner fields of all union fields are
|
||||
(un)initialized at once. For example:
|
||||
|
||||
```rust
|
||||
// This code creates bad references and transmute to `Vec` in incorrect ways.
|
||||
// This is just to demonstrate what the compiler would accept in terms of
|
||||
// tracking initialization.
|
||||
|
||||
struct S(i32, i32); // not `Copy`, no drop glue
|
||||
union U { f1: ManuallyDrop<Vec<i32>>, f2: S, f3: i32 }
|
||||
|
||||
let mut u: U;
|
||||
// Now `u` is not initialized: `&u`, `&u.f2` and `&u.f2.0` are all rejected.
|
||||
|
||||
// We can write into uninitialized inner fields:
|
||||
u.f2.1 = 42;
|
||||
let _ = &u.f2.1; // This field is initialized now.
|
||||
// But this does not change the initialization state of the union itself,
|
||||
// or any other (inner) field.
|
||||
|
||||
// We can initialize by assigning an entire field:
|
||||
u.f1 = ManuallyDrop::new(Vec::new());
|
||||
// Now *all fields* of `u` are initialized:
|
||||
let _ = &u.f2;
|
||||
|
||||
// Equivalently, we can assign the entire union:
|
||||
u = U { f2: S(42, 23) };
|
||||
// Now `u` is still initialized.
|
||||
|
||||
// Copying does not change anything:
|
||||
let _ = u.f3;
|
||||
// Now `u` is still initialized.
|
||||
|
||||
// We can move out of an initialized union:
|
||||
let v = u.f1;
|
||||
// Now `u.f1` and `u.f2` are no longer initialized (they got "moved out of").
|
||||
// `let _ = u.f2;` would hence get rejected, as would `&u.f1` and `foo(u)`.
|
||||
u.f1 = v;
|
||||
// Now `u` and all of its fields are initialized again ("moving back in").
|
||||
```
|
||||
|
||||
If the union implements `Drop`, the same restrictions as for structs apply: It
|
||||
is not possible to initialize a field before initializing the entire variable,
|
||||
and it is not possible to move out of a field. For example:
|
||||
|
||||
```rust
|
||||
// This code creates bad references and transmute to `Vec` in incorrect ways.
|
||||
// This is just to demonstrate what the compiler would accept in terms of
|
||||
// tracking initialization.
|
||||
|
||||
struct S(i32); // not `Copy`, no drop glue
|
||||
|
||||
union U { f1: ManuallyDrop<Vec<i32>>, f2: S, f3: u32 }
|
||||
impl Drop for U {
|
||||
fn drop(&mut self) {
|
||||
println!("Goodbye!");
|
||||
}
|
||||
}
|
||||
|
||||
let mut u: U;
|
||||
// `u.f1 = ...;` gets rejected: Cannot initialize a union with `Drop` by assigning a field.
|
||||
u = U { f2: S(42) };
|
||||
// Now `u` is initialized.
|
||||
|
||||
// `let v = u.f1;` gets rejected: Cannot move out of union that implements `Drop`.
|
||||
let v_ref = &mut u.f1; // creating a reference is allowed
|
||||
let _ = u.f3; // copying out is allowed
|
||||
```
|
||||
|
||||
When a union implementing `Drop` goes out of scope, its destructor gets called if and only if the union is currently considered initialized:
|
||||
(Continuing the example from above.)
|
||||
|
||||
```rust
|
||||
{
|
||||
let u = U { f2: S(42) };
|
||||
// drop gets called
|
||||
}
|
||||
{
|
||||
let u = U { f1: ManuallyDrop::new(Vec::new()) };
|
||||
foo(u.f2);
|
||||
// drop does NOT get called
|
||||
}
|
||||
```
|
||||
|
||||
## Potential pitfalls around `DerefMut`
|
||||
|
||||
There is still a potential pitfall left around assigning to union fields: If the
|
||||
assignment implicitly happens through a `DerefMut`, it may still call drop glue.
|
||||
For example:
|
||||
|
||||
```rust
|
||||
#![feature(untagged_unions)]
|
||||
|
||||
use std::mem::ManuallyDrop;
|
||||
|
||||
union U<T> { x:(), f: ManuallyDrop<T> }
|
||||
|
||||
fn main() {
|
||||
let mut u : U<(Vec<i32>,)> = U { x: () };
|
||||
unsafe { u.f.0 = Vec::new() }; // uninitialized `Vec` being droped
|
||||
}
|
||||
```
|
||||
This requires `unsafe` because it desugars to `ManuallyDrop::deref_mut(&mut u.f).0`,
|
||||
and while writing to a union field is safe, taking a reference is not.
|
||||
|
||||
For this reason, `DerefMut` auto-deref is not applied when working on a union or
|
||||
its fields. However, note that manually dereferencing is still possible, so
|
||||
`*(u.f).0 = Vec::new()` is still a way to drop an uninitialized field! But this
|
||||
can never happen when no `*` is involved, and hopefully dereferencing an element
|
||||
of a union is a clear enough signal that the union better be initialized
|
||||
properly for this to make sense.
|
||||
|
||||
# Reference-level explanation
|
||||
[reference-level-explanation]: #reference-level-explanation
|
||||
|
||||
## Union definition
|
||||
|
||||
When defining a union, it is a hard error to use a field type that requires drop glue.
|
||||
This is checked as follows:
|
||||
|
||||
* Proceed recursively down the given type, insofar as the type involved is known
|
||||
at compile-time. For example, `u32`, `&mut T` and `ManuallyDrop<T>` are known
|
||||
to not have drop glue no matter the choice of `T`.
|
||||
* When hitting a type variable where no progress can be made, check that `T:
|
||||
Copy` as a proxy for `T` not requiring drop glue.
|
||||
|
||||
Note: Currently, union fields with drop glue are allowed on nightly with an
|
||||
unstable feature. This RFC proposes to remove support entirely; code using
|
||||
nightly might have to be changed.
|
||||
|
||||
## Writing to union fields
|
||||
|
||||
Writing to union fields is currently unsafe when the field has drop glue. This
|
||||
check is no longer needed, because union fields will never have drop glue.
|
||||
Moreover, writing to a nested field (e.g., `u.f1.x = 0;`) is currently unsafe as
|
||||
well, this should also become a safe operation *as long as the path consists
|
||||
only of field projections, not deref's*. Note that this is sound only because
|
||||
`ManuallyDrop`'s only field is private (so, in fact, this is *not* sound inside
|
||||
the module that defines `ManuallyDrop`).
|
||||
|
||||
## Union initialization tracking
|
||||
|
||||
A "fragment" is a place of the form `local_var.field.field.field`, without any
|
||||
implicit derefs. A fragment can be either *initialized* or *uninitialized*.
|
||||
This state is approximated statically: The type system will only allow accesses
|
||||
to initialized fragments. Drop elaboration needs to know the precise state of a
|
||||
fragment, for which purpose it adds run-time drop flags as needed.
|
||||
|
||||
If a fragment has some uninitialized nested fragments then it is still
|
||||
uninitialized and accesses to this fragment as a whole are prevented. This
|
||||
applies even if it also has a nested initialized fragment (in which case we speak
|
||||
of a *partially initialized* fragment). If a fragment has only initialized
|
||||
nested fragments then it is initialized as a whole and can be accessed.
|
||||
|
||||
A fragment becomes initialized when it is assigned to, or created using an
|
||||
initializer, or it is a union field and a sibling becomes initialized, or all
|
||||
its nested fragments become initialized. A fragment becomes uninitialized when
|
||||
it doesn't implement `Copy` and is moved out from, or it is a union field
|
||||
(possibly `Copy`) and its sibling becomes uninitialized, or some of its nested
|
||||
fragments becomes uninitialized.
|
||||
|
||||
In other words, union fields behave a lot like struct fields except that if one
|
||||
field changes initialization state, the others follow suit. In particular, if
|
||||
one union field becomes partially initialized (because one of its nested
|
||||
fragments got uninitialized), all its siblings become *entirely* uninitialized.
|
||||
|
||||
If a fragment is of a type which has an `impl Drop`, then its nested fragments
|
||||
cannot be separately (un)initialized. Only the entire fragment can be
|
||||
initialized by assignment, and the entire fragment can be uninitialized by
|
||||
moving out.
|
||||
|
||||
NOTE: To my knowledge, the following already mostly matches the current
|
||||
implementation. The only exception is that "fragment becomes initialized when
|
||||
all its nested fragments become initialized" rule is not currently implemented
|
||||
for neither structs nor unions, so the compiler accepts less code than it
|
||||
should.
|
||||
|
||||
(This is based on a
|
||||
[previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).)
|
||||
|
||||
## Potential pitfalls around `DerefMut`
|
||||
|
||||
When adding auto-derefs on the left-hand side of an assignment, as we traverse
|
||||
the path, once we hit a `union`, we stop adding further auto-derefs. So with
|
||||
`s: Struct` and `u: Union`, when encountering `s.u.f.x`, auto-deref *does*
|
||||
happen on `s`, but not on `s.u` or any of the later components.
|
||||
|
||||
Notice that this relies crucially on the only field of `ManuallyDrop` being
|
||||
private! If we could project directly through that field, no `DerefMut` would
|
||||
be needed to reproduce the problematic example from the "guide" section.
|
||||
|
||||
# Drawbacks
|
||||
[drawbacks]: #drawbacks
|
||||
|
||||
This makes working with unions involving types that may have drop glue slightly
|
||||
more verbose than today: One has to write `ManuallyDrop` more often than one may
|
||||
want to.
|
||||
|
||||
The restriction placed on `DerefMut` is not fully backwards compatible: A type
|
||||
could implement `Copy + DerefMut` and actually rely on the deref coercion inside
|
||||
a union. That seems very unlikely, but should be tested with a crater run.
|
||||
|
||||
The initialization tracking rules are somewhat surprising, and one might want to
|
||||
prefer the compiler to just not track anything when it comes to unions. After
|
||||
all, the compiler fundamentally cannot know what part of the union is properly
|
||||
initialized. Unfortunately, not having any initialization tracking is not an
|
||||
option when non-`Copy` fields are involved: We have to decide when moving out of
|
||||
a union field is allowed.
|
||||
|
||||
# Rationale and alternatives
|
||||
[rationale-and-alternatives]: #rationale-and-alternatives
|
||||
|
||||
Ruling out fields with drop glue does not, in fact, reduce the expressiveness of
|
||||
unions because one can use `ManuallyDrop<T>` to obtain a drop-glue-free version
|
||||
of `T`. If anything, having the `ManuallyDrop` in the union definition should
|
||||
help to drive home the point that no automatic dropping is happening, ever.
|
||||
(Before this RFC, automatic dropping is happening when assigning to a union
|
||||
field but not when the union goes out of scope. That seems to be the result of
|
||||
necessity, not of a coherent design.)
|
||||
|
||||
An alternative approach to proceed with unions is
|
||||
[this previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).
|
||||
That RFC replaces RFC 1444 and goes into a lot more points than this much more
|
||||
limited proposal. In particular, it allows fields with drop glue. However, it
|
||||
can be pretty hard for the programmer to predict when drop glue will be
|
||||
automatically invoked on assignment or not, because the initialization tracking
|
||||
(which this RFC adapts from @petrochenkov's proposal) can sometimes be a little
|
||||
surprising when looking at individual fields: Whether `u.f2 = ...;` drops
|
||||
depends on whether `u.f1` has been previously initialized. @petrochenkov hence
|
||||
proposes a lint to warn people that unions with drop-glue fields are not always
|
||||
very well-behaved. This RFC, on the other hand, side-steps the entire question
|
||||
by not allowing fields with drop glue. Initialization tracking thus has no
|
||||
effect on the code executed during an assignment of a union field. For unions
|
||||
that `impl Drop`, it still has an effect on what happens when the union goes out
|
||||
of scope, but in that case initialization is so restricted that I cannot think
|
||||
of any surprises. Together with the `DerefMut` restriction, that should make it
|
||||
very unlikely to accidentally call `drop` when it was not intended.
|
||||
|
||||
We could simplify thus further by not to allowing `impl drop for Union`. It is
|
||||
still possible to add a wrapper struct around the union which has drop glue, so
|
||||
this does not restrict expressiveness. However, this seems unnecessarily
|
||||
cumbersome.
|
||||
|
||||
# Prior art
|
||||
[prior-art]: #prior-art
|
||||
|
||||
I do not know of any language combining initialization tracking and destructors
|
||||
with unions: C++ does not have destructors for unions, and it does not track
|
||||
whether fields of a data structures are initialized to (dis)allow references.
|
||||
|
||||
# Unresolved questions
|
||||
[unresolved-questions]: #unresolved-questions
|
||||
|
||||
Should we even try to avoid the `DerefMut`-related pitfall? And if yes, should
|
||||
we maybe try harder, e.g. lint against using `*` below a union type when
|
||||
describing a place? That would make people write `let v = &mut u.f; *v =
|
||||
Vec::new();`. It is not clear that this helps in terms of pointing out that an
|
||||
automatic drop may be happening.
|
|
@ -10,6 +10,8 @@ Provide native support for C-compatible unions, defined via a new "contextual
|
|||
keyword" `union`, without breaking any existing code that uses `union` as an
|
||||
identifier.
|
||||
|
||||
**Note:** This RFC has been partially superseded by `unions-and-drop`.
|
||||
|
||||
# Motivation
|
||||
[motivation]: #motivation
|
||||
|
||||
|
|
Loading…
Reference in New Issue