21 KiB
- Feature Name: dropck_parametricity
- Start Date: 2015-08-05
- RFC PR: rust-lang/rfcs#1238/
- Rust Issue: rust-lang/rust#28498
Summary
Revise the Drop Check (dropck
) part of Rust's static analyses in two
ways. In the context of this RFC, these revisions are respectively
named cannot-assume-parametricity
and unguarded-escape-hatch
.
-
cannot-assume-parametricity
(CAP): Makedropck
analysis stop relying on parametricity of type-parameters. -
unguarded-escape-hatch
(UGEH): Add an attribute (with some name starting with "unsafe") that a library designer can attach to adrop
implementation that will allow a destructor to side-step thedropck
's constraints (unsafely).
Motivation
Background: Parametricity in dropck
The Drop Check rule (dropck
) for Sound Generic Drop relies on a
reasoning process that needs to infer that the behavior of a
polymorphic function (e.g. fn foo<T>
) does not depend on the
concrete type instantiations of any of its unbounded type parameters
(e.g. T
in fn foo<T>
), at least beyond the behavior of the
destructor (if any) for those type parameters.
This property is a (weakened) form of a property known in academic circles as Parametricity. (See e.g. Reynolds, IFIP 1983, Wadler, FPCA 1989.)
-
Parametricity, in this context, essentially says that the compiler can reason about the body of
foo
(and the subroutines thatfoo
invokes) without having to think about the particular concrete types that the type parameterT
is instantiated with.foo
cannot do anything with at: T
except:-
move
t
to some other owner expecting aT
or, -
drop
t
, running its destructor and freeing associated resources.
-
-
For example, this allows the compiler to deduce that even if
T
is instantiated with a concrete type like&Vec<u32>
, the body offoo
cannot actually read anyu32
data out of the vector. More details about this are available on the Sound Generic Drop RFC.
"Mistakes were made"
The parametricity-based reasoning in the
Drop Check analysis (dropck
) was clever, but
fragile and unproven.
-
Regarding its fragility, it has been shown to have bugs; in particular, parametricity is a necessary but not sufficient condition to justify the inferences that
dropck
makes. -
Regarding its unproven nature,
dropck
violated the heuristic in Rust's design to not incorporate ideas unless those ideas had already been proven effective elsewhere.
These issues might alone provide motivation for ratcheting back on
dropck
's rules in the short term, putting in a more conservative
rule in the stable release channel while allowing experimentation with
more-aggressive feature-gated rules in the development nightly release
channel.
However, there is also a specific reason why we want to ratchet back
on the dropck
analysis as soon as possible.
Impl specialization is inherently non-parametric
The parametricity requirement in the Drop Check rule over-restricts the design space for future language changes.
In particular, the impl specialization RFC describes a language
change that will allow the invocation of a polymorphic function f
to
end up in different sequences of code based solely on the concrete
type of T
, even when T
has no trait bounds within its
declaration in f
.
Detailed design
Revise the Drop Check (dropck
) part of Rust's static analyses in two
ways. In the context of this RFC, these revisions are respectively
named cannot-assume-parametricity
(CAP) and unguarded-escape-hatch
(UGEH).
Though the revisions are given distinct names, they both fall under
the feature gate dropck_parametricity
. (Note however that this
might be irrelevant to CAP; see CAP stabilization details).
cannot-assume-parametricity
The heart of CAP is this: make dropck
analysis stop relying on
parametricity of type-parameters.
Changes to the Drop-Check Rule
The Drop-Check Rule (both in its original form and as revised here)
dictates when a lifetime 'a
must strictly outlive some value v
,
where v
owns data of type D
; the rule gave two circumstances where
'a
must strictly outlive the scope of v
.
-
The first circumstance (
D
is directly instantiated at'a
) remains unchanged by this RFC. -
The second circumstance (
D
has some type parameter with trait-provided methods, i.e. that could be invoked withinDrop
) is broadened by this RFC to simply say "D
has some type parameter."
That is, under the changes of this RFC, whether the type parameter has
a trait-bound is irrelevant to the Drop-Check Rule. The reason is that
any type parameter, regardless of whether it has a trait bound or not,
may end up participating in impl specialization, and thus could
expose an otherwise invisible reference &'a AlreadyDroppedData
.
cannot-assume-parametricity
is a breaking change, since the language
will start assuming that a destructor for a data-type definition such
as struct Parametri<C>
may read from data held in its C
parameter,
even though the fn drop
formerly appeared to be parametric with
respect to C
. This will cause rustc
to reject code that it had
previously accepted (below are some examples that
continue to work and
some that start being rejected).
CAP stabilization details
cannot-assume-parametricity
will be incorporated into the beta
and stable Rust channels, to ensure that destructor code atop
stable channels in the wild stop relying on parametricity as soon
as possible. This will enable new language features such as
impl specialization.
-
It is not yet clear whether it is feasible to include a warning cycle for CAP.
-
For now, this RFC is proposing to remove the parts of Drop-Check that attempted to prove that the
impl<T> Drop
was parametric with respect toT
. This would mean that there would be more warning cycle;dropck
would simply start rejecting more code. There would be no way to opt back into the olddropck
rules. -
(However, during implementation of this change, we should double-check whether a warning-cycle is in fact feasible.)
unguarded-escape-hatch
The heart of unguarded-escape-hatch
(UGEH) is this: Provide a new,
unsafe (and unstable) attribute-based escape hatch for use in the
standard library for cases where Drop Check is too strict.
Why we need an escape hatch
The original motivation for the parametricity special-case in the
original Drop-Check rule was due to an observation that collection
types such as TypedArena<T>
or Vec<T>
were often used to
contain values that wanted to refer to each other.
An example would be an element type like
struct Concrete<'a>(u32, Cell<Option<&'a Concrete<'a>>>);
, and then
instantiations of TypedArena<Concrete>
or Vec<Concrete>
.
This pattern has been used within rustc
, for example,
to store elements of a linked structure within an arena.
Without the parametricity special-case, the existence of a destructor
on TypedArena<T>
or Vec<T>
led the Drop-Check analysis to conclude
that those destructors might hypothetically read from the references
held within T
-- forcing dropck
to reject those destructors.
(Note that Concrete
itself has no destructor; if it did, then
dropck
, both as originally stated and under the changes of this RFC,
would force the 'a
parameter of any instance to strictly outlive
the instance value, thus ruling out cross-references in the same
TypedArena
or Vec
.)
Of course, the whole point of this RFC is that using parametricity as the escape hatch seems like it does not suffice. But we still need some escape hatch.
The new escape hatch: an unsafe attribute
This leads us to the second component of the RFC, unguarded-escape-hatch
(UGEH):
Add an attribute (with a name starting with "unsafe") that a library
designer can attach to a drop
implementation that will allow a
destructor to side-step the dropck
's constraints (unsafely).
This RFC proposes the attribute name unsafe_destructor_blind_to_params
.
This name was specifically chosen to be long and ugly; see
UGEH stabilization details for further discussion.
Much like the unsafe_destructor
attribute that we had in the past,
this attribute relies on the programmer to ensure that the destructor
cannot actually be used unsoundly. It states an (unproven) assumption
that the given implementation of drop
(and all functions that this
drop
may transitively call) will never read or modify a value of
any type parameter, apart from the trivial operations of either
dropping the value or moving the value from one location to another.
- (In particular, it certainly must not dereference any
&
-reference within such a value, though this RFC is adopts a somewhat stronger requirement to encourage the attribute to only be used for the limited case of parametric collection types, where one need not do anything more than move or drop values.)
The above assumption must hold regardless of what impact impl specialization has on the resolution of all function calls.
UGEH stabilization details
The proposed attribute is only a short-term patch to work-around a
bug exposed by the combination of two desirable features (namely
impl specialization and dropck
).
In particular, using the attribute in cases where control-flow in the
destructor can reach functions that may be specialized on a
type-parameter T
may expose the system to use-after-free scenarios
or other unsound conditions. This may a non-trivial thing for the
programmer to prove.
-
Short term strategy: The working assumption of this RFC is that the standard library developers will use the proposed attribute in cases where the destructor is parametric with respect to all type parameters, even though the compiler cannot currently prove this to be the case.
The new attribute will be restricted to non-stable channels, like any other new feature under a feature-gate.
-
Long term strategy: This RFC does not make any formal guarantees about the long-term strategy for including an escape hatch. In particular, this RFC does not propose that we stabilize the proposed attribute
It may be possible for future language changes to allow us to directly express the necessary parametricity properties. See further discussion in the continue supporting parametricity alternative.
The suggested attribute name (
unsafe_destructor_blind_to_params
above) was deliberately selected to be long and ugly, in order to discourage it from being stabilized in the future without at least some significant discussion. (Likewise, the acronym "UGEH" was chosen for its likely pronunciation "ugh", again a reminder that we do not want to adopt this approach for the long term.)
Examples of code changes under the RFC
This section shows some code examples, starting with code that works today and must continue to work tomorrow, then showing an example of code that will start being rejected, and ending with an example of the UGEH attribute.
Examples of code that must continue to work
Here is some code that works today and must continue to work in the future:
use std::cell::Cell;
struct Concrete<'a>(u32, Cell<Option<&'a Concrete<'a>>>);
fn main() {
let mut data = Vec::new();
data.push(Concrete(0, Cell::new(None)));
data.push(Concrete(0, Cell::new(None)));
data[0].1.set(Some(&data[1]));
data[1].1.set(Some(&data[0]));
}
In the above, we are building up a vector, pushing Concrete
elements
onto it, and then later linking those concrete elements together via
optional references held in a cell in each concrete element.
We can even wrap the vector in a struct that holds it. This also must continue to work (and will do so under this RFC); such structural composition is a common idiom in Rust code.
use std::cell::Cell;
struct Concrete<'a>(u32, Cell<Option<&'a Concrete<'a>>>);
struct Foo<T> { data: Vec<T> }
fn main() {
let mut foo = Foo { data: Vec::new() };
foo.data.push(Concrete(0, Cell::new(None)));
foo.data.push(Concrete(0, Cell::new(None)));
foo.data[0].1.set(Some(&foo.data[1]));
foo.data[1].1.set(Some(&foo.data[0]));
}
Examples of code that will start to be rejected
The main change injected by this RFC is this: due to cannot-assume-parametricity
,
an attempt to add a destructor to the struct Foo
above will cause the
code above to be rejected, because we will assume that the destructor for Foo
may invoke methods on the concrete elements that dereferences their links.
Thus, this code will be rejected:
use std::cell::Cell;
struct Concrete<'a>(u32, Cell<Option<&'a Concrete<'a>>>);
struct Foo<T> { data: Vec<T> }
// This is the new `impl Drop`
impl<T> Drop for Foo<T> {
fn drop(&mut self) { }
}
fn main() {
let mut foo = Foo { data: Vec::new() };
foo.data.push(Concrete(0, Cell::new(None)));
foo.data.push(Concrete(0, Cell::new(None)));
foo.data[0].1.set(Some(&foo.data[1]));
foo.data[1].1.set(Some(&foo.data[0]));
}
NOTE: Based on a preliminary crater run, it seems that mixing together
destructors with this sort of cyclic structure is sufficiently rare
that no crates on crates.io
actually regressed under the new rule:
everything that compiled before the change continued to compile after
it.
Example of the unguarded-escape-hatch
If the developer of Foo
has access to the feature-gated
escape-hatch, and is willing to assert that the destructor for Foo
does nothing with the links in the data, then the developer can work
around the above rejection of the code by adding the corresponding
attribute.
#![feature(dropck_parametricity)]
use std::cell::Cell;
struct Concrete<'a>(u32, Cell<Option<&'a Concrete<'a>>>);
struct Foo<T> { data: Vec<T> }
impl<T> Drop for Foo<T> {
#[unsafe_destructor_blind_to_params] // This is the UGEH attribute
fn drop(&mut self) { }
}
fn main() {
let mut foo = Foo { data: Vec::new() };
foo.data.push(Concrete(0, Cell::new(None)));
foo.data.push(Concrete(0, Cell::new(None)));
foo.data[0].1.set(Some(&foo.data[1]));
foo.data[1].1.set(Some(&foo.data[0]));
}
Drawbacks
As should be clear by the tone of this RFC, the
unguarded-escape-hatch
is clearly a hack. It is subtle and unsafe,
just as unsafe_destructor
was (and for the most part, the whole
point of Sound Generic Drop was to remove unsafe_destructor
from
the language).
-
However, the expectation is that most clients will have no need to ever use the
unguarded-escape-hatch
. -
It may suffice to use the escape hatch solely within the collection types of
libstd
. -
Otherwise, if clients outside of
libstd
determine that they do need to be able to write destructors that need to bypassdropck
safely, then we can (and should) investigate one of the sound alternatives, rather than stabilize the unsafe hackish escape hatch..
Alternatives
CAP without UGEH
One might consider adopting cannot-assume-parametricity
without
unguarded-escape-hatch
. However, unless some other sort of escape
hatch were added, this path would break much more code.
UGEH for lifetime parameters
Since we're already being unsafe here, one might consider having
the unsafe_destructor_blind_to_params
apply to lifetime parameters
as well as type parameters.
However, given that the unsafe_destructor_blind_to_params
attribute
is only intended as a short-term band-aid (see
UGEH stabilization details) it seems better to just make it only as
broad as it needs to be (and no broader).
"Sort-of Guarded" Escape Hatch
We could add the escape hatch but continue employing the current dropck analysis to it. This would essentially mean that code would have to apply the unsafe attribute to be considered for parametricity, but if there were obvious problems (namely, if the type parameter had a trait bound) then the attempt to opt into parametricity would be ignored and the strict ordering restrictions on the lifetimes would be imposed.
I only mention this because it occurred to me in passing; I do not
really think it has much of a benefit. It would potentially lead
someone to think that their code has been proven sound (since the
dropck
would catch some mistakes in programmer reasoning) but the
pitfalls with respect to specialization would remain.
Continue Supporting Parametricity
There may be ways to revise the language so that functions can declare that they must be parametric with respect to their type parameters. Here we sketch two potential ideas for how one might do this, mostly to give a hint of why this is not a trivial change to the language.
Neither design is likely to be adopted, at least as described here, because both of them impose significant burdens on implementors of parametric destructors, as we will see.
(Also, if we go down this path, we will need to fix other bugs in the Drop Check rule, where, as previously noted, parametricity is a necessary but insufficient condition for soundness.)
Parametricity via effect-system attributes
One feature of the impl specialization RFC is that all functions that
can be specialized must be declared as such, via the default
keyword.
This leads us to one way that a function could declare that its body
must not be allows to call into specialized methods: an attribute like
#[unspecialized]
. The #[unspecialized]
attribute, when applied to
a function fn foo()
, would mean two things:
-
foo
is not allowed to call any functions that have thedefault
keyword. -
foo
is only allowed to call functions that are also marked#[unspecialized]
All fn drop
methods would be required to be #[unspecialized]
.
It is the second bullet that makes this an ad-hoc effect system: it provides
a recursive property ensuring that during the extent of the call to foo
,
we will never invoke a function marked as default
(and therefore, I think,
will never even potentially invoke a method that has been specialized).
It is also this second bullet that represents a significant burden on
the destructor implementor. In particular, it immediately rules out
using any library routine unless that routine has been marked as
#[unspecialized]
. The attribute is unlikely to be included on any
function unless the its developer is making a destructor that calls it
in tandem.
Parametricity via some ?
-bound
Another approach starts from another angle: As described earlier,
parametricity in dropck
is the requirement that fn drop
cannot do
anything with a t: T
(where T
is some relevant type parameter)
except:
-
move
t
to some other owner expecting aT
or, -
drop
t
, running its destructor and freeing associated resources.
So, perhaps it would be more natural to express this requirement via a bound.
We would start with the assumption that functions may be non-parametric (and thus their implementations may be specialized to specific types).
But then if you want to declare a function as having a stronger
constraint on its behavior (and thus expanding its potential callers
to ones that require parametricity), you could add a bound T: ?Special
.
The Drop-check rule would treat T: ?Special
type-parameters as parametric,
and other type-parameters as non-parametric.
The marker trait Special
would be an OIBIT that all sized types would get.
Any expression in the context of a type-parameter binding of the form
<T: ?Special>
would not be allowed to call any default
method
where T
could affect the specialization process.
(The careful reader will probably notice the potential sleight-of-hand here: is this really any different from the effect-system attributes proposed earlier? Perhaps not, though it seems likely that the finer grain parameter-specific treatment proposed here is more expressive, at least in theory.)
Like the previous proposal, this design represents a significant
burden on the destructor implementor: Again, the T: ?Special
attribute is unlikely to be included on any function unless the its
developer is making a destructor that calls it in tandem.
Unresolved questions
-
What name to use for the attribute? Is
unsafe_destructor_blind_to_params
sufficiently long and ugly? ;) -
What is the real long-term plan?
-
Should we consider merging the discussion of alternatives into the impl specialization RFC?
Bibliography
Reynolds
John C. Reynolds. "Types, abstraction and parametric polymorphism". IFIP 1983 http://www.cse.chalmers.se/edu/year/2010/course/DAT140_Types/Reynolds_typesabpara.pdf
Wadler
Philip Wadler. "Theorems for free!". FPCA 1989 http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf