unsafe_pinned
- Feature Name: unsafe_pinned
- Start Date: 2022-11-05
- RFC PR: rust-lang/rfcs#3467
- Tracking Issue: rust-lang/rust#125735
Summary
Add a type UnsafePinned that indicates to the compiler that this field is "pinned" and there might be pointers elsewhere that point to the same memory.
This means, in particular, that &mut UnsafePinned is not necessarily a unique pointer, and thus the compiler cannot just make aliasing assumptions.
However, &mut UnsafePinned can still be mem::swapped, so this is not a free ticket for arbitrary aliasing of mutable references.
You need to use mechanisms such as Pin to ensure that mutable references cannot be used in incorrect ways by clients.
This type is then used in generator lowering, finally fixing #63818.
Motivation
Let's say you want to write a type with a self-referential pointer:
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};
pub struct S {
data: i32,
ptr_to_data: *mut i32,
}
impl !Unpin for S {}
impl S {
pub fn new() -> Self {
S { data: 42, ptr_to_data: ptr::null_mut() }
}
pub fn get_data(self: Pin<&mut Self>) -> i32 {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.read() }
}
pub fn set_data(self: Pin<&mut Self>, data: i32) {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.write(data) }
}
}
fn main() {
let mut s = pin!(S::new());
s.as_mut().set_data(42);
println!("{}", s.as_mut().get_data());
}
This kind of code is implicitly generated by rustc all the time when an async fn has a local variable of reference type that is live across a yield point.
The problem is that this code has UB under our aliasing rules: the &mut S inside the self argument of get_data aliases with ptr_to_data!
(If you run this code in Miri, remove the impl !Unpin to see the UB. Miri treats Unpin as magic as otherwise the entire async ecosystem would cause errors.
But that is not how Unpin was actually designed.)
This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what async fn generates -- also have UB under the LLVM aliasing rules.
A more complex variant
The following roughly corresponds to a generator with this code:
let mut data = 0;
let ptr_to_data = &mut data;
yield;
*ptr_to_data = 42;
println!("{}", data);
return;
When implemented by hand, it looks as follows, and causes aliasing issues:
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};
use std::task::Poll;
pub struct S {
state: i32,
data: i32,
ptr_to_data: *mut i32,
}
impl !Unpin for S {}
impl S {
pub fn new() -> Self {
S { state: 0, data: 0, ptr_to_data: ptr::null_mut() }
}
fn poll(self: Pin<&mut Self>) -> Poll<()> {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
match this.state {
0 => {
// The first time, set up the pointer.
this.ptr_to_data = ptr::addr_of_mut!(this.data);
// Now yield.
this.state += 1;
Poll::Pending
}
1 => {
// After coming back from the yield, write to the pointer.
unsafe { this.ptr_to_data.write(42) };
// And read our local variable `data`.
// THIS IS UB! `this` is derived from the `noalias` pointer
// `self` but we did a write to `this.data` in the previous
// line when writing to `ptr_to_data`. The compiler is allowed
// to reorder this and the previous line and then the output
// would change.
println!("{}", this.data);
// Now yield and be done.
this.state += 1;
Poll::Ready(())
}
_ => unreachable!(),
}
}
}
fn main() {
let mut s = pin!(S::new());
while let Poll::Pending = s.as_mut().poll() {}
}
Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements.
When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements.
This is an example of an intrusive list in the standard library that is breaking Rust's aliasing rules.
Pin is sometimes used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers) and provide a safe API, but there still is the problem that an &mut Node is actually not a unique pointer due to these aliases -- so we need a way for such code to opt out of the aliasing rules.
The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB. We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential); instead we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.
Guide-level explanation
To write this code in a UB-free way, wrap the fields that are targets of self-referential pointers in an UnsafePinned:
pub struct S {
data: UnsafePinned<i32>, // <-- here
ptr_to_data: *mut i32,
}
impl S {
pub fn new() -> Self {
S { data: UnsafePinned::new(42), ptr_to_data: ptr::null_mut() }
}
pub fn get_data(self: Pin<&mut Self>) -> i32 {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = UnsafePinned::raw_get_mut(ptr::addr_of_mut!(this.data));
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.read() }
}
pub fn set_data(self: Pin<&mut Self>, data: i32) {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = UnsafePinned::raw_get_mut(ptr::addr_of_mut!(this.data));
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.write(data) }
}
}
This lets the compiler know that mutable references to data might still have aliases, and so optimizations cannot assume that no aliases exist.
That's entirely analogous to how UnsafeCell lets the compiler know that shared references to this field might undergo mutation.
However, what is not analogous is that &mut S, when handed to safe code you do not control, must still be a unique pointer!
This is because of methods like mem::swap that can still assume that their two arguments are non-overlapping.
(mem::swap on two &mut UnsafePinned<i32> may soundly assume that they do not alias.)
In other words, the safety invariant of &mut S still requires full ownership of the entire memory range S is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory.
But again, this is a safety invariant; it only applies to safe code you do not control. You can write your own code handling &mut S and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.
To hand such references to safe code, use Pin: the type Pin<&mut S> can be safely given to external code, since the Pin wrapper blocks access to operations like mem::swap.
Similarly, the intrusive linked list from the motivation can be fixed by wrapping the entire UnsafeListEntry in UnsafePinned.
Reference-level explanation
API sketch:
/// The type `UnsafePinned<T>` lets unsafe code violate
/// the rule that `&mut UnsafePinned<T>` may never alias anything else.
///
/// However, even if you define your type like `pub struct Wrapper(UnsafePinned<...>)`,
/// it is still very risky to have an `&mut Wrapper` that aliases
/// anything else. Many functions that work generically on `&mut T` assume that the
/// memory that stores `T` is uniquely owned (such as `mem::swap`). In other words,
/// while having aliasing with `&mut Wrapper` is not immediate Undefined
/// Behavior, it is still unsound to expose such a mutable reference to code you do
/// not control! Techniques such as pinning via `Pin` are needed to ensure soundness.
///
/// Similar to `UnsafeCell`, `UnsafePinned` will not usually show up in the public
/// API of a library. It is an internal implementation detail of libraries that
/// need to support aliasing mutable references.
///
/// Further note that this does *not* lift the requirement that shared references
/// must be read-only! Use `UnsafeCell` for that.
///
/// This type blocks niches the same way `UnsafeCell` does.
#[lang = "unsafe_pinned"]
#[repr(transparent)]
struct UnsafePinned<T: ?Sized> {
value: T,
}
/// When this type is used, that almost certainly means safe APIs need to use pinning
/// to avoid the aliases from becoming invalidated. Therefore let's mark this as `!Unpin`.
impl<T: ?Sized> !Unpin for UnsafePinned<T> {}
/// The type is `Copy` when `T` is to avoid people assuming that `Copy` implies there
/// is no `UnsafePinned` anywhere. (This is an issue with `UnsafeCell`: people use `Copy` bounds
/// to mean `Freeze`.) Given that there is no `unsafe impl Copy for ...`, this is also
/// the option that leaves the user more choices (as they can always wrap this in a `!Copy` type).
impl<T: Copy> Copy for UnsafePinned<T> {}
impl<T: Copy> Clone for UnsafePinned<T> {
fn clone(&self) -> Self { *self }
}
// `Send` and `Sync` are inherited from `T`. This is similar to `SyncUnsafeCell`, since
// we eventually concluded that `UnsafeCell` implicitly making things `!Sync` is sometimes
// unergonomic. A type that needs to be `!Send`/`!Sync` should really have an explicit
// opt-out itself, e.g. via an `PhantomData<*mut T>` or (one day) via `impl !Send`/`impl !Sync`.
impl<T: ?Sized> UnsafePinned<T> {
/// Constructs a new instance of `UnsafePinned` which will wrap the specified
/// value.
pub fn new(value: T) -> UnsafePinned<T> where T: Sized {
UnsafePinned { value }
}
pub fn into_inner(self) -> T where T: Sized {
self.value
}
/// Get read-write access to the contents of an `UnsafePinned`.
///
/// You should usually be using `get_mut_pinned` instead to explicitly track
/// the fact that this memory is "pinned" due to there being aliases.
pub fn get_mut_unchecked(&mut self) -> *mut T {
ptr::addr_of_mut!(self.value)
}
/// Get read-write access to the contents of a pinned `UnsafePinned`.
pub fn get_mut_pinned(self: Pin<&mut Self>) -> *mut T {
// SAFETY: we're not using `get_unchecked_mut` to unpin anything
unsafe { ptr::addr_of_mut!(self.get_unchecked_mut().value) }
}
/// Get read-only access to the contents of a shared `UnsafePinned`.
/// Note that `&UnsafePinned<T>` is read-only if `&T` is read-only.
/// This means that if there is mutation of the `T`, future reads from the
/// `*const T` returned here are UB!
///
/// ```rust
/// unsafe {
/// let mut x = UnsafePinned::new(0);
/// let ref1 = &mut *addr_of_mut!(x);
/// let ref2 = &mut *addr_of_mut!(x);
/// let ptr = ref1.get(); // read-only pointer, assumes immutability
/// ref2.get_mut_unchecked().write(1);
/// ptr.read(); // UB!
/// }
/// ```
pub fn get(&self) -> *const T {
self as *const _ as *const T
}
pub fn raw_get_mut(this: *mut Self) -> *mut T {
this as *mut T
}
pub fn raw_get(this: *const Self) -> *const T {
this as *const T
}
}
The comment about aliasing &mut being "risky" refers to the fact that their safety invariant still asserts exclusive ownership.
This implies that duplicate in the following example, while not causing immediate UB, is still unsound:
pub struct S {
data: UnsafePinned<i32>,
}
impl S {
fn new(x: i32) -> Self {
S { data: UnsafePinned::new(x) }
}
fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
// `transmute_copy` is a free function in `std::mem`, not a method.
let s1 = unsafe { mem::transmute_copy(&s) };
let s2 = s;
(s1, s2)
}
}
The unsoundness is easily demonstrated by using safe code to cause UB:
let mut s = S::new(42);
let (s1, s2) = S::duplicate(&mut s); // no UB
mem::swap(s1, s2); // UB
We could even soundly make get_mut_unchecked return an &mut T, given that the safety invariant is not affected by UnsafePinned.
But that would probably not be useful and only cause confusion.
Here is a polyfill on current Rust that uses the Unpin hack to achieve mostly the same effect as this API.
("Mostly" because a safe impl Unpin for ... can un-do the effect of this, which would not be the case with the real UnsafePinned.)
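The polyfill referenced above is not reproduced in this text. As a rough illustration of the idea only, the following is a minimal sketch of what such a wrapper might look like on stable Rust, assuming the Unpin hack is triggered through a PhantomPinned field. The name UnsafePinnedPolyfill and the helper roundtrip are hypothetical and not part of the RFC; the struct S mirrors the guide-level example.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};
use std::ptr;

/// Hypothetical stand-in for `UnsafePinned<T>` (name invented for this sketch).
/// The `PhantomPinned` field makes this type, and every struct containing it,
/// `!Unpin`, which is what the current `Unpin`-based hack keys on.
/// `repr(C)` places `value` at offset 0, so pointer casts to `T` are valid.
#[repr(C)]
pub struct UnsafePinnedPolyfill<T> {
    value: T,
    _pin: PhantomPinned,
}

impl<T> UnsafePinnedPolyfill<T> {
    pub fn new(value: T) -> Self {
        Self { value, _pin: PhantomPinned }
    }

    pub fn into_inner(self) -> T {
        self.value
    }

    /// Mirrors `raw_get_mut` from the API sketch: raw pointer in, raw pointer out.
    pub fn raw_get_mut(this: *mut Self) -> *mut T {
        this.cast::<T>()
    }
}

/// The self-referential struct from the guide-level explanation.
pub struct S {
    data: UnsafePinnedPolyfill<i32>,
    ptr_to_data: *mut i32,
}

impl S {
    pub fn new() -> Self {
        S { data: UnsafePinnedPolyfill::new(42), ptr_to_data: ptr::null_mut() }
    }

    /// Lazily set up the self-referential pointer and return it.
    fn data_ptr(self: Pin<&mut Self>) -> *mut i32 {
        // SAFETY: We're not moving anything.
        let this = unsafe { Pin::get_unchecked_mut(self) };
        if this.ptr_to_data.is_null() {
            this.ptr_to_data =
                UnsafePinnedPolyfill::raw_get_mut(ptr::addr_of_mut!(this.data));
        }
        this.ptr_to_data
    }

    pub fn get_data(self: Pin<&mut Self>) -> i32 {
        // SAFETY: we are pinned and the pointer points to the `data` field.
        unsafe { self.data_ptr().read() }
    }

    pub fn set_data(self: Pin<&mut Self>, data: i32) {
        // SAFETY: we are pinned and the pointer points to the `data` field.
        unsafe { self.data_ptr().write(data) }
    }
}

/// Hypothetical helper: write a value through the self-pointer and read it back.
pub fn roundtrip(value: i32) -> i32 {
    let mut s = pin!(S::new());
    s.as_mut().set_data(value);
    s.as_mut().get_data()
}
```

As the text notes, this only approximates the proposed type: a stray safe impl Unpin on a containing type would silently restore the aliasing assumptions.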
Reference diff:
* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
- [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
+ [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
+ the `&mut T` contains an [`UnsafePinned<U>`].
Async generator lowering changes:
- Fields that represent local variables whose address is taken across a yield point must be wrapped in UnsafePinned.
rustc and Miri changes:
- We have a UnsafeUnpin auto trait similar to Freeze that is implemented if the type does not contain any by-val UnsafePinned. This trait is an internal implementation detail and (for now) not exposed to users.
- noalias on mutable references is only emitted for UnsafeUnpin types. (This replaces the current hack where it is only emitted for Unpin types.)
- Niches are blocked on UnsafePinned.
- Miri will do SharedReadWrite retagging inside UnsafePinned similar to what it does inside UnsafeCell already. (This replaces the current Unpin-based hack.)
Comparison with some other types that affect aliasing
- UnsafeCell: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. &UnsafeCell<T> is special. UnsafeCell<&T> (by-val, fully owned) is not special at all and basically like &T; &mut UnsafeCell<T> is also not special. Safe wrappers around this type can expose mutability behind shared references, such as &RefCell<T>.
- UnsafePinned: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. &mut UnsafePinned<T> is special. UnsafePinned<&mut T> (by-val, fully owned) is not special at all and basically like &mut T; &UnsafePinned<T> is also not special. Safe wrappers around this type can expose sharing that involves pinned mutable references, such as Pin<&mut MyFuture>.
- MaybeDangling: disables aliasing and dereferenceable of all references (and boxes) directly inside it, i.e. MaybeDangling<&[mut] T> is special. &[mut] MaybeDangling<T> is not special at all and basically like &[mut] T.
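To make the UnsafeCell row of this comparison concrete, here is a small sketch using std's Cell (a safe wrapper around UnsafeCell). It shows mutation through two coexisting shared references, and that &mut Cell<T> behaves like a plain &mut T. The function names are illustrative only.

```rust
use std::cell::Cell;

/// `&Cell<T>` is the special case: a shared reference that permits mutation.
pub fn bump_shared(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

/// `&mut Cell<T>` is not special: it is unique like any other `&mut`,
/// which is why `Cell::get_mut` can safely hand out a plain `&mut T`.
pub fn bump_exclusive(counter: &mut Cell<u32>) -> u32 {
    *counter.get_mut() += 1;
    *counter.get_mut()
}

/// Hypothetical driver returning the counter after the shared and exclusive bumps.
pub fn demo() -> (u32, u32) {
    let mut counter = Cell::new(0);
    // Two shared references coexist and both may mutate.
    let a = &counter;
    let b = &counter;
    bump_shared(a);
    let after_shared = bump_shared(b);
    let after_exclusive = bump_exclusive(&mut counter);
    (after_shared, after_exclusive)
}
```

UnsafePinned is the mirror image of this: the mutable reference is the special case and the shared reference is the ordinary one.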
Drawbacks
- It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with UnsafeCell or MaybeDangling. Furthermore, it is an extremely subtle wrapper type, as the duplicate example shows.
- UnsafeUnpin is a somewhat unfortunate twin to Unpin. The purpose of UnsafeUnpin really is only to search for UnsafePinned fields, so that we can use the trait solver to determine whether an &mut reference gets noalias or not. The actual safety promise of UnsafeUnpin is likely going to be exactly the same as Unpin, but we can't use a stable and safe trait to determine noalias: impl UnsafeUnpin for T would add the noalias back to &mut T, and that can lead to very surprising aliasing issues as the poll_fn debacle showed. (Note that PollFn has already been fixed, but that doesn't mean nobody will make similar mistakes in the future so it is worth discussing how the original, problematic PollFn would fare under this RFC.) Splitting up the traits partially mitigates such issues: after impl<T> Unpin for PollFn<T>, PollFn<T> is (in general) Unpin + !UnsafeUnpin. The known examples of UB that Miri found all were caused by bad aliasing assumptions, which no longer occur when the aliasing assumptions are tied to UnsafeUnpin rather than Unpin. Actually moving the PollFn would still cause problems (and that can be done in safe code since it implements Unpin), but now the chances of code causing UB are much reduced since one must both pin data that's moved into a closure, and move that closure -- even though the Rust compiler will not help prevent such moves, programmers thinking carefully about pinning are hopefully less likely to then try to move that closure. In conclusion, Unpin + !UnsafeUnpin types are somewhat foot-gunny but less foot-gunny than the status quo. Maybe in a future edition Unpin can be transitioned to an unsafe trait and then this situation can be re-evaluated; for now, UnsafeUnpin remains an unstable implementation detail similar to Freeze. (UnsafeUnpin + !Unpin types are harmless, one just loses the ability to call Pin::deref_mut for no good reason.)
- It is unfortunate that &mut UnsafePinned and &mut TypeThatContainsUnsafePinned lose their no-alias assumptions even when they are not currently pinned. However, since pinning was implemented as a library concept, there's not really any way the compiler can know whether an &mut UnsafePinned is pinned or not -- working with Pin<&mut TypeThatContainsUnsafePinned> generally requires using Pin::get_unchecked_mut and Pin::map_unchecked_mut, which exposes &mut TypeThatContainsUnsafePinned and &mut UnsafePinned that still need to be considered aliased.
Rationale and alternatives
The proposal in this RFC matches what was discussed with the lang team a long time ago.
However, of course one could imagine alternatives:
- Keep the status quo. The current situation is that we only make aliasing requirements on mutable references if the type they point to is Unpin. This is unsatisfying: Unpin was never meant to have this job. A consequence is that a stray impl Unpin on a Wrapper<T>-style type can lead to subtle miscompilations since it re-adds aliasing requirements for the inner T. Contrast this with the UnsafeCell situation, where it is not possible for (stable) code to just impl Freeze for T in the wrong way -- UnsafeCell is always recognized by the compiler.
  On the other hand, UnsafePinned is rather quirky in its behavior and having two marker traits (UnsafeUnpin and Unpin) might be too confusing, so sticking with Unpin might not be too bad in comparison.
  If we do that, however, it seems preferable to transition Unpin to an unsafe trait. There is a clear statement about the types' invariants associated with Unpin, so an impl Unpin already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisfied. This is mostly a coincidence and related to the fact that we don't have safe field projections on Pin. That said, solving this also requires solving the trouble around Drop and Pin, where effectively an impl Drop does an implicit get_unchecked_mut, i.e., implicitly assumes the type is Unpin.
- UnsafePinned could affect aliasing guarantees both on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing &mut UnsafePinned<T> is cast or coerced to &UnsafePinned<T>: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than UnsafeCell in the sense that replacing UnsafeCell by UnsafePinned would always be correct. (Under the RFC as written, UnsafeCell and UnsafePinned can be nested to remove aliasing requirements from both shared and mutable references.)
  If we don't do this, we could consider removing get since it seems too much like a foot-gun. But that makes shared references to UnsafePinned fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
- Instead of introducing a new type, we could say that UnsafeCell affects both shared and mutable references. That would lose some optimization potential on types like &mut Cell<T>, but would avoid the footgun of coercing an &mut UnsafePinned<T> to &UnsafePinned<T>. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is able to detect such issues).
- We could entirely avoid all these problems by not having aliasing restrictions on mutable references. But that is completely against the direction Rust has had for 8 years now, and it would mean removing LLVM noalias annotations for mutable references (and likely boxes) entirely. That is sacrificing optimization potential for the common case in favor of simplifying niche cases such as self-referential structs -- which is against the usual design philosophy of Rust.
- Instead of adding a new type that needs to be used as Pin<&mut UnsafePinned<T>>, can't we just make Pin<&mut T> special? The answer is no, because working with Pin<&mut T> in unsafe code usually involves getting direct access to the &mut and then using it "carefully". But being careful is not enough when the compiler makes non-aliasing assumptions! We need to preserve the fact that the &mut T may have aliases even after Pin::get_unchecked_mut was used and inside Pin::map_unchecked_mut. In a different universe where pinning is a first-class concept with native support for projections and no need for get_unchecked_mut, this might not have been required, but with pinning being introduced as a library type, there is no (currently known) alternative to UnsafePinned.
In terms of rationale, the question that comes to mind first is why is this so different from UnsafeCell.
UnsafeCell opts-out of read-only guarantees for shared references, can't we just have a type that opts-out of uniqueness guarantees for mutable references?
The answer is no, because mutable references have some universal operations that exploit their uniqueness -- in particular, mem::swap.
In contrast, there exists no operation available on all shared references that exploits their immutability.
This is why we need pinning to make APIs around UnsafePinned actually sound.
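The asymmetry described here can be sketched in a few lines of stable Rust. mem::swap is available for every &mut T, whereas getting from Pin<&mut T> back to &mut T in safe code requires T: Unpin. The function names below are illustrative and not part of the RFC.

```rust
use std::mem;
use std::pin::{pin, Pin};

/// Callable for every `T`: generic safe code may rely on `a` and `b`
/// pointing to non-overlapping memory.
pub fn swap_any<T>(a: &mut T, b: &mut T) {
    mem::swap(a, b);
}

/// `Pin::get_mut` is only available when `T: Unpin`, so safe code cannot
/// reach `mem::swap` through a `Pin<&mut T>` for a `!Unpin` type.
pub fn swap_pinned<T: Unpin>(a: Pin<&mut T>, b: Pin<&mut T>) {
    mem::swap(a.get_mut(), b.get_mut());
}

/// Hypothetical driver exercising both helpers on `i32`, which is `Unpin`.
pub fn demo() -> (i32, i32, i32, i32) {
    let mut x = 1;
    let mut y = 2;
    swap_any(&mut x, &mut y);

    let mut p = pin!(10);
    let mut q = pin!(20);
    swap_pinned(p.as_mut(), q.as_mut());

    (x, y, *p, *q)
}
```

Calling swap_pinned with a type that contains PhantomPinned (or, under this RFC, UnsafePinned) is rejected at compile time because the Unpin bound is not met, which is the property safe wrappers rely on.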
Naming
An earlier proposal suggested to call the type UnsafeAliased, since the type is not inherently tied to pinning.
However, it is not possible to write sound wrappers around UnsafeAliased such that we can have aliasing &mut MyType. One has to use pinning for that: Pin<&mut MyType>.
Because of that, the RFC proposes that we suggestively put pinning into the name of the type, so that people don't confuse it with a general mechanism for aliasing mutable references.
It is more like the core primitive behind pinning, where whenever a type is pinned that is caused by an UnsafePinned field somewhere inside it.
For instance, it may be tempting to use an UnsafeAliased type to mark a single field in some struct as "separately aliased", and then a Mutex<Struct> would acquire ownership of the entire struct except for that field.
However, due to mem::swap, that would not be sound.
One cannot hand out an &mut to such aliased memory as part of a safe-to-use abstraction -- except by using pinning.
Prior art
This is somewhat like UnsafeCell, but for mutable instead of shared references.
Adding something like this to Rust has been discussed many times throughout the years. Here are some links for recent discussions:
Unresolved questions
- Is there a better name than UnsafePinned for the type? What about the trait -- UnsafeUnpin is awkward. MaybeUnique is better?
- How do we transition code that relies on Unpin opting-out of aliasing guarantees to the new type? Here's a plan: define the PhantomPinned type as UnsafePinned<()>. Places in the standard library that use impl !Unpin for and the generator lowering are adjusted to use UnsafePinned instead. Then as long as nobody outside the standard library used the unstable impl !Unpin for, switching the noalias-opt-out to the UnsafeUnpin trait is actually backwards compatible with the (never explicitly supported) Unpin hack! However, if we ever make UnsafePinned location-relevant (i.e., only data inside the UnsafePinned is allowed to have aliases), then unsafe code in the ecosystem needs to be adjusted. The justification for this is that currently this code relies on undocumented unstable behavior (using !Unpin to opt-out of aliasing guarantees), so we are in our right to declare it unsound. Of course we should give the ecosystem time to migrate to the new approach before we actually start doing optimizations that exploit this UB.
- Relatedly, in which module should this type live?
- Unpin also affects the dereferenceable attribute, so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back dereferenceable when better semantics for it materialize on the LLVM side?
Future possibilities
- Similar to how we might want the ability to project from &UnsafeCell<Struct> to &UnsafeCell<Field>, the same kind of projection could also be interesting for UnsafePinned.
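To illustrate what such a projection means for UnsafeCell today, here is a hand-written sketch on stable Rust. It relies on the documented guarantee that UnsafeCell<T> has the same in-memory representation as T. The struct Pair and the helper project_b are hypothetical names for this example; an analogous helper for UnsafePinned is only a possibility, not something this RFC specifies.

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Hypothetical struct used only for this illustration.
pub struct Pair {
    pub a: i32,
    pub b: i32,
}

/// Hand-written projection from `&UnsafeCell<Pair>` to `&UnsafeCell<i32>`
/// for the field `b`.
pub fn project_b(cell: &UnsafeCell<Pair>) -> &UnsafeCell<i32> {
    let pair: *mut Pair = cell.get();
    // SAFETY: `pair` is derived from a live reference, so computing the field
    // address is in bounds. `UnsafeCell<i32>` has the same in-memory
    // representation as `i32`, so the cast preserves layout, and the returned
    // reference cannot outlive the borrow of `cell`.
    unsafe {
        let field: *mut i32 = ptr::addr_of_mut!((*pair).b);
        &*(field as *const UnsafeCell<i32>)
    }
}

/// Hypothetical driver: write through the projected cell, then read both fields.
pub fn demo() -> (i32, i32) {
    let cell = UnsafeCell::new(Pair { a: 1, b: 2 });
    let b = project_b(&cell);
    // SAFETY: single-threaded, and no references into the contents are live.
    unsafe { *b.get() = 40 };
    let pair = cell.into_inner();
    (pair.a, pair.b)
}
```

A built-in projection would remove the need for the unsafe block and the layout argument in project_b.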