- Feature Name: arbitrary_self_types
- Start Date: 2023-05-04
- RFC PR: rust-lang/rfcs#3519
- Tracking Issue: rust-lang/rust#44874
Summary
Allow types that implement the new trait Receiver<Target=Self>
to be the receiver of a method.
Motivation
Today, methods can only be received by value, by reference, or by one of a few blessed smart pointer types from core
, alloc
and std
(Arc<Self>
, Box<Self>
, Pin<P>
and Rc<Self>
).
It's been assumed that this will eventually be generalized to support any smart pointer, such as a CustomPtr<Self>
. Since late 2017, it has been available on nightly under the arbitrary_self_types
feature for types that implement Deref<Target=T>
and for raw pointers.
This RFC proposes some changes to the existing nightly feature based on the experience gained, with a view towards stabilizing the feature in the relatively near future.
Motivation for the arbitrary self types feature overall
The Rust async work identified a need to allow self
types of Pin<&mut Self>
(and similar). At that time, certain types - Pin
, Rc
, Box
etc. - became hard coded in stable Rust as valid self
types. That's been sufficient for many use-cases including async Rust, but this special power is currently restricted to these hard-coded types.
Since then, other use-cases have become clear where crates need to make their own smart pointer types with similar powers.
One use-case is cross-language interop (JavaScript, Python, C++). In many cases, automatic code generation tools need to represent foreign language pointers or references somehow in Rust, and often, we want to call methods on such types. But, other languages' references can’t guarantee the aliasing and exclusivity semantics required of a Rust reference. For example, the C++ this
pointer can't be practically or safely represented as a Rust reference because C++ may retain other pointers to the data and it might mutate at any time.
What is a code generator to do? Its options in current stable Rust are poor:
- It can represent foreign pointers/references as &T, with a virtual certainty of undefined behavior due to different guarantees in different languages.
- It can represent foreign pointers/references as *const T or *mut T, but can't attach methods.
- It can represent foreign pointers/references as a smart pointer type (CppRef<T> or CppPtr<T>), but can't attach methods.
With "arbitrary self types", smart pointer types can be created which obey foreign-language semantics and yet allow method calls:
#[repr(transparent)]
#[derive(Clone)]
/// A C++ reference. Obeys C++ reference semantics, not Rust reference semantics.
/// There is no exclusivity; the underlying data may mutate, etc.
/// (This is an abridged example: a real CppRef type would fully document invariants
/// here.)
pub struct CppRef<T: ?Sized> {
ptr: *const T,
}
impl<T: ?Sized> Receiver for CppRef<T> {
type Target = T;
}
// generated by bindings generator
struct ConcreteCppType {
// ...
}
// all generated by bindings generator; mostly calls into C++
// In this example these are not marked "unsafe" because we do not directly use
// CppRef::ptr in Rust. This example assumes that the corresponding C++ functions
// do not themselves have unsafe behavior and thus can be presented to Rust as safe.
// Safety of FFI is orthogonal to this RFC.
impl ConcreteCppType {
fn some_cpp_method(self: CppRef<Self>) {}
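fn get_int_field(self: &CppRef<Self>) -> u32 { todo!() }
fn get_more_complex_field(self: &CppRef<Self>) -> CppRef<FieldType> { todo!() }
// `other` is the second reference being compared (see the call in main)
fn equals(self: &CppRef<Self>, other: &CppRef<Self>) -> bool { todo!() }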
}
// generated by bindings generator
fn get_cpp_reference() -> CppRef<ConcreteCppType> {
// also calls into C++
}
fn main() {
// Rust code manipulating C++ objects via C++-semantics references
let cpp_obj_reference: CppRef<ConcreteCppType> = get_cpp_reference();
// cpp_obj_reference does not obey Rust reference semantics. Other
// "references" to the same data may exist in the Rust or C++ domain.
// But it can effectively be used as an opaque token to pass safely
// through Rust back into C++
let some_value: u32 = cpp_obj_reference.get_int_field();
let some_field = cpp_obj_reference.get_more_complex_field();
cpp_obj_reference.equals(&get_cpp_reference());
}
(fuller example here, with various trait-based attempts to work around the lack of arbitrary self types.)
Another case is when the existence of a reference is, itself, semantically important — for example, reference counting, or if relayout of a UI should occur each time a mutable reference ceases to exist. In these cases it's not OK to allow a regular Rust reference to exist, and yet sometimes we still want to be able to call methods on a reference-like thing.
A third motivation is that taking smart pointer types as self
parameters can enable functions to act on the smart pointer type, not just the underlying data. For example, taking &Arc<T>
allows the functions to both clone the smart pointer (noting that the underlying T
might not implement Clone
) in addition to access the data inside the type, which is useful for some methods; this also makes it ergonomic in more cases to make Arc<SomeType>
explicit rather than having SomeType
contain an Arc
internally and have Arc
-like clone
semantics. Also, being able to change a method from accepting &self
to self: &Arc<Self>
can be done in a mostly frictionless way, whereas changing from &self
to a static method accepting &Arc<Self>
will always require some amount of refactoring. These options are currently open only to Rust's built-in smart pointer types, not to custom smart pointer types.
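As a minimal sketch of this third motivation, using only what stable Rust already permits for its built-in smart pointers: the type `Session` and its methods are hypothetical names chosen for illustration. The method `share` receives the `Arc` handle itself, so it can clone the handle even though `Session` does not implement `Clone`.

```rust
use std::sync::Arc;

/// Hypothetical type, used only for illustration. Note: not `Clone`.
struct Session {
    user: String,
}

impl Session {
    /// The receiver is the smart pointer, so the method can hand out
    /// another handle to the same allocation.
    fn share(self: &Arc<Self>) -> Arc<Self> {
        Arc::clone(self)
    }

    /// An ordinary `&self` method is still callable through the `Arc`.
    fn user(&self) -> &str {
        &self.user
    }
}

fn main() {
    let first = Arc::new(Session { user: String::from("ada") });
    let second = first.share();
    // Both handles point at the same `Session`.
    assert_eq!(Arc::strong_count(&first), 2);
    assert_eq!(second.user(), "ada");
}
```

A custom smart pointer cannot be used in the position of `Arc` here on stable Rust, which is the gap this RFC addresses.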
Finally, there's just a matter of symmetry with Rust's own smart pointer types. The Rust for Linux project, for instance, requires a custom Arc
type. In theory, users can define their own smart pointers. In practice, they're second-class citizens compared to the smart pointers in Rust's standard library. A type T
can accept method calls using smart pointers as the self
type only if they're one of Rust's built-in smart pointers.
This RFC proposes to loosen this restriction to allow custom smart pointer types to be accepted as a self
type just like for the standard library types.
See also this blog post, especially for a list of more specific use-cases.
Motivation for the v2 changes
Unstable Rust contains an implementation of arbitrary self types based around the Deref<Target=T>
trait. Naturally, that trait also provides a means to create a &T
. Example:
#![feature(arbitrary_self_types)]
use std::ops::Deref;
struct SmartPtr<T: ?Sized>(*const T);
impl<T: ?Sized> Deref for SmartPtr<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
// never called, but smart pointers need to implement this method
// sometimes it's just not safe to create a reference to self.0
unimplemented!()
}
}
struct ConcreteType;
impl ConcreteType {
fn some_method(self: SmartPtr<ConcreteType>) {
}
}
fn main() {
let concrete: SmartPtr<ConcreteType> = ...;
concrete.some_method();
}
This works well for some smart pointer types where it's OK to create &T
(but not necessarily &mut T
). This includes Pin
and the reference counted pointers. For that reason, the original arbitrary self types feature could be based around Deref
. But in other smart pointer use-cases (especially those relating to foreign language semantics) it's not OK to create even &T
.
The arbitrary self types feature should be enhanced so it works even when we can't allow &T
. As noted above, that's most commonly because of semantic differences to pointers in other languages, but it might be because references have special meaning or behavior in some pure Rust domain. Either way, it may not be OK to create a Rust reference &T
, yet we may want to allow methods to be called on some reference-like thing.
For this reason, implementing Deref::deref
is problematic for many of the likely users of this "arbitrary self types" feature.
If you're implementing a smart pointer P<T>
, and you need to allow impl T { fn method(self: P<T>) { ... }}
, yet you can't allow a reference &T
to exist, any option for implementing Deref::deref
has drawbacks:
- Specify Deref::Target=T and panic in Deref::deref. Not good.
- Specify Deref::Target=*const T. This is only possible if your smart pointer type contains a *const T which you can reference - this isn't the case for (for instance) weak pointers or types containing NonNull.
Therefore, the current Arbitrary Self Types v2 provides a separate Receiver
trait, so that there's no need to provide an awkward Deref::deref
implementation.
This v2 version has two other differences relative to the existing unstable arbitrary_self_types
feature:
- We won't allow raw pointer receivers, yet. It's highly desirable that we do so in future - this is discussed under the enable for pointers section.
- We will block generic receivers. See the diagnostics section for reasoning.
Aside from these differences, Arbitrary Self Types v2 is similar to the existing unstable arbitrary_self_types
feature.
Guide-level explanation
When declaring a method, users can also declare the type of the self
receiver to be any type T
where T: Receiver<Target = Self>
, in addition to using Self
by value or reference.
The Receiver
trait is simple and only requires specifying the Target
type:
trait Receiver {
type Target: ?Sized;
}
The Receiver
trait is already implemented for many standard library types:
- smart pointers in the standard library: Rc<Self>, Arc<Self>, Box<Self>, and Pin<SomeSmartPtr<Self>> (and in fact, any type which implements Deref)
- references: &Self and &mut Self
Shorthand exists for references, so that self
with no ascription is of type Self
, &self
is of type &Self
and &mut self
is of type &mut Self
.
All of the following self types are valid:
impl Foo {
fn by_value(self /* self: Self */) {}
fn by_ref(&self /* self: &Self */) {}
fn by_ref_mut(&mut self /* self: &mut Self */) {}
fn by_box(self: Box<Self>) {}
fn by_rc(self: Rc<Self>) {}
fn by_custom_ptr(self: CustomPtr<Self>) {}
}
struct CustomPtr<T>(*const T);
impl<T> Receiver for CustomPtr<T> {
type Target = T;
}
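The subset of these self types that stable Rust already accepts can be exercised directly. The following is a sketch under that restriction: it omits `CustomPtr`, which needs the feature proposed here, and writes the shorthand forms out in full to show they are the same thing. `Foo` and its field are illustrative.

```rust
use std::rc::Rc;

struct Foo(u32);

impl Foo {
    // `self: Self` is what the shorthand `self` means.
    fn by_value(self: Self) -> u32 {
        self.0
    }
    // `self: &Self` is what the shorthand `&self` means.
    fn by_ref(self: &Self) -> u32 {
        self.0
    }
    // `self: &mut Self` is what the shorthand `&mut self` means.
    fn by_ref_mut(self: &mut Self) {
        self.0 += 1;
    }
    fn by_box(self: Box<Self>) -> u32 {
        self.0
    }
    fn by_rc(self: Rc<Self>) -> u32 {
        self.0
    }
}

fn main() {
    let mut foo = Foo(1);
    foo.by_ref_mut();
    assert_eq!(foo.by_ref(), 2);
    assert_eq!(foo.by_value(), 2);
    assert_eq!(Box::new(Foo(3)).by_box(), 3);
    assert_eq!(Rc::new(Foo(4)).by_rc(), 4);
}
```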
Recursive arbitrary receivers
Receivers are recursive and therefore allowed to be nested. If type T
implements Receiver<Target=U>
, and type U
implements Receiver<Target=Self>
, T
is a valid receiver (and so on outward). This is the behavior for the current special-cased self types (Pin
, Box
etc.), so as we remove the special-casing, we need to retain this property.
For example, this self type is valid:
impl MyType {
fn by_rc_to_box(self: Rc<Box<Self>>) { ... }
}
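Because nesting is already permitted for the special-cased standard library types, the property can be demonstrated on stable Rust. This is a small sketch; the field `id` and the second method are additions for illustration only.

```rust
use std::rc::Rc;

struct MyType {
    id: u32,
}

impl MyType {
    /// Nested receiver: `Rc` targets `Box<Self>`, which targets `Self`.
    fn by_rc_to_box(self: Rc<Box<Self>>) -> u32 {
        self.id
    }

    /// A reference to a smart pointer is also a nested receiver.
    fn handle_count(self: &Rc<Self>) -> usize {
        Rc::strong_count(self)
    }
}

fn main() {
    let nested = Rc::new(Box::new(MyType { id: 5 }));
    assert_eq!(nested.by_rc_to_box(), 5);

    let shared = Rc::new(MyType { id: 6 });
    let _second = Rc::clone(&shared);
    assert_eq!(shared.handle_count(), 2);
}
```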
The Rust language doesn't provide a way for user code to use this recursive property in generics or iteration, so this trait is unlikely to be useful except to the compiler. Nevertheless, we don't intend to prevent use of the Receiver trait by user code: the same recursive property applies to Deref, yet it's been occasionally useful to introduce Deref bounds.
Implementing methods on smart pointers
If your smart pointer type implements Receiver
, you should not add methods to that smart pointer type after its initial creation. As soon as anyone is using your smart pointer type outside of your crate, they may add methods on a contained type; for example:
impl SomeType {
fn do_something(self: your_crate::SmartPointer<SomeType>) {}
}
If you then add SmartPointer::do_something
, this is a conflict, and the compiler will produce an error. It's therefore considered to be a compatibility break to add additional methods to your_crate::SmartPointer
. It's OK to add methods at the outset when you create SmartPointer
, until the point at which other people start using it.
This principle has been followed for the types in Rust's standard library which implement Receiver
; for instance, Box
and Rc
. Mostly they offer associated functions rather than methods.
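The following sketch shows why associated functions sidestep the problem. `Inventory` and its method are hypothetical, chosen only so that the name collides with `Rc::strong_count`. Because `Rc::strong_count` takes `this: &Self` rather than a `self` parameter, it is not a candidate for method-call syntax, and the inner type's method is found without ambiguity.

```rust
use std::rc::Rc;

/// Hypothetical inner type for illustration.
struct Inventory;

impl Inventory {
    /// Shares its name with the associated function `Rc::strong_count`.
    fn strong_count(&self) -> usize {
        42
    }
}

fn main() {
    let rc = Rc::new(Inventory);
    // Method-call syntax reaches the inner type's method...
    assert_eq!(rc.strong_count(), 42);
    // ...while the smart pointer's function must be named explicitly.
    assert_eq!(Rc::strong_count(&rc), 1);
}
```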
In the future there might be a deshadowing algorithm that can relax this rule - see the method shadowing section below for discussion.
Reference-level explanation
core
libs changes
The Receiver
trait is made public (removing its #[doc(hidden)] attribute), exposing it under core::ops
. It gains a Target
associated type.
This trait marks types that can be used as receivers other than the Self
type of an impl or trait definition.
pub trait Receiver {
type Target: ?Sized;
}
A blanket implementation is provided for any type that implements Deref
:
impl<P: ?Sized> Receiver for P
where
P: Deref,
{
type Target = <P as Deref>::Target;
}
(See alternatives for discussion of the tradeoffs here.)
It is also implemented for &T
and &mut T
.
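To see how the trait and its blanket implementation fit together, here is a sketch that runs on stable Rust by declaring a local stand-in trait. This is an assumption for illustration: the local `Receiver` below is not `core::ops::Receiver` and has no effect on method resolution. `CppRef` and `has_target` are likewise hypothetical. The sketch only checks, at the type level, which `Target` each pointer type ends up with.

```rust
use std::ops::Deref;

/// Local stand-in for the proposed trait; NOT `core::ops::Receiver`.
trait Receiver {
    type Target: ?Sized;
}

/// Mirrors the proposed blanket implementation for `Deref` types.
impl<P: ?Sized + Deref> Receiver for P {
    type Target = <P as Deref>::Target;
}

/// Hypothetical pointer type that deliberately does not implement `Deref`.
#[allow(dead_code)]
struct CppRef<T: ?Sized> {
    ptr: *const T,
}

/// Allowed alongside the blanket impl because `CppRef` is not `Deref`.
impl<T: ?Sized> Receiver for CppRef<T> {
    type Target = T;
}

/// Compiles only if the receiver target of `P` is exactly `T`.
fn has_target<P: ?Sized + Receiver<Target = T>, T: ?Sized>() -> bool {
    true
}

fn main() {
    assert!(has_target::<Box<str>, str>()); // via the blanket impl
    assert!(has_target::<&u8, u8>()); // references implement `Deref`
    assert!(has_target::<CppRef<u32>, u32>()); // via the explicit impl
}
```

One consequence visible in the sketch: a type that implements `Deref` cannot also provide its own `Receiver` implementation with a different `Target`, since the two impls would overlap.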
Compiler changes: method probing
The existing Rust reference section for method calls describes the algorithm for assembling method call candidates, and there's more detail in the rustc dev guide.
The key part of the first page is this:
The first step is to build a list of candidate receiver types. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate T, add &T and &mut T to the list immediately after T.
Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:
- T's inherent methods (methods implemented directly on T).
- Any of the methods provided by a visible trait implemented by T.
We'll call this second list the candidate methods.
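The ordering of the candidate receiver types has observable effects in stable Rust today. In this sketch (the `Config` type and `demo` function are illustrative), the candidate list for an `Rc<Config>` begins `Rc<Config>`, `&Rc<Config>`, `&mut Rc<Config>`, `Config`, `&Config`, and so on. `Clone::clone` for `Rc<Config>` takes `&Rc<Config>`, which appears before `&Config`, so a plain `.clone()` clones the handle rather than the data.

```rust
use std::rc::Rc;

#[derive(Clone)]
struct Config {
    name: String,
}

fn demo() -> (usize, String) {
    let shared = Rc::new(Config { name: String::from("base") });

    // Found at `&Rc<Config>`, before `Config` is ever considered.
    let handle: Rc<Config> = shared.clone();

    // Dereferencing first makes `Config` the start of the candidate list.
    let copy: Config = (*shared).clone();

    (Rc::strong_count(&handle), copy.name)
}

fn main() {
    let (handles, name) = demo();
    assert_eq!(handles, 2);
    assert_eq!(name, "base");
}
```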
With this RFC, the candidate receiver types are assembled the same way - nothing changes. But, the candidate methods are assembled in a different way. Specifically, instead of iterating the candidate receiver types, we assemble a new list of types by following the chain of Receiver
implementations. As Receiver
is implemented for all types that implement Deref
, this may be the same list or a longer list. Aside from following a different trait, the list is assembled the same way, including the insertion of equivalent reference types.
We then search each type for inherent methods or trait methods in the existing fashion - the only change is that we search a potentially longer list of types.
It's particularly important to emphasize also that the list of candidate receiver types does not change. But, a wider set of locations is searched for methods with those receiver types.
For instance, suppose SmartPtr<T>
implements Receiver
but not Deref
. Imagine you have let t: SmartPtr<SomeStruct> = /* obtain */; t.some_method();
. We will now search impl SomeStruct {}
blocks for an implementation of fn some_method(self: SmartPtr<SomeStruct>)
, fn some_method(self: &SmartPtr<SomeStruct>)
, etc. The possible self types in the method call expression are unchanged - they're still obtained by searching the Deref
chain for t
- but we'll look in more places for methods with those valid self
types.
Compiler changes: deshadowing
The major functional change to the compiler is described above, but a couple of extra adjustments are necessary to avoid future compatibility breaks by method shadowing.
Specifically, that page also states:
If this results in multiple possible candidates, then it is an error, and the receiver must be converted to an appropriate receiver type to make the method call.
With arbitrary self types v2, the compiler will actively search for additional conflicts in order to produce this error in more cases. Specifically, it will consider whether autoreffed candidates conflict with by-value candidates, in order to produce an error in situations like this:
struct Foo;
struct SmartPtr<T>(T); // implements Receiver
impl<T> SmartPtr<T> {
fn a(&self) {} // by reference
}
impl Foo {
fn a(self: SmartPtr<Self>) {} // by value
}
fn main() {
let a = SmartPtr(Foo);
a.a(); // produces an error
}
To be precise, the compiler will:
- Search for the best by-value pick
- Search for the best autoreffed pick
- Search for the best autorefmut pick
- For each pair from the above list, consider the first to be the 'shadowing' pick and the second to be the 'shadowed' pick. Show an error if:
  - The same number of autoderefs has been applied (confirming the self type is identical, aside from any autoreffing)
  - One is further along the chain of Receiver than another (confirms that it's arbitrary self types causing the conflict)
  - The shadowing pick is an inherent impl (we are concerned about the case that a smart pointer is adding inherent methods shadowing inner types, not cases where traits bring further methods into play)
  - The picks don't refer to the same resulting item (which could happen with things like blanket impls for any type)
- Otherwise, choose the pick in order of by-value, autoreffed, autorefmut, or const ptr as it does now.
Aside from production of errors in more cases, there is no change to method picking here. That said, the production of errors requires us to interrogate more candidates to look for potential conflicts, so this could have a compile-time performance penalty which we should measure.
(The current reference doesn't describe it, but the current algorithm also searches for method receivers of type *const Self
and handles them explicitly in case the receiver type was *mut Self
. We do not check for cases where a new self: *mut Self
method on an outer type might shadow an existing self: *const SomePtr<Self>
method on an inner type. Although this is a theoretical risk, such compatibility breaks should be easy to avoid because self: *mut Self methods are rare. It's not readily possible to produce errors in these cases, because we already intentionally shadow *const::cast
with *mut::cast
.)
Object safety
Receivers are object safe if they implement the (unstable) core::ops::DispatchFromDyn
trait.
As not all receivers want to permit object safety, or are able to support it, object safety should remain encoded in a trait separate from the Receiver trait proposed here, likely DispatchFromDyn.
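For the standard library pointers, dispatch through a trait object with a smart pointer receiver already works on stable Rust, because those types implement DispatchFromDyn internally. A minimal sketch, with the trait `Job` and the type `Print` invented for illustration:

```rust
use std::rc::Rc;

/// Hypothetical trait whose methods take smart pointer receivers.
trait Job {
    fn run(self: Box<Self>) -> String;
    fn describe(self: Rc<Self>) -> String;
}

struct Print(String);

impl Job for Print {
    fn run(self: Box<Self>) -> String {
        format!("ran {}", self.0)
    }
    fn describe(self: Rc<Self>) -> String {
        format!("job {}", self.0)
    }
}

fn main() {
    // Both calls dispatch dynamically through the vtable of `dyn Job`.
    let boxed: Box<dyn Job> = Box::new(Print(String::from("a")));
    assert_eq!(boxed.run(), "ran a");

    let shared: Rc<dyn Job> = Rc::new(Print(String::from("b")));
    assert_eq!(shared.describe(), "job b");
}
```

A custom receiver type would need its own DispatchFromDyn implementation (or an equivalent mechanism) before it could appear in this position on a trait object.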
This RFC does not propose any changes to DispatchFromDyn
. Since DispatchFromDyn
is unstable at the moment, object-safe receivers might be delayed until DispatchFromDyn
is stabilized. Receiver
is not blocked on further DispatchFromDyn
work, since non-object-safe receivers already cover a big chunk of the use-cases.
It's been proposed that, instead of DispatchFromDyn
, a #[derive(SmartPointer)]
mechanism may be stabilized instead. Again, this doesn't block our work on Receiver
. There are some use cases for Receiver
that won't suit either DispatchFromDyn
or #[derive(SmartPointer)]
, most notably the Rust for Linux Wrapper
type described here.
Lifetime elision
Arbitrary self
parameters may involve lifetimes.
Even in existing stable Rust, there are bugs in lifetime elision for complex Self
types such as &Box<Self>
. We're aiming to fix them whether or not this RFC is accepted. The net rules will be:
- If a parameter is the first parameter, and
- Called self, and
- Its type involves Self anywhere, and
- Its type contains exactly one lifetime anywhere
then that lifetime may be used to elide lifetimes on return types, and will take precedence over any lifetimes in other parameters.
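One case that, to our understanding, already follows these rules on stable Rust is Pin<&Self>. In the sketch below (`Counter` and `value_ref` are illustrative names), the elided return lifetime comes from `self` even though another reference parameter is present, so the result may outlive the unrelated argument.

```rust
use std::pin::Pin;

struct Counter {
    value: u32,
}

impl Counter {
    /// The elided lifetime on `&u32` is the one inside `Pin<&Self>`,
    /// not the lifetime of `_other`.
    fn value_ref(self: Pin<&Self>, _other: &u32) -> &u32 {
        // `get_ref` returns a reference with the pin's own lifetime.
        &self.get_ref().value
    }
}

fn main() {
    let counter = Counter { value: 7 };
    let pinned = Pin::new(&counter);
    let result;
    {
        let unrelated = 1;
        // `unrelated` goes out of scope before `result` is read.
        result = pinned.value_ref(&unrelated);
    }
    assert_eq!(*result, 7);
}
```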
If this seems wrong, please discuss this over on the linked bug rather than here in this RFC, because none of that should change with this RFC (though it does make it more likely users will run into the current inconsistencies). We'll try to keep this RFC up to date with the outcome of those discussions.
Diagnostics
The existing branches in the compiler for "arbitrary self types" already emit excellent diagnostics. We will largely re-use them, with the following improvements:
- In the case where a self type is invalid because it doesn't implement Receiver, the existing excellent error message will be updated.
- An easy mistake is to implement Receiver for P<T>, forgetting to specify T: ?Sized. P<Self> then only works as a self parameter in traits where Self: Sized, an unusual stipulation. It's not obvious that Sizedness is the problem here, so we will identify this case specifically and produce an error giving that hint.
- There are certain types which feel like they "should" implement Receiver but do not: Weak and NonNull. If these are encountered as a self type, we should produce a specific diagnostic explaining that they do not implement Receiver and suggesting that they could be wrapped in a newtype wrapper if method calls are important. We hope this can be achieved with diagnostic items.
- The current unstable arbitrary self types feature allows generic receivers. For instance,
  impl Foo { fn a<R: Deref<Target=Self>>(self: R) { } }
  We don't know a use-case for this. There are several cases where this can result in misleading diagnostics (for instance, if such a method is called with an incorrect type, for example smart_ptr.a::<&Foo>() instead of smart_ptr.a::<Foo>()). We could attempt to find and fix all those cases. However, we feel that generic receiver types might risk subtle interactions with method resolutions and other parts of the language. We think it is a safer choice to generate an error on any declaration of a generic self type.
- As noted in the section about compiler changes for deshadowing, we will produce a "multiple method candidates" error if a method in an inner type is chosen in preference to a method in an outer type ("inner" = further along the Receiver chain) and the inner type is either self: &T or self: &mut T and we're choosing it in preference to self: T or self: &T in the outer type.
Drawbacks
Why should we not do this?
- Deref coercions can already be confusing and unexpected. Adding a new Receiver trait could cause similar confusion.
- Custom smart pointers are a niche use case (but they're very important for cross-language interoperability).
Method shadowing
For a smart pointer P<T>
that implements Deref<Target = T>
, a method call p.m()
might call a method P::m
on the smart pointer type itself, or it might call T::m
. If both methods are declared, this results in an error.
Rust standard library smart pointers are designed with this shadowing behavior in mind:
- Box, Pin, Rc and Arc heavily use associated functions rather than methods.
- Where they use methods, it's often with the intention of shadowing a method in the inner type (e.g. Arc::clone).
Furthermore, the Deref
trait itself documents this possible compatibility hazard, and the Rust API Guidelines has a guideline about avoiding inherent methods on smart pointers.
This RFC does not make things worse for types that implement Deref
.
However, this RFC allows types to implement Receiver
. This would run the risk of breakage:
struct Concrete;
impl Concrete {
fn wardrobe(self: SmartPointerWhichImplementsReceiver<Self>) { }
}
fn main() {
let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
concrete.wardrobe()
}
If SmartPointerWhichImplementsReceiver
now adds SmartPointerWhichImplementsReceiver::wardrobe(self)
, the above valid code would start to error.
The same would apply in this slightly different circumstance:
struct Concrete;
impl Concrete {
fn wardrobe(self: &SmartPointerWhichImplementsReceiver<Self>) { } // this is now a reference
}
fn main() {
let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
concrete.wardrobe()
}
If Rust added SmartPointerWhichImplementsReceiver::wardrobe(&self)
we would start to produce an error here. If SmartPointerWhichImplementsReceiver
added SmartPointerWhichImplementsReceiver::wardrobe(self)
then it would be
even worse - code would start to call SmartPointerWhichImplementsReceiver::wardrobe
where it had previously called Concrete::wardrobe
.
The deshadowing section of the compiler changes describes how we avoid this. The compiler will take pains to identify any such ambiguities and it will show an error.
We have (extensively) considered algorithms to pick the intended method instead - see picking the shadowed method, below.
Rationale and alternatives
As this feature has been cooking since 2017, many alternative implementations have been discussed.
Deref-based
As noted in the rationale section, the current nightly implementation implements arbitrary self types using the Deref
trait.
No blanket implementation for Deref
Another major approach previously discussed is to have a Receiver
trait, as proposed in this RFC, but without a blanket implementation for T: Deref
. Blanket implementations are unusual for core Rust traits, but the authors of this RFC believe it's necessary in this case.
Specifically, this RFC proposes that the existing method search algorithm is modified to search the Receiver
chain instead of the Deref
chain.
It's therefore a major compatibility break if existing Deref
implementors cease to be usable as self
parameters. Just in the standard library, we'd have to add Receiver
implementations for Cow
, Ref
, ManuallyDrop
and possibly many other existing implementors of Deref
; third-party libraries would have to do the same. Without that, method calls on these types would not be possible:
fn main() {
let ref_cell = RefCell::new(/* something cloneable */);
ref_cell.borrow().clone(); // no longer possible if:
// 1) we cease to explore Deref in identifying method candidates
// 2) Ref doesn't implement Receiver.
}
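A runnable version of this call on stable Rust, as a sketch with `String` standing in for "something cloneable" and `clone_through_ref` as an illustrative helper name:

```rust
use std::cell::RefCell;

/// Clones the value inside the cell via a temporary `Ref` guard.
fn clone_through_ref(cell: &RefCell<String>) -> String {
    // `Ref<String>` has no `clone` method of its own (`Ref::clone` is an
    // associated function), so probing follows `Deref` to `String::clone`.
    cell.borrow().clone()
}

fn main() {
    let ref_cell = RefCell::new(String::from("hello"));
    let cloned: String = clone_through_ref(&ref_cell);
    assert_eq!(cloned, "hello");
}
```

The type annotation on `cloned` confirms that the method found is the one on the inner `String`, which is exactly the lookup that would stop working without the blanket implementation.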
This doesn't just break people previously using the unstable Rust arbitrary_self_types
feature; it breaks stable Rust usages as well. Obviously this is not acceptable, so we believe the blanket implementation is necessary.
In any case, we think a blanket implementation is desirable:
- It prevents Deref and Receiver having different Targets. That could possibly lead to confusion if it prompted the compiler to explore different chains for these two different purposes.
- If smart pointer type P<T> is in a crate, users of P to create P<MyConcreteType> will be able to use it as a self type for MyConcreteType without waiting for a new release of the P crate.
We found that some crates use Deref
to express an is-a not a has-a relationship and so, ideally, might have preferred the option of setting up Deref
and self
candidacy separately. But, on discussion, we concluded that traits would be a better way to model those relationships.
Explore both Receiver
and Deref
chains while identifying method candidates
We could modify the method search algorithm to explore both Deref
and Receiver
targets when identifying method candidates. This would avoid breaking compatibility, yet would give the desired flexibility for folks who wish to implement Receiver
but not Deref
.
We don't think this is such a good option because:
- It's more confusing for users;
- It could lead to a worst-case O(n^2) number of method candidates to explore (though possibly this could be limited to O(2n) if we added restrictions);
- It's a more invasive change to the compiler;
- We don't know of any use-cases which the Receiver<Target=T> and blanket implementation for Deref do not allow.
If some use-case presents itself where a type must implement Deref
but not Receiver
; or a use-case presents itself where Deref
and Receiver
must have different Target
s then we will have to consider this more complex option.
Generic parameter
Change the trait definition to have a generic parameter instead of an associated type. There might be permutations here which could allow a single smart pointer type to dispatch method calls to multiple possible receivers - but this would add complexity, no known use case exists, and it might cause worst-case O(n^2) performance on method lookup.
Enable for raw pointers (or Weak
or NonNull
)
This RFC, unlike the original Arbitrary Self Types nightly feature, does not allow raw pointer self
types. We are led to believe that raw pointer receivers are quite important for the future of safe Rust, because stacked borrows makes it illegal to materialize references in many positions, and there are a lot of operations (like going from a raw pointer to a raw pointer to a field) where users don't need to or want to do that.
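As an illustration of the kind of operation meant here, stable Rust can already project a raw pointer to a struct into a raw pointer to one of its fields without materializing a reference, though only through a macro rather than a method call. The `Pair` type and `second_ptr` function are hypothetical names for this sketch.

```rust
use std::ptr::addr_of;

struct Pair {
    first: u32,
    second: u32,
}

/// Projects a raw pointer to the struct into a raw pointer to one field
/// without creating an intermediate `&Pair`.
///
/// # Safety
/// `pair` must point to a live, properly aligned `Pair`.
unsafe fn second_ptr(pair: *const Pair) -> *const u32 {
    unsafe { addr_of!((*pair).second) }
}

fn main() {
    let pair = Pair { first: 1, second: 2 };
    // SAFETY: `pair` is a live local value, so the pointer is valid.
    let field = unsafe { second_ptr(&pair) };
    assert_eq!(unsafe { *field }, 2);
    assert_eq!(pair.first, 1);
}
```

With raw pointer receivers, helpers like this could be written as methods taking `self: *const Self`, which is part of why the capability is considered desirable in future.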
On the other hand, we don't want to encourage the use of raw pointers, and would prefer rather that raw pointers are wrapped in a custom smart pointer that encodes and documents the invariants.
The main problem, though, is that raw pointers have methods and Rust wants to add more methods to them in future - especially around pointer provenance. As noted in the deshadowing section, we would start to generate errors in arbitrary crates if ever we added such additional methods to raw pointers. That's clearly not OK. So, to add support for raw pointers as self types, we'd need to use a cleverer deshadowing algorithm. This is discussed in the next section, but overall has been judged to be too complicated for now.
Instead, this version of Arbitrary Self Types is as conservative as possible, such that we ought to be able to adopt such an algorithm in a future enhancement.
Pick shadowed methods instead of erroring
As explained in the deshadowing section, the Rust compiler will generate errors in case of a conflict between a method on a smart pointer and an inner type. For example:
struct Foo;
struct SmartPtr<T>(T); // implements Receiver
impl<T> SmartPtr<T> {
fn a(self) {}
}
impl Foo {
fn a(self: SmartPtr<Self>) {}
}
fn main() {
let a = SmartPtr(Foo);
a.a(); // produces an error
}
There has been extensive discussion (and prototyping) about cleverer "deshadowing" algorithms here. The current leading contender is to:
- If there are conflicts,
  - Always pick the "inner" method;
  - Show a warning, and ask the user to disambiguate using UFC syntax (or future alternatives).
The rationale is that the author of the "inner" method is always aware of pre-existing methods on the "outer" (smart pointer) type. If a conflict arises, this means that the new method was added to the outer type, and therefore Rust can maintain existing behavior by picking the method on the inner type. (This logic falls down in the case of race conditions as crates are published, but it's broadly true.) This logic is believed to be sound, but it's counterintuitive: in all other circumstances Rust method probing works outside-in. This algorithm is also quite complex, and there's a risk of unknown unknowns.
There has also been some discussion about broader changes to method resolution in future, for example a crate-by-crate approach or even a name-resolution.lock
file.
The decision has been taken, then, to restrict the current RFC to the most conservative possible version - one which errors on any conflicts, and firmly advises the creators of smart pointers to avoid adding new methods. This gives us maximum flexibility in future to allow more possibilities by relaxing some of those errors to warnings. This is a high priority primarily because of the desire to allow method calls on raw pointers (see the previous section).
Not do it
As always there is the option to not do this. But this feature already kind of half-exists (we are talking about Box
, Pin
etc.) and it makes a lot of sense to also take the last step and therefore enable non-libstd types to be used as self types.
There is the option of using traits to fill a similar role, e.g.
trait ForeignLanguageRef {
    type Pointee;
    fn read(&self) -> *const Self::Pointee;
    fn write(&mut self, value: *const Self::Pointee);
}
// --------------------------------------------------------
struct ConcreteForeignLanguageRef<T>(T);
impl<T> ForeignLanguageRef for ConcreteForeignLanguageRef<T> {
    type Pointee = T;
    fn read(&self) -> *const Self::Pointee {
        todo!()
    }
    fn write(&mut self, _value: *const Self::Pointee) {
        todo!()
    }
}
// --------------------------------------------------------
struct SomeForeignLanguageType;
impl ConcreteForeignLanguageRef<SomeForeignLanguageType> {
    fn m(&self) {
        todo!()
    }
}
trait Tr {
    type RustType;
    fn tm(self)
    where
        Self: ForeignLanguageRef<Pointee = Self::RustType>;
}
impl Tr for ConcreteForeignLanguageRef<SomeForeignLanguageType> {
    type RustType = SomeForeignLanguageType;
    fn tm(self) {}
}
fn main() {
    let a = ConcreteForeignLanguageRef(SomeForeignLanguageType);
    a.m();
    a.tm();
}
This successfully allows method calls to m() and even tm() without a reference to a SomeForeignLanguageType ever existing. However, due to the orphan rule, this forces every crate to have its own equivalent of ConcreteForeignLanguageRef. This workaround has been used by some interop tools, but use across multiple crates requires many generic parameters (impl ForeignLanguageRef<Pointee=SomeForeignLanguageType>).
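The generic-parameter burden can be sketched as follows. This is an illustration under stated assumptions: CrateARef and CrateBRef stand in for the per-crate wrappers that the orphan rule forces, the trait is trimmed to a single method, and same_referent and demo are hypothetical helpers that are not part of the RFC.

```rust
// Same shape as the workaround trait, trimmed to one method for brevity.
trait ForeignLanguageRef {
    type Pointee;
    fn read(&self) -> *const Self::Pointee;
}

struct SomeForeignLanguageType;

// Hypothetical wrapper that "crate A" has to define for itself.
struct CrateARef<T>(*const T);
impl<T> ForeignLanguageRef for CrateARef<T> {
    type Pointee = T;
    fn read(&self) -> *const T {
        self.0
    }
}

// Hypothetical wrapper that "crate B" has to define for itself.
struct CrateBRef<T>(*const T);
impl<T> ForeignLanguageRef for CrateBRef<T> {
    type Pointee = T;
    fn read(&self) -> *const T {
        self.0
    }
}

// Code that wants to accept either wrapper needs one generic parameter
// (or one `impl Trait` argument) per foreign reference it takes.
fn same_referent<A, B>(a: &A, b: &B) -> bool
where
    A: ForeignLanguageRef<Pointee = SomeForeignLanguageType>,
    B: ForeignLanguageRef<Pointee = SomeForeignLanguageType>,
{
    // Only the addresses are compared; nothing is dereferenced.
    std::ptr::eq(a.read(), b.read())
}

fn demo() -> bool {
    let target = SomeForeignLanguageType;
    let a = CrateARef(&target as *const SomeForeignLanguageType);
    let b = CrateBRef(&target as *const SomeForeignLanguageType);
    same_referent(&a, &b)
}
```

Every additional foreign reference in a signature adds another bound of this form, which is why the workaround scales poorly across crates.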
Always use unsafe when interacting with other languages
One main motivation here is cross-language interoperability. As noted in the rationale, C++ references can't be safely represented by Rust references. Many would say that all C++ interop is intrinsically unsafe and that unsafe blocks are required. Maybe true: but that just moves the problem - an unsafe block requires a human to assert preconditions are met, e.g. that there are no other C++ pointers to the same data. But those preconditions are almost never true, because other languages don't have those rules. This means that a C++ reference can never be a Rust reference, because neither human nor computer can promise things that aren't true.
Only in the very simplest interop scenarios can we claim that a human could audit all the C++ code to eliminate the risk of other pointers existing. In complex projects, that's not possible.
However, a C++ reference can be passed through Rust safely as an opaque token such that method calls can be performed on it. Those method calls actually happen back in the C++ domain where aliasing and concurrent modification are permitted.
For instance,
struct ForeignLanguageRef<T>;
fn main() {
    let some_foreign_language_reference: ForeignLanguageRef<_> = CallSomeForeignLanguageFunctionToGetAReference();
    // There may be other foreign language references to the referent, with concurrent
    // modification, so some_foreign_language_reference can't be a &T
    // But we still want to be able to do this
    some_foreign_language_reference.SomeForeignLanguageMethod(); // executes in the foreign language. Data is not
                                                                 // dereferenced at all in Rust.
}
Even if the reader takes the view that all calls into foreign languages are intrinsically unsafe and must be marked as such, hopefully the reader would support building abstractions using the Rust type system to minimize the practical risk of undefined behavior. That's what this RFC aims to enable.
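The opaque-token idea can be approximated on stable Rust today when the methods live on the handle type itself. The following is a minimal sketch, not the design this RFC proposes: the fake_foreign module is a stand-in written in Rust for code that would really live in another language, and every name in it (Counter, counter_new, counter_increment, counter_free, new_counter, increment, demo) is an assumption made for the example. Lifetime management is deliberately out of scope; the sketch assumes the foreign object outlives every handle.

```rust
use std::ptr::NonNull;

// Stand-in for the foreign language side (assumption: in real interop these
// would be extern declarations generated by a binding tool).
mod fake_foreign {
    pub struct Counter {
        count: u32, // private: the Rust side never looks inside
    }

    pub fn counter_new() -> *mut Counter {
        Box::into_raw(Box::new(Counter { count: 0 }))
    }

    pub unsafe fn counter_increment(this: *mut Counter) -> u32 {
        unsafe {
            (*this).count += 1;
            (*this).count
        }
    }

    pub unsafe fn counter_free(this: *mut Counter) {
        unsafe { drop(Box::from_raw(this)) }
    }
}

// Opaque token: it carries an address but is never turned into &T or &mut T.
struct ForeignLanguageRef<T>(NonNull<T>);

// Manual impls so that the handle is Copy even when T is not.
impl<T> Clone for ForeignLanguageRef<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for ForeignLanguageRef<T> {}

impl ForeignLanguageRef<fake_foreign::Counter> {
    fn new_counter() -> Self {
        let raw = fake_foreign::counter_new();
        ForeignLanguageRef(NonNull::new(raw).expect("counter_new returned null"))
    }

    // The call executes on the "foreign" side; Rust only forwards the token.
    fn increment(self) -> u32 {
        // SAFETY (sketch-level): assumes the object has not been freed.
        unsafe { fake_foreign::counter_increment(self.0.as_ptr()) }
    }
}

fn demo() -> (u32, u32) {
    let first = ForeignLanguageRef::new_counter();
    let second = first; // aliasing handles are fine: neither is a reference
    let a = first.increment();
    let b = second.increment();
    // SAFETY: no handle is used after this point.
    unsafe { fake_foreign::counter_free(first.0.as_ptr()) };
    (a, b)
}
```

What this sketch cannot express on stable Rust is a method declared on the pointee type that takes the handle as its self parameter, which is the capability the RFC adds.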
Prior art
A previous PR based on the Deref alternative was proposed (https://github.com/rust-lang/rfcs/pull/2362) and was postponed with the expectation that the lang team would get back to arbitrary_self_types eventually.
Future work
As discussed above we anticipate a future version which will relax some errors into warnings, and thus allow us to add support for raw pointers, Weak and NonNull as self types.
Thereafter, we could consider implementing Receiver for other types, e.g. std::cell types, std::sync types, std::cmp::Reverse, std::num::Wrapping, std::mem::MaybeUninit, std::task::Poll, and so on - possibly even for arrays, etc.
There seems to be no disadvantage to doing this - taking Cell as an example, it would only have any effect on the behavior of code if somebody implemented a method taking Cell<T> as a receiver. On the other hand, it's hard to imagine use-cases for some of these. For now, though, we should clearly restrict Receiver to those types for which there's a demonstrated need.
Feature gates
This RFC is in an unusual position regarding feature gates. There are two existing gates:
- arbitrary_self_types enables, roughly, the semantics we're proposing, albeit in a different way. It has been used by various projects.
- receiver_trait enables the specific trait we propose to use, albeit without the Target associated type. It has only been used within the Rust standard library, as far as we know.
Although we presumably have no obligation to maintain compatibility for users of the unstable arbitrary_self_types feature, we should consider the least disruptive way to introduce this feature.
The plan is:
- the receiver_trait gate continues to control the existing Receiver trait used solely within the standard library, which is renamed to LegacyReceiver or FixedReceiver or something (and will be removed assuming we stabilize this feature)
- arbitrary_self_types comes to control the new behavior, with a new Receiver trait containing a Target associated type. As noted, this does not include raw pointers, though we hope to find a way to stabilize this in a future RFC.
- Add a new arbitrary_self_types_pointers feature gate which retains support for raw pointers.
Summary
This RFC is an example of replacing special casing (a.k.a. compiler magic) with clear and transparent definitions. We believe this is a good thing and should be done whenever possible.