Address feedback and questions

As this RFC was reviewed in the GitHub thread, many alternatives were
proposed and questions raised.  As those of us who were a bit too
close to it were cursed with knowledge, the rationale for rejecting
these alternatives were not fully articulated and not all of these
questions were clearly answered.

Let's better document these alternatives, the rationale for rejecting
each, and the answers to various known questions.

(Thanks to GoldsteinE, madsmtm, kennytm, samih, and tmccombs for
raising these alternatives and questions.)
This commit is contained in:
Travis Cross 2024-05-06 23:02:08 +00:00
parent b423b2bfff
commit 09a088c113
1 changed files with 194 additions and 3 deletions

View File

@ -96,11 +96,198 @@ Extern statics can be either immutable or mutable just like statics outside of e
This change will induce some churn. Hopefully, allowing people to safely call some foreign functions will make up for that.
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
# Rationale
[rationale]: #rationale
Incorrect extern declarations can cause UB in current Rust, but we have no way to automatically check that all declarations are correct, nor is such a thing likely to be developed. Making the declarations `unsafe` so that programmers are aware of the dangers and can give extern blocks the attention they deserve is the minimum step.
# Alternatives
[alternatives]: #alternatives
## Don't prefix `extern` with `unsafe`
One could ask, why not allow each item within an `extern` block to be prefixed with either `safe` or `unsafe`, but do not prefix `extern` with `unsafe`? E.g.:
```rust
extern {
pub safe fn sqrt(x: f64) -> f64;
pub unsafe fn strlen(p: *const c_char) -> usize;
}
```
Here's the problem with this. The programmer is asserting that these signatures are correct, but this assertion cannot be checked by the compiler. The human must simply get these correct, and if that person doesn't, then calling either of these functions, even the one marked `safe`, may result in undefined behavior. In fact, as explained in the motivation, due to current LLVM behavior, simply *writing* this `extern` block with incorrect signatures can lead to undefined behavior in the resulting program, even if these functions are never called by Rust code.
In Rust, we use `unsafe { .. }` (and, as of [RFC 3325][], `unsafe(..)`) to indicate that what is enclosed must be proven correct by the programmer to avoid undefined behavior. This RFC extends this pattern to `extern` blocks.
[RFC 3325]: https://github.com/rust-lang/rfcs/pull/3325
## Don't prefix `extern` with `unsafe` and support `unsafe` items only
One could ask, why not support only `unsafe` items within `extern` blocks and then don't require those blocks to be marked `unsafe`? E.g.:
```rust
extern {
pub unsafe fn sqrt(x: f64) -> f64;
pub unsafe fn strlen(p: *const c_char) -> usize;
}
```
One could argue that, since an `unsafe { .. }` block must be used to call either of these functions, that this is OK.
There are three problems with this.
One, as mentioned above, simply *writing* this `extern` block, if the signatures are incorrect, can cause undefined behavior in the resulting program, even if these functions are never called by Rust code. Saying `unsafe` in the signature of each item only indicates that the caller must uphold certain unchecked invariants; it does not correctly capture the semantics of this kind of unsafety.
Two, we have to think about *who* is responsible for discharging the obligation of ensuring that these signatures are correct. Is it the responsibility of a *caller* to these functions to ensure the signatures are correct? That would seem unreasonable. So even though the caller has to write `unsafe { .. }` to call these functions, and even setting aside the issue that calling the functions from Rust is not required to produce undefined behavior, this suggests that the `extern` *itself* should be somehow marked with or wrapped in `unsafe`.
Three, not allowing items to be marked as `safe` would remove one of the key tangible *benefits* that the changes in this RFC provide to users. This would reduce the motivation to make this change at all.
## Prefix only `extern` with `safe` or `unsafe`
One could ask, who not prefix *only* `extern` with `safe` or `unsafe`? E.g.:
```rust
safe extern {
pub fn sqrt(x: f64) -> f64;
}
unsafe extern {
pub fn strlen(p: *const c_char) -> usize;
}
```
The problem with this, as explained in the last two sections, is that the person who writes the `extern` block must discharge an unchecked obligation of proving that the signatures are correct. This must be proven by the programmer even for the `sqrt` function. One purpose of this RFC is to flag this obligation with `unsafe`. This variation would fail to do that.
## Wrap `extern` in `unsafe { .. }`
Semantically, what we're trying to express is probably most precisely represented by syntax such as:
```rust
unsafe { extern {
pub safe fn sqrt(x: f64) -> f64;
pub unsafe fn strlen(p: *const c_char) -> usize;
}}
```
However, we currently don't support `unsafe { .. }` blocks at the item level, and the extra set of braces and indentation would seem unfortunate here. One way to think of `unsafe extern { .. }` is exactly as above, but with the braces elided.
## Don't add the `safe` contextual keyword, flip the default
One could ask, why include the `safe` contextual keyword at all? Why not just *assume* that within an `unsafe extern` block that items not marked as `unsafe` are in fact safe to call (as is true elsewhere in Rust)? E.g.:
```rust
unsafe extern {
pub fn sqrt(x: f64) -> f64; // Safe to call.
pub unsafe fn strlen(p: *const c_char) -> usize;
}
```
This was in fact the original proposal. The reason we did not end up adopting this was to reduce the churn that users would experience and to make the transition more incremental.
Consider that all `extern` blocks today look like this:
```rust
extern {
pub fn sqrt(x: f64) -> f64; // Unsafe to call.
pub fn strlen(p: *const c_char) -> usize; // Unsafe to call.
// Many more items follow...
}
```
We want users to be able to adopt the new syntax by changing just one line, e.g.:
```rust
unsafe extern { // <--- We added `unsafe` here.
pub fn sqrt(x: f64) -> f64; // Unsafe to call.
pub fn strlen(p: *const c_char) -> usize; // Unsafe to call.
// Many more items follow...
}
```
If we had made it so that writing `unsafe extern` flipped the default and made each item safe to call, then users would, upon making this change, have to simultaneously examine each item to determine whether it should be safe to call (or, at least, would have to conservatively add `unsafe` to each item). We wanted to avoid this.
Still, the user gets immediate *benefit* out of this change, because the user can now *incrementally* mark items as safe to call, e.g.:
```rust
unsafe extern {
pub safe fn sqrt(x: f64) -> f64; // <--- We added `safe` here.
pub fn strlen(p: *const c_char) -> usize; // Unsafe to call.
// Many more items follow...
}
```
We may or may not, in a later edition, decide to switch the default and thereby make the `safe` contextual keyword redundant. Either way, adding the `safe` keyword makes the migration more straightforward while delivering value to users and better indicating where users must make a correctness assertion to the compiler.
## Don't add the `safe` contextual keyword, keep the default
One could ask, why not allow but not require items within an `unsafe extern` block to be prefixed with `unsafe`, but not support prefixing items with `safe`, and treat items not prefixed as `unsafe`? E.g.:
```rust
unsafe extern {
pub fn sqrt(x: f64) -> f64; // Unsafe to call.
pub unsafe fn strlen(p: *const c_char) -> usize;
}
```
Doing this would eliminate one of the key tangible benefits of this RFC, which is allowing users to express that an item declared within an `unsafe extern` block is in fact sound to use directly in safe code.
While we could, in a later edition, perhaps flip the default to make items safe to call, we could only do that if enough code has already been migrated. But in the interim, we'd be asking users to accept the churn of migrating to this syntax without receiving any of the benefits. That seems a bit like a cyclic dependency, so we've chosen not to do that.
## Require all items to be marked as either `safe` or `unsafe`
One could ask, why not require all items within an `unsafe extern` block to be marked as either `safe` or `unsafe` rather than making this optional? Or alternatively, one could ask, why not *only* allow items to be marked as `unsafe` and *require* that all items be marked `unsafe`?
As described in the last section, doing this would lead to a worse migration story for users, and so we chose not to do this.
## Wait until we switch to `safe { .. }` blocks
One could ask, why not wait to do this at all until we switch the language to use `safe { .. }` rather than `unsafe { .. }` blocks and then align this RFC with that?
The problem with this is that there is no current plan to make such a switch. Waiting to improve the language on a possibility that may or may not happen -- and in any case, will not happen soon -- is usually not a good plan.
## Use `trusted` as the contextual keyword
One could ask, why not use `trusted` rather than `safe` as the contextual keyword? E.g.:
```rust
unsafe extern {
pub trusted fn sqrt(x: f64) -> f64; // Safe to call.
pub unsafe fn strlen(p: *const c_char) -> usize;
}
```
The Rust language already has an accepted semantic for "safe" and "unsafe". If we were to introduce a separated "trusted" concept, that would need to be part of a larger plan. Such a plan does not yet exist in any concrete form, and it's not clear at this point whether any plan along these lines will succeed in gaining consensus. Waiting to deliver value here on that possibility seems like a bad plan.
If we later decide, e.g., to replace all uses of `unsafe { .. }` with `trusted { .. }`, large amounts of code would need to be changed in that migration. Changing from `safe fn` to `trusted fn` as part of that, as this RFC would require, doesn't seem that it would make that migration markedly more painful.
## Fire the `unsafe_code` lint for `extern` blocks also
This RFC specifies that the `unsafe_code` lint will fire for `unsafe extern` but not for `extern` blocks. One could ask, why not fire this for `extern` blocks also?
The problem with doing this is that it may be very noisy. We're careful when expanding the meaning of existing lints to not create too much noise, and doing this, at least immediately, may run afoul of this.
# Questions and answers
[q-and-a]: #q-and-a
## Why do we want to mark `extern` blocks as `unsafe`?
In *safe* Rust, we want the compiler to *prove* that all code is *sound* and therefore cannot exhibit undefined behavior. However, for some things, the compiler cannot complete this proof without help from the programmer. When the programmer must make assertions that cannot be checked by the compiler to preserve soundness, we call this *unsafe* Rust. We use the `unsafe` keyword to designate places where the programmer has this proof obligation.
In the past, `extern` blocks have been an exception to this. Programmers are required to prove that these blocks are correct, and the compiler has no way of checking this, but we had yet not thought to write `unsafe` here. This RFC closes that gap.
## Is adding this feature going to break people's existing code on existing editions?
No. Rust has a stability guarantee that is outlined in [RFC 1122][]. Adding this feature does not break any existing code on existing editions when updating to newer versions of the Rust compiler.
[RFC 1122]: https://github.com/rust-lang/rfcs/pull/1122
## Will `extern` blocks not marked `unsafe extern` fire the `unsafe_code` lint?
No. This RFC specifies that `unsafe extern` blocks will fire this lint. There are no such blocks in the ecosystem today, so people who have `#![forbid(unsafe_code)]` will only newly encounter this lint when switching a block from `extern` to `unsafe extern`.
## Does this RFC require all items in an `unsafe extern` block to be marked `safe` or `unsafe`?
No. This RFC allows for items within an `unsafe extern` block to not be marked with either of `safe` or `unsafe`. Items that are not marked in either way are assumed to be `unsafe`.
# Prior art
[prior-art]: #prior-art
@ -114,4 +301,8 @@ None we are aware of.
# Future possibilities
[future-possibilities]: #future-possibilities
None are apparent at this time.
## Interaction with extern types
If we were to later accept [RFC 3396][] ("Extern types v2"), that would introduce `type` items into `extern` blocks, and the interaction between those items, this RFC, and the `unsafe_code` lint would need to be addressed.
[RFC 3396]: https://github.com/rust-lang/rfcs/pull/3396