- Feature Name: offset_of
- Start Date: 2022-08-29
- RFC PR: rust-lang/rfcs#3308
- Rust Issue: rust-lang/rust#106655
Summary
Introduce a new macro core::mem::offset_of!, which evaluates to a constant
containing the offset in bytes of a field inside some type.
Specifically, this RFC allows usage like the following:
use core::mem::offset_of;
const EXAMPLES: &[usize] = &[
offset_of!(Struct, b),
offset_of!(TupleStruct, 0),
offset_of!(Union, y),
offset_of!((i32, u32), 1),
offset_of!(inner::SubmodAndGeneric<i32>, pub_field),
];
struct Struct { a: u64, b: &'static str }
struct TupleStruct(u8, i32);
union Union { x: u8, y: u64 }
mod inner {
pub struct SubmodAndGeneric<T> {
private_field: T,
pub pub_field: u8,
}
}
Motivation
Type layout information is very frequently needed in low level code, especially if it's performing serialization, FFI, or implementing a data structure.
While the needed information is often limited to the size and required alignment of a given type, sometimes code needs information about the fields of a type, most commonly (and most fundamentally) the offset in bytes at which a field is found within the type that contains it.
Currently, Rust's standard library provides good explicit APIs for providing
information about the size and alignment of a given type (specifically,
core::mem has size_of, align_of, size_of_val, and align_of_val).
Unfortunately, it provides none for determining field-offset, leaving it to be
computed based on implicitly-provided layout information.
This is an unfortunate gap, one we've seen countless workarounds for, which have caused no end of trouble in the ecosystem. The problem is that while recovering layout information in this manner is completely possible in Rust (recovering the size and alignment would even be possible using the same technique), doing it correctly is very subtle. Most of the implementations which seem obvious are actually wrong, often because they invoke undefined behavior.
Unfortunately, this also means they often tend to work at first, but risk being something of a "ticking time-bomb" that may break in a future release of Rust or LLVM.
This is not a theoretical concern, and widespread breakage of incorrect
offset_of implementations has happened in the past (e.g. when mem::zeroed
started performing validity checks), and may happen again (e.g. the
deref_nullptr lint revealed large bodies of code with incorrect
implementations).
Unfortunately, there has not previously been a great alternative. Generally, users are advised to either:
- Use a crate. For example, memoffset and bytemuck both have offset_of! implementations.
- Hardcode the constant.
Both options have several downsides. Even if the operation could be flawlessly performed by library code, it's the opinion of the author of this RFC that this operation is fundamental enough that, at a minimum, the standard library should provide the implementation.
Guide-level explanation
In low level code, you may find you need to know the byte offset of a field
within a type. This can be accomplished with the core::mem::offset_of! macro.
core::mem::offset_of! takes two arguments: the type that holds the field, and
the name of the field. For example, if you have:
#[repr(C)]
struct Vertex {
tex: [u16; 2],
pos: [f32; 3],
}
Then you can use core::mem::offset_of!(Vertex, tex) to get the offset in bytes
where tex begins, and core::mem::offset_of!(Vertex, pos) to get the offset
in bytes where pos begins.
In this example, we also specified the layout algorithm to use, so we know that
offset_of!(Vertex, tex) will be 0, and offset_of!(Vertex, pos) will be 4.
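The arithmetic behind those two values can be checked directly. The following is a minimal sketch, assuming a toolchain where core::mem::offset_of! is available and a target where f32 has 4-byte alignment; the constant names TEX_OFFSET and POS_OFFSET are illustrative only.

```rust
use core::mem::{align_of, offset_of, size_of};

#[repr(C)]
#[allow(dead_code)]
struct Vertex {
    tex: [u16; 2],
    pos: [f32; 3],
}

// With #[repr(C)], fields are placed in declaration order, each at the next
// offset that satisfies its alignment.
const TEX_OFFSET: usize = offset_of!(Vertex, tex);
const POS_OFFSET: usize = offset_of!(Vertex, pos);

fn main() {
    // The first field of a repr(C) struct starts at offset 0.
    assert_eq!(TEX_OFFSET, 0);
    // `tex` occupies 4 bytes, and 4 is already a multiple of the alignment
    // of f32 (assumed to be 4 here), so no padding is inserted before `pos`.
    assert_eq!(size_of::<[u16; 2]>(), 4);
    assert_eq!(POS_OFFSET % align_of::<f32>(), 0);
    assert_eq!(POS_OFFSET, 4);
    println!("tex at {TEX_OFFSET}, pos at {POS_OFFSET}");
}
```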
However, if a #[repr(...)] is not used, the compiler is free to place the
fields of Vertex in whatever order it prefers (even if that differs from
the order the fields are written in the struct declaration), so there's no way
to know in advance what the positions of the fields will be.
Thankfully, offset_of! is still usable here:
// No `#[repr()]` needed!
struct Vertex {
tex: [u16; 2],
pos: [f32; 3],
}
// This time let's define some constants containing the offset value,
// which can be more readable if you need to use them several times.
const OFFSET_VERTEX_TEX: usize = core::mem::offset_of!(Vertex, tex);
const OFFSET_VERTEX_POS: usize = core::mem::offset_of!(Vertex, pos);
As you can see, the usage is the same as before, but because we didn't specify
#[repr(C)], the compiler may have changed the order or position of the fields,
so the values may be different. It's completely possible that pos is located at
offset 0, for example! Thankfully, by using core::mem::offset_of!, this code is
correct either way, and will continue to be correct even if the layout
algorithm changes in the future.
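One way to see why the code stays correct is to use the offsets for something that depends on the real layout. The sketch below builds a small layout table and reads a field back through its byte offset. AttributeDesc, VERTEX_LAYOUT, and read_pos_via_offset are hypothetical names invented for this illustration and do not correspond to any real graphics API.

```rust
use core::mem::offset_of;

// No #[repr(...)]: the compiler chooses the field order.
#[allow(dead_code)]
struct Vertex {
    tex: [u16; 2],
    pos: [f32; 3],
}

/// Hypothetical descriptor of one vertex attribute (illustration only).
struct AttributeDesc {
    name: &'static str,
    offset: usize,
}

// The table is built at compile time from whatever layout was chosen.
const VERTEX_LAYOUT: [AttributeDesc; 2] = [
    AttributeDesc { name: "tex", offset: offset_of!(Vertex, tex) },
    AttributeDesc { name: "pos", offset: offset_of!(Vertex, pos) },
];

/// Reads `pos` by stepping `offset` bytes from the start of `v`.
fn read_pos_via_offset(v: &Vertex) -> [f32; 3] {
    let base = (v as *const Vertex).cast::<u8>();
    // SAFETY: the offset comes from offset_of!(Vertex, pos), so the pointer
    // stays inside `v`, is aligned for [f32; 3], and points at an
    // initialized field of that type.
    unsafe { base.add(VERTEX_LAYOUT[1].offset).cast::<[f32; 3]>().read() }
}

fn main() {
    let v = Vertex { tex: [1, 2], pos: [0.5, 1.5, 2.5] };
    assert_eq!(read_pos_via_offset(&v), [0.5, 1.5, 2.5]);
    for attr in &VERTEX_LAYOUT {
        println!("{} starts at byte {}", attr.name, attr.offset);
    }
}
```

The printed offsets are deliberately not asserted, since they are unspecified for a type without a #[repr(...)] attribute.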
offset_of! On Other Types
If your type doesn't have named fields, offset_of! can still be used. For
tuples and tuple structs, the "name" of the field is the numeric index you use
to access it. For example:
// Works with a tuple struct
struct KeyVal(&'static str, Vec<u8>);
const OFFSET_KV_KEY: usize = core::mem::offset_of!(KeyVal, 0);
const OFFSET_KV_VAL: usize = core::mem::offset_of!(KeyVal, 1);
// Or with an anonymous tuple.
const OFFSET_ANON_KEY: usize = core::mem::offset_of!((&'static str, Vec<u8>), 0);
const OFFSET_ANON_VAL: usize = core::mem::offset_of!((&'static str, Vec<u8>), 1);
Finally, offset_of! can be used to compute the offset of fields in unions too.
While this may be surprising, the compiler is allowed to put padding in front of
fields in unions which are not #[repr(C)], which would lead to a non-zero
field offset.
use core::mem::offset_of;
union Buffer {
metadata: [u64; 3],
datadata: [u8; 1024 * 1024 * 32],
}
const METADATA_OFFSET: usize = offset_of!(Buffer, metadata);
Limitations
There are a few limitations worth mentioning. Some of these may be relaxed in the future, however.
- Perhaps unsurprisingly, it obeys privacy, so both the type and field you call offset_of! on must be visible to the code calling offset_of!.
- The type holding the field must be Sized, so trying to compute where the slice begins in something like offset_of!((i32, [u32]), 1) isn't supported.
- Compared to offsetof in C and C++, you can't access nested fields/arrays. That is, instead of offset_of!(Foo, quank.zoop.2.quank[4]), you'll have to compute the offsets of each step manually, and sum them.
- Finally, types other than tuples, structs, and unions are currently unsupported.
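The nested-field limitation can be worked around by summing one offset per step, as the list above describes. A minimal sketch follows, using hypothetical Outer and Inner types and assuming a target where u32 has 4-byte alignment:

```rust
use core::mem::offset_of;

#[repr(C)]
#[allow(dead_code)]
struct Inner {
    flag: u8,
    value: u32,
}

#[repr(C)]
#[allow(dead_code)]
struct Outer {
    header: u16,
    inner: Inner,
}

// Offset of `outer.inner.value`, computed one step at a time and summed.
const OUTER_INNER_VALUE: usize = offset_of!(Outer, inner) + offset_of!(Inner, value);

fn main() {
    // With repr(C) and a 4-byte-aligned u32: `inner` sits at byte 4 of Outer,
    // and `value` sits at byte 4 of Inner.
    assert_eq!(OUTER_INNER_VALUE, 8);

    // Cross-check against the addresses of a real instance.
    let o = Outer { header: 0, inner: Inner { flag: 0, value: 7 } };
    let base = &o as *const Outer as usize;
    let field = &o.inner.value as *const u32 as usize;
    assert_eq!(field - base, OUTER_INNER_VALUE);
}
```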
Reference-level explanation
offset_of is a new macro exported from core::mem which has a signature
similar to the following:
pub macro offset_of($Container:ty, $field:tt $(,)?) {
// ...implementation defined...
}
Invoking this macro expands to a constant expression of type usize, which
evaluates to the offset in bytes from the beginning of $Container where
$field is found.
$Container must be visible and must be or resolve to one of the following
types:
- A struct or union type with either named or anonymous/tuple-style fields.

  In this case, $field must share a name or tuple index with a field which:
  - Exists on $Container.
  - Is visible at the location where offset_of! is invoked (but there is no requirement that fields other than $field be visible there).
- An anonymous tuple type.

  In this case, $field must be a tuple index (that is, an integer literal) that exists on the tuple type in question.
Use on other types is an error, although this may be relaxed in some cases in the future (see the Future possibilities section).
As a note: the implementation is strongly encouraged to not have runtime
resource usage dependent on the values of $Container or $field. In
particular, the implementation should not allocate space for an instance of
$Container on the runtime stack.
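A short sketch of how these rules look from the caller's side: the result is usable in constant expressions, a trailing comma is accepted, and only the named field has to be visible. The shapes module and Circle type are hypothetical, and the trailing-comma form assumes the implementation matches the signature shown above.

```rust
use core::mem::{offset_of, size_of};

mod shapes {
    pub struct Circle {
        pub radius: f32,
        #[allow(dead_code)]
        id: u64, // private: cannot be named from outside `shapes`
    }
}

// Usable wherever a constant expression is required. The trailing comma
// matches the `$(,)?` in the signature sketched above.
const RADIUS_OFFSET: usize = offset_of!(shapes::Circle, radius,);

// A compile-time sanity check: the field must end inside the container.
const _: () = assert!(RADIUS_OFFSET + size_of::<f32>() <= size_of::<shapes::Circle>());

// By contrast, `offset_of!(shapes::Circle, id)` would be rejected here,
// because `id` is not visible at this location, even though `radius` is.

fn main() {
    println!("radius starts at byte {RADIUS_OFFSET}");
}
```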
Drawbacks
- This exposes layout information at compile time which is otherwise not exposed until runtime. This can cause compatibility hazards similar to mem::size_of or mem::align_of, but plausibly greater as it provides even more information.

  That said, this API allows querying information which (if needed at compile time) would otherwise be hard-coded, so in some cases it may reduce the risk of a compatibility hazard.
- Similarly, this reduces the amount of dynamism that a Rust implementation could use for repr(Rust) types.

  For example, it forbids a Rust implementation from varying field offsets of repr(Rust) types between executions of the same compiled program (for example, by way of interpretation or code modification), unless it also performs modifications to adjust the result of offset_of! (and recompute the values of derived constants, and regenerate relevant code, ...).
- This is a feature most code won't need to use, and it may be confusing to users unfamiliar with low level programming.
Rationale and alternatives
The general rationale is that it should remove the need to hardcode, hand-roll, or pull in a third-party crate in order to compute field offsets. This hopefully should remove as many barriers
That said, there are several alternatives to this, some of which were even considered:
- Do nothing, and tell users to use the memoffset crate, or to hard-code constant offsets.

  This was not chosen as this operation seems fundamental enough to be provided by the standard library, especially given how often it is incorrectly implemented in the wild.
- Add offset_of!, but disallow use on #[repr(Rust)] types.

  This would make core::mem::offset_of! have less functionality than the implementation from memoffset, or the implementation users could write if they computed it manually.

  Needing the offset of fields on #[repr(Rust)] types is not as common, but is still useful in code which needs to describe precise details of type layout to some other system, including GPU APIs which accept configurable vertex formats, or binary serialization formats that contain descriptions of the field offsets for the record types they contain.

  It is also useful for implementing field projection as a library feature, as in crates like field-offset.
- Require that all fields of $Container be visible at the invocation site, rather than just requiring that $field is.

  As above, this would make core::mem::offset_of! worse than the version users would have written themselves and/or an off-the-shelf implementation.
- Add offset_of!, but disallow use during constant evaluation.

  This would mean that users who need const access to offset_of! must continue to hardcode the field offsets as constants, which is undesirable, error-prone, and can cause compatibility hazards.
- Try to make addr_of!((*null::<$Container>()).$field) as usize work for this.

  Currently this is UB (due to dereferencing a null pointer) and does not support use in const (due to accessing the address of a raw pointer). Changing both of these would be challenging, but may be possible.

  This was not chosen because it seems difficult, would be harder to teach (or read) than core::mem::offset_of!, and is largely orthogonal to whether or not a dedicated field offset API is provided (in other words, fixing those issues seems unlikely to make offset_of! appear redundant).
- Hold off until this can be integrated into some larger language feature, such as C++-style pointer-to-field, Swift-style field paths, ...

  Aside from avoiding scope creep, this wasn't pursued as offset_of! does not prevent these in the future, and those features may not even cover this use case.
- Use offset_of!($Container::$field) as the syntax instead.

  This wasn't chosen because it doesn't really work with tuples, and seems like it may harm the quality of error messages (for example, if a user forgets ::$field, and does offset_of!(crate::path::to::SomeType)).

  Additionally, this does not generalize as well to some of the extensions in future work.
- Expose a high level type-safe API instead, where offset_of returns a type with phantom parameters for container and field (for example, see the field-offset crate, and the notes on it in the Prior Art section below).

  This is not pursued for a few reasons:
  - Field projection is just one of several use cases for getting the offset to a field, rather than the only one, or even the most common one. While the other uses could be supported by a function which returns the usize, it seems better to push this kind of thing into the ecosystem.
  - Adding this to the stdlib risks conflicting with or restricting our ability to add a lang feature for field projection and/or pointer-to-member functionality.

  None of those are deal-breakers, but it seems better to keep this simple and limited. Such a type-safe API can be implemented on top of an offset_of! which returns integers.
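To illustrate the last point, here is a rough sketch of how a typed wrapper could be layered on an integer-returning offset_of!. FieldOffset, new, project, and Player are hypothetical names chosen for this example; this is not the API of the field-offset crate, and a real library would likely construct the wrapper through a checked macro rather than an unsafe function.

```rust
use core::marker::PhantomData;
use core::mem::offset_of;

/// Hypothetical typed wrapper: the byte offset of a field of type `F`
/// inside a container of type `C`.
struct FieldOffset<C, F> {
    bytes: usize,
    _marker: PhantomData<fn(C) -> F>,
}

impl<C, F> FieldOffset<C, F> {
    /// # Safety
    /// `bytes` must be the offset of a field of type `F` within `C`.
    const unsafe fn new(bytes: usize) -> Self {
        FieldOffset { bytes, _marker: PhantomData }
    }

    /// Projects a reference to the container into a reference to the field.
    fn project<'a>(&self, container: &'a C) -> &'a F {
        let base = (container as *const C).cast::<u8>();
        // SAFETY: the contract of `new` guarantees that an `F` lives at
        // `bytes` inside every valid `C`.
        unsafe { &*base.add(self.bytes).cast::<F>() }
    }
}

#[allow(dead_code)]
struct Player {
    name: &'static str,
    score: u32,
}

// SAFETY: `score` is a field of type u32 on Player.
const SCORE: FieldOffset<Player, u32> =
    unsafe { FieldOffset::new(offset_of!(Player, score)) };

fn main() {
    let p = Player { name: "ferris", score: 42 };
    assert_eq!(*SCORE.project(&p), 42);
}
```

The phantom parameters carry no runtime cost; they only prevent the offset from being applied to the wrong container or read back as the wrong field type.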
Prior art
There is quite a bit of prior art here, which I've grouped into:
- Crates: Rust libraries that expose similar or equivalent functionality to this proposal.
- Languages: Other languages that provide access to this information either as a language builtin, or via a library.
Prior Art: Crates
Several crates in the ecosystem have offset_of! implementations.
memoffset and bytemuck are probably the two most popular, and provide this
functionality in different ways.
- The memoffset crate provides an offset_of! macro very similar to this proposal. It is a fairly straightforward implementation that avoids most pitfalls, although it does allocate an instance of the type on the stack, which can cause stack overflow during debug builds (the compiler removes this in release builds).

  On nightly, if the unstable_const cargo feature is enabled, memoffset::offset_of! may be used during constant evaluation.
- The bytemuck crate has an offset_of! implementation which differs from the one in memoffset in that it takes three arguments, where the first is an existing instance of the type (or, due to a quirk in how it is implemented, a reference to one).

  This is intended to allow an implementation that does not require unsafe (as it was added at a time when it was unclear how to provide a sound offset_of!).

  Somewhat interestingly, this first parameter may be used to avoid a large stack allocation by providing a reference to a const/static (for example, as bytemuck::offset_of!(&SOME_STATIC, SomeTy, field)).

  It does not support use during constant evaluation.
- The field-offset crate provides a higher level type-safe API for field offsets similar to the pointer-to-member functionality in C++. It uses memoffset to implement offset_of!.

  Calling field_offset::offset_of! returns a FieldOffset<Field, Container> structure, which transparently wraps usize while providing phantom annotations to ensure it is used with the correct container and field type. It uses this to provide some generic field projection functionality, mostly around Pin.
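For readers unfamiliar with the stack-allocation approach mentioned for memoffset above, the following is a simplified sketch of that general technique. It is not memoffset's actual source; Packet and len_offset_by_hand are hypothetical names, and the result is compared against core::mem::offset_of! only as a cross-check.

```rust
use core::mem::{offset_of, MaybeUninit};
use core::ptr::addr_of;

#[allow(dead_code)]
struct Packet {
    tag: u8,
    len: u32,
    body: [u8; 16],
}

/// Hand-rolled offset of `Packet::len` using uninitialized stack storage.
fn len_offset_by_hand() -> usize {
    // Reserves stack space for a Packet without initializing it. For a very
    // large type, this reservation is the stack-usage hazard noted above.
    let slot = MaybeUninit::<Packet>::uninit();
    let base = slot.as_ptr();
    // SAFETY: addr_of! computes the field address without creating a
    // reference and without reading the uninitialized memory, and `base`
    // points into a live allocation of the right size.
    let field = unsafe { addr_of!((*base).len) };
    field as usize - base as usize
}

fn main() {
    assert_eq!(len_offset_by_hand(), offset_of!(Packet, len));
}
```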
Prior Art: Languages
Many languages which support low level programming have some equivalent to this functionality.
- The C programming language supports this as an offsetof macro, for example: offsetof(struct some_struct, some_field) is morally equivalent to this proposal's offset_of!(SomeStruct, some_field). It produces an integer constant, so it can be used during C's equivalent of constant evaluation.

  Notably, C's offsetof is more powerful than the offset_of! proposed in this RFC, as it supports access to fields of nested types, and can even project through arrays. For example, offsetof(some_type, foo.bar[1].baz) is completely allowed.

  Extending core::mem::offset_of! to support some of these use-cases could be done in the future, as is discussed in the future possibilities section below.
- C++ has an offsetof macro which is essentially compatible with C's, although it is only "conditionally supported" to use it on types which are not "standard layout" (see the linked documentation for information on what the quoted text means).

  C++ also has support for getting a pointer to a field via its pointer-to-member feature. This feature is powerful, and while it replaces some uses of offsetof, it does not replace all of them.
- Zig supports this via the @offsetOf function, which takes a type and a []const u8 that contains the field name as a string. For example, @offsetOf(SomeType, "some_field") would be essentially equivalent to this proposal's core::mem::offset_of!(SomeType, some_field).

  Zig also supports the @bitOffsetOf function, as Zig allows structs to contain fields which are not byte-aligned (e.g. bitfields). The syntax and semantics are otherwise equivalent.

  These are all comptime functions, which means they may be used in situations which are morally equivalent to Rust's constant evaluation.
- The D language allows accessing the offset via a property of each field. For example, SomeType.some_field.offsetof is essentially equivalent to this proposal's core::mem::offset_of!(SomeType, some_field).
- Swift supports this via the MemoryLayout.offset(of:) function (note: the link contains a good overview of the design). For example, MemoryLayout<SomeType>.offset(of: \.some_field) would be the equivalent to core::mem::offset_of!(SomeType, some_field).

  The \.some_field syntax is a partial key path (a Swift language feature). This can grant access to fields of nested structs in a manner similar to C's offsetof, for example: MemoryLayout<SomeType>.offset(of: \.foo.bar.baz).
Unresolved questions
- Should any of the features listed as "Future possibilities" be supported initially?
Future possibilities
This proposal is intentionally minimal, so there are a number of future possibilities.
Nested Field Access
In C, expressions like offsetof(struct some_struct, foo.bar.baz[3].quux) are
allowed, where foo.bar.baz[3].quux denotes a path to a derived field. This can
be of somewhat arbitrary complexity, accessing fields of nested structs,
performing array indexing (often this is even used to access past the end of
the array), and so on. Similar functionality is offered by
MemoryLayout.offset in Swift, where more complex language features are used to
achieve it.
This was omitted from this proposal because it is not commonly used, and can generally be replaced (at the cost of convenience) by multiple invocations of the macro.
Additionally, in the future similar functionality could be added in a
backwards-compatible way, either by directly allowing usage like
offset_of!(SomeStruct, foo.bar.baz[3].quux), or by requiring each field be
comma-separated, as in offset_of!(SomeStruct, foo, bar, baz, [3], quux).
Note that while this example shows a combination that supports array indexing, it's unclear if this is actually desirable for Rust.
Enum support (offset_of!(SomeEnum::StructVariant, field_on_variant))
Eventually, it may be desirable to allow offset_of! to access the fields
inside the struct and tuple variants of certain enums (possibly limited to enums
with a primitive integer representation, such as #[repr(C)], #[repr(int)],
or #[repr(C, int)], where int is one of Rust's primitive integer types:
u8, isize, u128, etc).
For example, in the future something like the following could be allowed:
use core::mem::offset_of;
#[repr(i8)]
enum Event {
Key { pressed: bool, code: u32 },
Resize(u32, u32),
}
const EVENT_KEY_CODE: usize = offset_of!(Event::Key, code);
const EVENT_KEY_PRESSED: usize = offset_of!(Event::Key, pressed);
const EVENT_RESIZE_W: usize = offset_of!(Event::Resize, 0);
const EVENT_RESIZE_H: usize = offset_of!(Event::Resize, 1);
In this example, the name/path of the variant is used as the first argument.
While there are use-cases for this in low level FFI code (similar to the use
cases for #[repr(int)] and #[repr(C, int)] enums), this may need further
design work, and is left to the future.
A drawback is that it is unclear how to support these types in the "Nested Field Access" proposed above, so in the future should we decide to support one of these, a decision may need to be made about the other.
memoffset::span_of! Functionality
The memoffset crate has support for a span_of! macro (used like
memoffset::span_of!(SomeType, some_field)), which expands to a Range<usize>
indicating which bytes of SomeType are from the field some_field.
The use case for this is more limited than that of offset_of!, so it was
omitted from this proposal. That said, should this prove sufficiently useful, it
would be simple to add a similar macro to core::mem in the future.
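In the meantime, a span can be approximated from an offset and the field's size. The sketch below is not memoffset's implementation: span and Header are hypothetical names, the caller has to supply the field type by hand, and the asserted ranges assume repr(C) layout with the usual alignments for u16 and u32.

```rust
use core::mem::{offset_of, size_of};
use core::ops::Range;

#[repr(C)]
#[allow(dead_code)]
struct Header {
    magic: [u8; 4],
    version: u16,
    length: u32,
}

/// Hypothetical helper: the byte range covered by a field of type `F`
/// that starts at `offset`. The caller must supply the right `F`.
const fn span<F>(offset: usize) -> Range<usize> {
    offset..offset + size_of::<F>()
}

fn main() {
    // `version` follows the 4-byte `magic` array directly.
    let version_span = span::<u16>(offset_of!(Header, version));
    assert_eq!(version_span, 4..6);

    // `length` needs 4-byte alignment, so bytes 6..8 are padding.
    let length_span = span::<u32>(offset_of!(Header, length));
    assert_eq!(length_span, 8..12);
}
```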
Support for types with unsized fields
... via offset_of_val!
Currently, we don't support use with unsized types. That is, (A, B, ..., [T])
and/or (A, B, ..., dyn Foo), or their equivalent in structs.
The reason for this is that the offset of the unsized field is not always known,
such as in the case of the last field in (Foo, dyn SomeTrait), where the
offset depends on what the concrete type is. Notably, the compiler must read the
alignment out of the vtable when you access such a field.
This is equivalent to not being able to determine the size and/or alignment
of ?Sized types, where we solve it by making the user provide the instance
they're interested in, as in core::mem::{size_of_val, align_of_val}, so we
could provide an analogous core::mem::offset_of_val!($val, $Type, $field) to
support this case.
It would be reasonable to add this in the future, but is left out for now.
... by only forbidding the edge case
The only case where we currently do not know the offset of a field statically is when the user has requested the offset of the unsized field, and the unsized field is a trait object.
It's possible for us to provide the offset for:
- The fields before the unsized field, as in offset_of!((i32, dyn Send), 0).
- The unsized field itself, if it is a type whose offset is known without reading the metadata, such as [T], str, and types that end with them, as in offset_of!((i32, [u16]), 1), or offset_of!((u16, (i64, str)), 1).
Allowing these is somewhat inconsistent with core::mem::align_of, which could
provide the alignment in some cases such as slices, but instead you must use
core::mem::align_of_val for all ?Sized types (admittedly, allowing
align_of::<[T]>() is perhaps not very compelling, as it's always the same as
align_of::<T>()).
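The align_of comparison can be seen in a few lines. This sketch uses only existing core::mem functions; slice_align is a hypothetical helper added for the example.

```rust
use core::mem::{align_of, align_of_val};

/// Hypothetical helper: alignment of a slice, obtained from a value because
/// `align_of::<[T]>()` is rejected ([T] is not Sized).
fn slice_align<T>(slice: &[T]) -> usize {
    align_of_val(slice)
}

fn main() {
    let values: [u32; 3] = [1, 2, 3];
    // The alignment of a slice always matches that of its element type.
    assert_eq!(slice_align(&values[..]), align_of::<u32>());

    // str is laid out like a byte slice, so its alignment is 1.
    let text: &str = "abc";
    assert_eq!(align_of_val(text), 1);
}
```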
Either way, it's trivially backwards compatible for us to eventually start allowing these, and for the trailing slice/str case, it seems difficult to pin down the cases where it's allowed without risk of complicating potential future features (like custom DSTs, extern types, or whatever other new unsized types we might want to add).
As such, it's left for future work.
Fields in Traits
If support for fields in traits is ever added, then it would be an open question
how offset_of! behaves when applied to a generic value of a trait type which
has fields. Similarly, if an offset_of_val! is added, it would interact with
trait objects of traits that have fields.
In either case, this could be forbidden or allowed, but decisions along these lines are deferred for now, as fields in traits do not yet exist.