rfcs/text/1044-io-fs-2.1.md

559 lines
22 KiB
Markdown
Raw Normal View History

- Feature Name: `fs2`
- Start Date: 2015-04-04
2017-11-01 21:29:36 +08:00
- RFC PR: [rust-lang/rfcs#1044](https://github.com/rust-lang/rfcs/pull/1044)
- Rust Issue: [rust-lang/rust#24796](https://github.com/rust-lang/rust/issues/24796)
# Summary
Expand the scope of the `std::fs` module by enhancing existing functionality,
exposing lower-level representations, and adding a few new functions.
# Motivation
The current `std::fs` module serves many of the basic needs of interacting with
a filesystem, but is missing a lot of useful functionality. For example, none of
these operations are possible in stable Rust today:
* Inspecting a file's modification/access times
* Reading low-level information like that contained in `libc::stat`
* Inspecting the unix permission bits on a file
* Blanket setting the unix permission bits on a file
* Leveraging `DirEntry` for the extra metadata it might contain
2015-04-15 00:21:20 +08:00
* Reading the metadata of a symlink (not what it points at)
* Resolving all symlink in a path
There is some more functionality listed in the [RFC issue][issue], but this RFC
will not attempt to solve the entirety of that issue at this time. This RFC
strives to expose APIs for much of the functionality listed above that is on the
track to becoming `#[stable]` soon.
[issue]: https://github.com/rust-lang/rfcs/issues/939
## Non-goals of this RFC
There are a few areas of the `std::fs` API surface which are **not** considered
goals for this RFC. It will be left for future RFCs to add new APIs for these
areas:
* Enhancing `copy` to copy directories recursively or configuring how copying
happens.
* Enhancing or stabilizing `walk` and its functionality.
* Temporary files or directories
# Detailed design
2015-04-08 07:02:40 +08:00
First, a vision for how lowering APIs in general will be presented, and then a
number of specific APIs will each be proposed. Many of the proposed APIs are
independent from one another and this RFC may not be implemented all-in-one-go
but instead piecemeal over time, allowing the designs to evolve slightly in the
meantime.
## Lowering APIs
### The vision for the `os` module
One of the principles of [IO reform][io-reform-vision] was to:
> Provide hooks for integrating with low-level and/or platform-specific APIs.
The original RFC went into some amount of detail for how this would look, in
particular by use of the `os` module. Part of the goal of this RFC is to flesh
out that vision in more detail.
Ultimately, the organization of `os` is planned to look something like the
following:
```
os
unix applicable to all cfg(unix) platforms; high- and low-level APIs
io extensions to std::io
fs extensions to std::fs
net extensions to std::net
env extensions to std::env
process extensions to std::process
...
linux applicable to linux only
io, fs, net, env, process, ...
macos ...
windows ...
```
APIs whose behavior is platform-specific are provided only within the `std::os`
hierarchy, making it easy to audit for usage of such APIs. Organizing the
platform modules internally in the same way as `std` makes it easy to find
relevant extensions when working with `std`.
It is emphatically *not* the goal of the `std::os::*` modules to provide
bindings to *all* system APIs for each platform; this work is left to external
crates. The goals are rather to:
1. Facilitate interop between abstract types like `File` that `std` provides and
the underlying system. This is done via "lowering": extension traits like
[`AsRawFd`][AsRawFd] allow you to extract low-level, platform-specific
representations out of `std` types like `File` and `TcpStream`.
2. Provide high-level but platform-specific APIs that feel like those in the
rest of `std`. Just as with the rest of `std`, the goal here is not to
include all possible functionality, but rather the most commonly-used or
fundamental.
Lowering makes it possible for external crates to provide APIs that work
"seamlessly" with `std` abstractions. For example, a crate for Linux might
provide an `epoll` facility that can work directly with `std::fs::File` and
`std::net::TcpStream` values, completely hiding the internal use of file
descriptors. Eventually, such a crate could even be merged into `std::os::unix`,
with minimal disruption -- there is little distinction between `std` and other
crates in this regard.
Concretely, lowering has two ingredients:
1. Introducing one or more "raw" types that are generally direct aliases for C
2015-04-08 07:02:40 +08:00
types (more on this in the next section).
2. Providing an extension trait that makes it possible to extract a raw type
from a `std` type. In some cases, it's possible to go the other way around as
well. The conversion can be by reference or by value, where the latter is
used mainly to avoid the destructor associated with a `std` type (e.g. to
extract a file descriptor from a `File` and eliminate the `File` object,
without closing the file).
While we do not seek to exhaustively bind types or APIs from the underlying
system, it *is* a goal to provide lowering operations for every high-level type
to a system-level data type, whenever applicable. This RFC proposes several such
lowerings that are currently missing from `std::fs`.
[io-reform-vision]: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#vision-for-io
[AsRawFd]: http://static.rust-lang.org/doc/master/std/os/unix/io/trait.AsRawFd.html
2015-04-08 07:02:40 +08:00
#### `std::os::platform::raw`
Each of the primitives in the standard library will expose the ability to be
lowered into its component abstraction, facilitating the need to define these
abstractions and organize them in the platform-specific modules. This RFC
proposes the following guidelines for doing so:
* Each platform will have a `raw` module inside of `std::os` which houses all of
its platform specific definitions.
* Only type definitions will be contained in `raw` modules, no function
bindings, methods, or trait implementations.
* Cross-platform types (e.g. those shared on all `unix` platforms) will be
located in the respective cross-platform module. Types which only differ in
the width of an integer type are considered to be cross-platform.
* Platform-specific types will exist only in the `raw` module for that platform.
A platform-specific type may have different field names, components, or just
not exist on other platforms.
Differences in integer widths are not considered to be enough of a platform
difference to define in each separate platform's module, meaning that it will be
possible to write code that uses `os::unix` but doesn't compile on all Unix
platforms. It is believed that most consumers of these types will continue to
store the same type (e.g. not assume it's an `i32`) throughout the application
or immediately cast it to a known type.
To reiterate, it is not planned for each `raw` module to provide *exhaustive*
bindings to each platform. Only those abstractions which the standard library is
lowering into will be defined in each `raw` module.
### Lowering `Metadata` (all platforms)
Currently the `Metadata` structure exposes very few pieces of information about
a file. Some of this is because the information is not available across all
platforms, but some of it is also because the standard library does not have the
appropriate abstraction to return at this time (e.g. time stamps). The raw
contents of `Metadata` (a `stat` on Unix), however, should be accessible via
lowering no matter what.
The following trait hierarchy and new structures will be added to the standard
library.
```rust
mod os::windows::fs {
pub trait MetadataExt {
fn file_attributes(&self) -> u32; // `dwFileAttributes` field
fn creation_time(&self) -> u64; // `ftCreationTime` field
fn last_access_time(&self) -> u64; // `ftLastAccessTime` field
fn last_write_time(&self) -> u64; // `ftLastWriteTime` field
fn file_size(&self) -> u64; // `nFileSizeHigh`/`nFileSizeLow` fields
}
impl MetadataExt for fs::Metadata { ... }
}
mod os::unix::fs {
pub trait MetadataExt {
2015-04-08 07:02:40 +08:00
fn as_raw(&self) -> &Metadata;
}
impl MetadataExt for fs::Metadata { ... }
2015-04-08 07:02:40 +08:00
pub struct Metadata(raw::stat);
impl Metadata {
2015-04-11 01:14:41 +08:00
// Accessors for fields available in `raw::stat` for *all* unix platforms
2015-04-08 07:02:40 +08:00
fn dev(&self) -> raw::dev_t; // st_dev field
fn ino(&self) -> raw::ino_t; // st_ino field
fn mode(&self) -> raw::mode_t; // st_mode field
fn nlink(&self) -> raw::nlink_t; // st_nlink field
fn uid(&self) -> raw::uid_t; // st_uid field
fn gid(&self) -> raw::gid_t; // st_gid field
fn rdev(&self) -> raw::dev_t; // st_rdev field
fn size(&self) -> raw::off_t; // st_size field
fn blksize(&self) -> raw::blksize_t; // st_blksize field
fn blocks(&self) -> raw::blkcnt_t; // st_blocks field
fn atime(&self) -> (i64, i32); // st_atime field, (sec, nsec)
fn mtime(&self) -> (i64, i32); // st_mtime field, (sec, nsec)
fn ctime(&self) -> (i64, i32); // st_ctime field, (sec, nsec)
}
}
// st_flags, st_gen, st_lspare, st_birthtim, st_qspare
mod os::{linux, macos, freebsd, ...}::fs {
2015-04-08 07:02:40 +08:00
pub mod raw {
pub type dev_t = ...;
pub type ino_t = ...;
// ...
pub struct stat {
// ... same public fields as libc::stat
}
}
pub trait MetadataExt {
2015-04-08 07:02:40 +08:00
fn as_raw_stat(&self) -> &raw::stat;
}
impl MetadataExt for os::unix::fs::RawMetadata { ... }
impl MetadataExt for fs::Metadata { ... }
}
```
The goal of this hierarchy is to expose all of the information in the OS-level
metadata in as cross-platform of a method as possible while adhering to the
design principles of the standard library.
The interesting part about working in a "cross platform" manner here is that the
makeup of `libc::stat` on unix platforms can vary quite a bit between platforms.
For example some platforms have a `st_birthtim` field while others do not.
To enable as much ergonomic usage as possible, the `os::unix` module will expose
the *intersection* of metadata available in `libc::stat` across all unix
platforms. The information is still exposed in a raw fashion (in terms of the
values returned), but methods are required as the raw structure is not exposed.
The unix platforms then leverage the more fine-grained modules in `std::os`
(e.g. `linux` and `macos`) to return the raw `libc::stat` structure. This will
allow full access to the information in `libc::stat` in all platforms with clear
opt-in to when you're using platform-specific information.
One of the major goals of the `os::unix::fs` design is to enable as much
functionality as possible when programming against "unix in general" while still
allowing applications to choose to only program against macos, for example.
2020-05-16 13:08:09 +08:00
#### Fate of `Metadata::{accessed, modified}`
At this time there is no suitable type in the standard library to represent the
return type of these two functions. The type would either have to be some form
of time stamp or moment in time, both of which are difficult abstractions to add
lightly.
Consequently, both of these functions will be **deprecated** in favor of
requiring platform-specific code to access the modification/access time of
files. This information is all available via the `MetadataExt` traits listed
above.
Eventually, once a `std` type for cross-platform timestamps is available, these
methods will be re-instated as returning that type.
### Lowering and setting `Permissions` (Unix)
> **Note**: this section only describes behavior on unix.
Currently there is no stable method of inspecting the permission bits on a file,
and it is unclear whether the current unstable methods of doing so,
`PermissionsExt::mode`, should be stabilized. The main question around this
2020-05-16 13:08:09 +08:00
piece of functionality is whether to provide a higher level abstraction (e.g.
similar to the `bitflags` crate) for the permission bits on unix.
2015-04-08 07:02:40 +08:00
This RFC proposes considering the methods for stabilization as-is and not
pursuing a higher level abstraction of the unix permission bits. To facilitate
in their inspection and manipulation, however, the following constants will be
added:
```rust
mod os::unix::fs {
pub const USER_READ: raw::mode_t;
pub const USER_WRITE: raw::mode_t;
pub const USER_EXECUTE: raw::mode_t;
pub const USER_RWX: raw::mode_t;
pub const OTHER_READ: raw::mode_t;
pub const OTHER_WRITE: raw::mode_t;
pub const OTHER_EXECUTE: raw::mode_t;
pub const OTHER_RWX: raw::mode_t;
pub const GROUP_READ: raw::mode_t;
pub const GROUP_WRITE: raw::mode_t;
pub const GROUP_EXECUTE: raw::mode_t;
pub const GROUP_RWX: raw::mode_t;
pub const ALL_READ: raw::mode_t;
pub const ALL_WRITE: raw::mode_t;
pub const ALL_EXECUTE: raw::mode_t;
pub const ALL_RWX: raw::mode_t;
2015-04-09 06:54:25 +08:00
pub const SETUID: raw::mode_t;
pub const SETGID: raw::mode_t;
pub const STICKY_BIT: raw::mode_t;
2015-04-08 07:02:40 +08:00
}
```
Finally, the `set_permissions` function of the `std::fs` module is also proposed
to be marked `#[stable]` soon as a method of blanket setting permissions for a
file.
## Constructing `Permissions`
2015-04-09 08:24:31 +08:00
Currently there is no method to construct an instance of `Permissions` on any
platform. This RFC proposes adding the following APIs:
```rust
mod os::unix::fs {
pub trait PermissionsExt {
2015-04-08 07:02:40 +08:00
fn from_mode(mode: raw::mode_t) -> Self;
}
impl PermissionsExt for Permissions { ... }
}
```
2015-04-09 08:24:31 +08:00
This RFC does not propose yet adding a cross-platform way to construct a
`Permissions` structure due to the radical differences between how unix and
windows handle permissions.
## Creating directories with permissions
Currently the standard library does not expose an API which allows setting the
permission bits on unix or security attributes on Windows. This RFC proposes
adding the following API to `std::fs`:
```rust
2015-04-15 00:22:13 +08:00
pub struct DirBuilder { ... }
2015-04-15 00:22:13 +08:00
impl DirBuilder {
/// Creates a new set of options with default mode/security settings for all
/// platforms and also non-recursive.
pub fn new() -> Self;
/// Indicate that directories create should be created recursively, creating
/// all parent directories if they do not exist with the same security and
/// permissions settings.
pub fn recursive(&mut self, recursive: bool) -> &mut Self;
/// Create the specified directory with the options configured in this
/// builder.
pub fn create<P: AsRef<Path>>(&self, path: P) -> io::Result<()>;
}
mod os::unix::fs {
2015-04-15 00:22:13 +08:00
pub trait DirBuilderExt {
2015-04-08 07:02:40 +08:00
fn mode(&mut self, mode: raw::mode_t) -> &mut Self;
}
2015-04-15 00:22:13 +08:00
impl DirBuilderExt for DirBuilder { ... }
}
mod os::windows::fs {
// once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added
2015-04-15 00:22:13 +08:00
pub trait DirBuilderExt {
fn security_attributes(&mut self, ...) -> &mut Self;
}
2015-04-15 00:22:13 +08:00
impl DirBuilderExt for DirBuilder { ... }
}
2015-04-08 07:02:40 +08:00
```
2015-04-08 07:02:40 +08:00
This sort of builder is also extendable to other flavors of functions in the
future, such as [C++'s template parameter][cpp-dir-template]:
[cpp-dir-template]: http://en.cppreference.com/w/cpp/experimental/fs/create_directory
```rust
/// Use the specified directory as a "template" for permissions and security
/// settings of the new directories to be created.
///
/// On unix this will issue a `stat` of the specified directory and new
/// directories will be created with the same permission bits. On Windows
/// this will trigger the use of the `CreateDirectoryEx` function.
pub fn template<P: AsRef<Path>>(&mut self, path: P) -> &mut Self;
```
At this time, however, it is not proposed to add this method to
2015-04-15 00:22:13 +08:00
`DirBuilder`.
2015-04-08 07:02:40 +08:00
2015-04-09 01:07:45 +08:00
## Adding `FileType`
Currently there is no enumeration or newtype representing a list of "file types"
on the local filesystem. This is partly done because the need is not so high
right now. Some situations, however, imply that it is more efficient to learn
the file type at once instead of testing for each individual file type itself.
For example some platforms' `DirEntry` type can know the `FileType` without an
extra syscall. If code were to test a `DirEntry` separately for whether it's a
file or a directory, it may issue more syscalls necessary than if it instead
learned the type and then tested that if it was a file or directory.
The full set of file types, however, is not always known nor portable across
platforms, so this RFC proposes the following hierarchy:
```rust
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct FileType(..);
impl FileType {
pub fn is_dir(&self) -> bool;
pub fn is_file(&self) -> bool;
2015-04-15 00:21:20 +08:00
pub fn is_symlink(&self) -> bool;
2015-04-09 01:07:45 +08:00
}
```
Extension traits can be added in the future for testing for other more flavorful
kinds of files on various platforms (such as unix sockets on unix platforms).
#### Dealing with `is_{file,dir}` and `file_type` methods
Currently the `fs::Metadata` structure exposes stable `is_file` and `is_dir`
accessors. The struct will also grow a `file_type` accessor for this newtype
struct being added. It is proposed that `Metadata` will retain the
`is_{file,dir}` convenience methods, but no other "file type testers" will be
added.
2015-04-15 00:21:20 +08:00
## Enhancing symlink support
Currently the `std::fs` module provides a `soft_link` and `read_link` function,
2015-04-15 00:21:20 +08:00
but there is no method of doing other symlink related tasks such as:
2015-04-15 00:21:20 +08:00
* Testing whether a file is a symlink
* Reading the metadata of a symlink, not what it points to
The following APIs will be added to `std::fs`:
```rust
/// Returns the metadata of the file pointed to by `p`, and this function,
2015-04-15 00:21:20 +08:00
/// unlike `metadata` will **not** follow symlinks.
pub fn symlink_metadata<P: AsRef<Path>>(p: P) -> io::Result<Metadata>;
```
## Binding `realpath`
There's a [long-standing issue][realpath] that the unix function `realpath` is
not bound, and this RFC proposes adding the following API to the `fs` module:
[realpath]: https://github.com/rust-lang/rust/issues/11857
```rust
/// Canonicalizes the given file name to an absolute path with all `..`, `.`,
2015-04-15 00:21:20 +08:00
/// and symlink components resolved.
///
/// On unix this function corresponds to the return value of the `realpath`
/// function, and on Windows this corresponds to the `GetFullPathName` function.
///
/// Note that relative paths given to this function will use the current working
/// directory as a base, and the current working directory is not managed in a
/// thread-local fashion, so this function may need to be synchronized with
/// other calls to `env::change_dir`.
pub fn canonicalize<P: AsRef<Path>>(p: P) -> io::Result<PathBuf>;
```
## Tweaking `PathExt`
Currently the `PathExt` trait is unstable, yet it is quite convenient! The main
motivation for its `#[unstable]` tag is that it is unclear how much
functionality should be on `PathExt` versus the `std::fs` module itself.
Currently a small subset of functionality is offered, but it is unclear what the
guiding principle for the contents of this trait are.
This RFC proposes a few guiding principles for this trait:
* Only read-only operations in `std::fs` will be exposed on `PathExt`. All
operations which require modifications to the filesystem will require calling
methods through `std::fs` itself.
* Some inspection methods on `Metadata` will be exposed on `PathExt`, but only
those where it logically makes sense for `Path` to be the `self` receiver. For
example `PathExt::len` will not exist (size of the file), but
`PathExt::is_dir` will exist.
Concretely, the `PathExt` trait will be expanded to:
```rust
pub trait PathExt {
fn exists(&self) -> bool;
fn is_dir(&self) -> bool;
fn is_file(&self) -> bool;
fn metadata(&self) -> io::Result<Metadata>;
2015-04-15 00:21:20 +08:00
fn symlink_metadata(&self) -> io::Result<Metadata>;
fn canonicalize(&self) -> io::Result<PathBuf>;
fn read_link(&self) -> io::Result<PathBuf>;
fn read_dir(&self) -> io::Result<ReadDir>;
}
impl PathExt for Path { ... }
```
## Expanding `DirEntry`
Currently the `DirEntry` API is quite minimalistic, exposing very few of the
underlying attributes. Platforms like Windows actually contain an entire
`Metadata` inside of a `DirEntry`, enabling much more efficient walking of
directories in some situations.
The following APIs will be added to `DirEntry`:
```rust
impl DirEntry {
/// This function will return the filesystem metadata for this directory
2015-04-15 00:21:20 +08:00
/// entry. This is equivalent to calling `fs::symlink_metadata` on the
/// path returned.
///
/// On Windows this function will always return `Ok` and will not issue a
/// system call, but on unix this will always issue a call to `stat` to
/// return metadata.
pub fn metadata(&self) -> io::Result<Metadata>;
2015-04-09 01:07:45 +08:00
/// Return what file type this `DirEntry` contains.
///
/// On some platforms this may not require reading the metadata of the
/// underlying file from the filesystem, but on other platforms it may be
/// required to do so.
2015-04-15 00:21:44 +08:00
pub fn file_type(&self) -> io::Result<FileType>;
/// Returns the file name for this directory entry.
pub fn file_name(&self) -> OsString;
}
2015-04-09 00:58:23 +08:00
mod os::unix::fs {
pub trait DirEntryExt {
fn ino(&self) -> raw::ino_t; // read the d_ino field
}
impl DirEntryExt for fs::DirEntry { ... }
}
```
# Drawbacks
* This is quite a bit of surface area being added to the `std::fs` API, and it
may perhaps be best to scale it back and add it in a more incremental fashion
instead of all at once. Most of it, however, is fairly straightforward, so it
seems prudent to schedule many of these features for the 1.1 release.
* Exposing raw information such as `libc::stat` or `WIN32_FILE_ATTRIBUTE_DATA`
possibly can hamstring altering the implementation in the future. At this
point, however, it seems unlikely that the exposed pieces of information will
be changing much.
# Alternatives
* Instead of exposing accessor methods in `MetadataExt` on Windows, the raw
`WIN32_FILE_ATTRIBUTE_DATA` could be returned. We may change, however, to
using `BY_HANDLE_FILE_INFORMATION` one day which would make the return value
from this function more difficult to implement.
* A `std::os::MetadataExt` trait could be added to access truly common
information such as modification/access times across all platforms. The return
value would likely be a `u64` "something" and would be clearly documented as
being a lossy abstraction and also only having a platform-specific meaning.
* The `PathExt` trait could perhaps be implemented on `DirEntry`, but it doesn't
necessarily seem appropriate for all the methods and using inherent methods
also seems more logical.
# Unresolved questions
* What is the ultimate role of crates like `liblibc`, and how do we draw the
line between them and `std::os` definitions?