- Feature Name: expand_open_options
- Start Date: 2015-08-04
- RFC PR: rust-lang/rfcs#1252
- Rust Issue: rust-lang/rust#30014
Summary
Document and expand the open options.
Motivation
The options that can be passed to the os when opening a file vary between systems. And even if the options seem the same or similar, there may be unexpected corner cases.
This RFC attempts to
- describe the different corner cases and behaviour of various operating systems.
- describe the intended behaviour and interaction of Rust's options.
- remedy cross-platform inconsistencies.
- suggest extra options to expose more platform-specific options.
Detailed design
Access modes
Read-only
Open a file for read-only.
Write-only
Open a file for write-only.
If the file already exists, writes overwrite its contents starting at the beginning, but the file is not truncated. Example:
// contents of file before: "aaaaaaaa"
file.write(b"bbbb")
// contents of file after: "bbbbaaaa"
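The behaviour above can be reproduced with a small self-contained sketch. The helper name overwrite_prefix is hypothetical and chosen only for this illustration; the calls themselves are from std::fs and std::io.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: overwrites the start of a file opened write-only
/// (without truncate) and returns the resulting contents.
fn overwrite_prefix(path: &Path) -> io::Result<Vec<u8>> {
    // Prepare a file with known contents.
    fs::write(path, b"aaaaaaaa")?;

    // Write-only, no truncate: writing starts at offset 0.
    let mut file = OpenOptions::new().write(true).open(path)?;
    file.write_all(b"bbbb")?;
    drop(file);

    // The tail of the old contents is still there.
    fs::read(path)
}
```

Calling overwrite_prefix on a scratch path is expected to return the bytes of "bbbbaaaa", matching the comments in the example above.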
Read-write
This is the simple combination of read-only and write-only.
Append-mode
Append-mode is similar to write-only, but all writes always happen at the end of the file. This mode is especially useful if multiple processes or threads write to a single file, like a log file. The operating system guarantees all writes are atomic: no writes get mangled because another process writes at the same time. No guarantees are made about the order writes end up in the file though.
Note: sadly append-mode is not atomic on NFS filesystems.
One maybe obvious note when using append-mode: make sure that all data that belongs together is written to the file in one operation. This can be done by concatenating strings before passing them to write(), or by using a buffered writer (with a more than adequately sized buffer) and calling flush() when the message is complete.
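As a minimal sketch of the first approach, the record is assembled in memory and then handed over in a single call. The function name append_record and the record layout are assumptions made for this illustration only.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: appends one complete log record per call.
fn append_record(path: &Path, level: &str, message: &str) -> io::Result<()> {
    // Build the complete record first ...
    let record = format!("[{}] {}\n", level, message);

    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    // ... then pass it on in one call. For a regular file this normally
    // results in a single write, although write_all may retry after a
    // short write.
    file.write_all(record.as_bytes())
}
```

Writing the level, the message and the newline with three separate calls would give another process the chance to append its own data in between.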
Implementation detail: On Windows opening a file in append-mode has one flag less, the right to change existing data is removed. On Unix opening a file in append-mode has one flag extra, that sets the status of the file descriptor to append-mode. You could say that on Windows write is a superset of append, while on Unix append is a superset of write.
Because of this, append is treated as a separate access mode in Rust, and if .append(true) is specified then .write() is ignored.
Read-append
Writing to the file works exactly the same as in append-mode.
Reading is more difficult, and may involve a lot of seeking. When the file is opened, the position for reading may be set at the end of the file, so you should first seek to the beginning. Also after every write the position is set to the end of the file. So before writing you should save the current position, and restore it after the write.
try!(file.seek(SeekFrom::Start(0)));
try!(file.read(&mut buffer));
let pos = try!(file.seek(SeekFrom::Current(0)));
try!(file.write(b"foo"));
try!(file.seek(SeekFrom::Start(pos)));
try!(file.read(&mut buffer));
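The same sequence, written as a self-contained sketch with the ? operator instead of try!. The function name read_append_roundtrip and the sample contents are assumptions for this illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: reads, appends, and continues reading where it
/// left off. Returns what was read before and after the append.
fn read_append_roundtrip(path: &Path) -> io::Result<(String, String)> {
    fs::write(path, b"hello world")?;
    let mut file = OpenOptions::new().read(true).append(true).open(path)?;

    // The initial position may be at the end, so rewind explicitly.
    file.seek(SeekFrom::Start(0))?;
    let mut first = [0u8; 5];
    file.read_exact(&mut first)?;

    // Save the read position, append, then restore the position.
    let pos = file.stream_position()?;
    file.write_all(b"foo")?;
    file.seek(SeekFrom::Start(pos))?;

    let mut rest = String::new();
    file.read_to_string(&mut rest)?;
    Ok((String::from_utf8_lossy(&first).into_owned(), rest))
}
```

The second read sees both the remainder of the old contents and the freshly appended bytes, because the append always lands at the end of the file regardless of the saved position.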
No access mode set
Even if you don't have read or write permission to a file, it is possible to open it on some systems by opening it with no access mode set (or the equivalent thereof). This is true for Windows, Linux (with the flag O_PATH) and GNU/Hurd.
What can be done with a file opened this way is system-specific and niche. Since Linux version 2.6.39 all three operating systems support reading metadata such as the file size and timestamps.
On practically all variants of Unix, opening a file without specifying the access mode falls back to opening the file read-only. This is because of the way the access flags were traditionally defined: O_RDONLY = 0, O_WRONLY = 1 and O_RDWR = 2. When no flags are set, the access mode is 0: read-only. But code that relies on this is considered buggy and not portable.
What should Rust do when no access mode is specified? Fall back to read-only, open with the most similar system-specific mode, or always fail to open? This RFC proposes to always fail. This is the conservative choice, and can be changed to open in a system-specific mode if a clear use case arises. Implementing a fallback is not worth it: it is no great effort to set the access mode explicitly.
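A short sketch of what "always fail" looks like from the caller's side. It assumes the behaviour documented for std::fs::OpenOptions, where opening with no access mode set is reported as an InvalidInput error; the function name is hypothetical.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: tries to open `path` without read, write or
/// append set. Under the proposed rule this is expected to fail.
fn open_without_access_mode(path: &Path) -> io::Result<File> {
    OpenOptions::new().open(path)
}
```

Setting the access mode explicitly, for example with .read(true), is the one-line fix on the caller's side.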
Windows-specific
.access_mode(FILE_READ_DATA)
On Windows you can specify in detail whether you want read and/or write access to the file's data, attributes and/or extended attributes. Managing permissions in such detail has proven too difficult, and is generally not worth it.
In Rust, .read(true) gives you read access to the data, attributes and extended attributes. Similarly, .write(true) gives write access to those three, and the right to append data beyond the current end of the file.
But if you want fine-grained control, with .access_mode() you have it.
.access_mode() overrides the access mode set with Rust's cross-platform options. Reasons to do so:
- it is not possible to un-set the flags set by Rust's options;
- otherwise the cross-platform options have to be wrapped with #[cfg(unix)], instead of only having to wrap the Windows-specific option.
As a reference, these are the flags set by Rust's access modes:
bit | flag | read | write | read-write | append | read-append |
---|---|---|---|---|---|---|
generic rights | | | | | | |
31 | GENERIC_READ | set | | set | | set |
30 | GENERIC_WRITE | | set | set | | |
29 | GENERIC_EXECUTE | | | | | |
28 | GENERIC_ALL | | | | | |
specific rights | | | | | | |
0 | FILE_READ_DATA | implied | | implied | | implied |
1 | FILE_WRITE_DATA | | implied | implied | | |
2 | FILE_APPEND_DATA | | implied | implied | set | set |
3 | FILE_READ_EA | implied | | implied | | implied |
4 | FILE_WRITE_EA | | implied | implied | set | set |
6 | FILE_EXECUTE | | | | | |
7 | FILE_READ_ATTRIBUTES | implied | | implied | | implied |
8 | FILE_WRITE_ATTRIBUTES | | implied | implied | set | set |
standard rights | | | | | | |
16 | DELETE | | | | | |
17 | READ_CONTROL | implied | implied | implied | set | set+implied |
18 | WRITE_DAC | | | | | |
19 | WRITE_OWNER | | | | | |
20 | SYNCHRONIZE | implied | implied | implied | set | set+implied |
The implied flags can be specified explicitly with the constants FILE_GENERIC_READ and FILE_GENERIC_WRITE.
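To make the relation between the individual bits and the two composite constants concrete, the masks can be built up from the bit numbers in the table. This is only a sketch: the constants are redefined locally from those bit positions rather than taken from a Windows bindings crate, and append_access_mask is a hypothetical helper.

```rust
// Bit positions as listed in the table above.
const FILE_READ_DATA: u32 = 1 << 0;
const FILE_WRITE_DATA: u32 = 1 << 1;
const FILE_APPEND_DATA: u32 = 1 << 2;
const FILE_READ_EA: u32 = 1 << 3;
const FILE_WRITE_EA: u32 = 1 << 4;
const FILE_READ_ATTRIBUTES: u32 = 1 << 7;
const FILE_WRITE_ATTRIBUTES: u32 = 1 << 8;
const READ_CONTROL: u32 = 1 << 17;
const SYNCHRONIZE: u32 = 1 << 20;

// Everything marked "implied" in the read column.
const FILE_GENERIC_READ: u32 =
    READ_CONTROL | SYNCHRONIZE | FILE_READ_DATA | FILE_READ_EA | FILE_READ_ATTRIBUTES;

// Everything marked "implied" in the write column.
const FILE_GENERIC_WRITE: u32 = READ_CONTROL
    | SYNCHRONIZE
    | FILE_WRITE_DATA
    | FILE_APPEND_DATA
    | FILE_WRITE_EA
    | FILE_WRITE_ATTRIBUTES;

/// Hypothetical helper: the append column is the write column minus the
/// right to change existing data.
fn append_access_mask() -> u32 {
    FILE_GENERIC_WRITE & !FILE_WRITE_DATA
}
```

This mirrors the earlier remark that on Windows append has one flag less than write.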
Creation modes
creation mode | file exists | file does not exist | Unix | Windows |
---|---|---|---|---|
not set (open existing) | open | fail | | OPEN_EXISTING |
.create(true) | open | create | O_CREAT | OPEN_ALWAYS |
.truncate(true) | truncate | fail | O_TRUNC | TRUNCATE_EXISTING |
.create(true).truncate(true) | truncate | create | O_CREAT + O_TRUNC | CREATE_ALWAYS |
.create_new(true) | fail | create | O_CREAT + O_EXCL | CREATE_NEW + FILE_FLAG_OPEN_REPARSE_POINT |
Not set (open existing)
Open an existing file. Fails if the file does not exist.
Create
.create(true)
Open an existing file, or create a new file if it does not already exist.
Truncate
.truncate(true)
Open an existing file, and truncate it to zero length. Fails if the file does not exist. Attributes and permissions of the truncated file are preserved.
Note when using the Windows-specific .access_mode(): truncating will only work if the GENERIC_WRITE flag is set. Setting the equivalent individual flags is not enough.
Create and truncate
.create(true).truncate(true)
Open an existing file and truncate it to zero length, or create a new file if it does not already exist.
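The difference between truncate alone and create combined with truncate can be observed with a small sketch. The function name truncate_behaviour is hypothetical, and the helper deletes whatever is at the given path, so it should only be pointed at a scratch location.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the error kind seen when truncating a
/// missing file, and the length of an existing file after it is opened
/// with create + truncate.
fn truncate_behaviour(path: &Path) -> io::Result<(io::ErrorKind, u64)> {
    let _ = fs::remove_file(path);

    // Truncate alone never creates: opening a missing file fails.
    let kind = match OpenOptions::new().write(true).truncate(true).open(path) {
        Ok(_) => return Err(io::Error::new(io::ErrorKind::Other, "unexpected success")),
        Err(e) => e.kind(),
    };

    // Create + truncate creates the missing file ...
    OpenOptions::new().write(true).create(true).truncate(true).open(path)?;
    fs::write(path, b"old contents")?;

    // ... and empties it when it already exists.
    let file = OpenOptions::new().write(true).create(true).truncate(true).open(path)?;
    let len = file.metadata()?.len();
    Ok((kind, len))
}
```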
Note when using the Windows-specific .access_mode(): Contrary to only .truncate(true), with .create(true).truncate(true) Windows can truncate an existing file without requiring any flags to be set.
On Windows the attributes of an existing file can cause .open() to fail. If the existing file has the attribute hidden set, it is necessary to open with FILE_ATTRIBUTE_HIDDEN. Similarly if the existing file has the attribute system set, it is necessary to open with FILE_ATTRIBUTE_SYSTEM. See the Windows-specific .attributes() below on how to set these.
Create_new
.create_new(true)
Create a new file, and fail if it already exists.
On Unix this option started its life as a security measure. If you first check that a file does not exist with exists() and then call open(), some other process may have created it in the meantime. .create_new() is an atomic operation that will fail if a file already exists at the location.
.create_new() has a special rule on Unix for dealing with symlinks. If there is a symlink at the final element of its path (e.g. the filename), open will fail. This is to prevent a vulnerability where an unprivileged process could trick a privileged process into following a symlink and overwriting a file the unprivileged process has no access to.
See Exploiting symlinks and tmpfiles.
On Windows this behaviour is imitated by specifying not only CREATE_NEW but also FILE_FLAG_OPEN_REPARSE_POINT.
Simply put: nothing is allowed to exist at the target location, not even a (dangling) symlink.
If .create_new(true) is set, .create() and .truncate() are ignored.
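A minimal sketch of the check-and-create pattern collapsed into one atomic call. The function name create_exclusively is an assumption for this illustration.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: creates `path`, failing if anything already
/// exists there. No separate exists() check is needed.
fn create_exclusively(path: &Path) -> io::Result<File> {
    OpenOptions::new().write(true).create_new(true).open(path)
}
```

A second call with the same path is expected to fail with an AlreadyExists error, which the caller can match on instead of racing against other processes.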
Unix-specific: Mode
.mode(0o666)
On Unix the new file is created by default with permissions 0o666 minus the system's umask (see Wikipedia). It is possible to set another mode with this option.
If the file already exists, or if neither .create(true) nor .create_new(true) is specified, .mode() is ignored.
Rust currently does not expose a way to modify the umask.
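A Unix-only sketch of creating a file that is private to its owner. It assumes the option is reachable through the std::os::unix::fs::OpenOptionsExt extension trait; the function name create_private is hypothetical.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper: creates a new file with mode 0o600 (minus the
/// umask) and returns the permission bits the file ended up with.
fn create_private(path: &Path) -> io::Result<u32> {
    let file = OpenOptions::new()
        .write(true)
        .create_new(true) // the mode only applies when a file is created
        .mode(0o600)
        .open(path)?;
    Ok(file.metadata()?.permissions().mode() & 0o777)
}
```

Because the umask can only remove bits, the result never contains group or other permissions, whatever the umask of the process is.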
Windows-specific: Attributes
.attributes(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_SYSTEM)
Files on Windows can have several attributes, most commonly one or more of the following four: readonly, hidden, system and archive. Most others are properties set by the file system. Of the others only FILE_ATTRIBUTE_ENCRYPTED, FILE_ATTRIBUTE_TEMPORARY and FILE_ATTRIBUTE_OFFLINE can be set when creating a new file. All others are silently ignored.
It is no use to set the archive attribute, as Windows sets it automatically when the file is newly created or modified. This flag may then be used by backup applications as an indication of which files have changed.
If a new file is created because it does not yet exist and .create(true) or .create_new(true) is specified, the new file is given the attributes declared with .attributes().
If an existing file is opened with .create(true).truncate(true), its existing attributes are preserved and combined with the ones declared with .attributes().
In all other cases the attributes get ignored.
Combination of access modes and creation modes
Some combinations of creation modes and access modes do not make sense.
For example: .create(true) when opening read-only. If the file does not already exist, it is created and you start reading from an empty file. And it is questionable whether you have permission to create a new file if you don't have write access. A new file is created on all systems I have tested, but there is no documentation that explicitly guarantees this behaviour.
The same is true for .truncate(true) with read and/or append mode. Should an existing file be modified if you don't have write permission? On Unix it is undefined (see some comments on the OpenBSD mailing list). The behaviour on Windows is inconsistent and depends on whether .create(true) is set.
To give guarantees about cross-platform (and sane) behaviour, Rust should allow only the following combinations of access modes and creation modes:
creation mode | read | write | read-write | append | read-append |
---|---|---|---|---|---|
not set (open existing) | X | X | X | X | X |
create | | X | X | X | X |
truncate | | X | X | | |
create and truncate | | X | X | | |
create_new | | X | X | X | X |
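The restrictions in the table can be probed with a small sketch. It assumes that a disallowed combination is reported as an InvalidInput error before the operating system is asked to do anything, as documented for std::fs::OpenOptions; the helper name is_rejected is hypothetical.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical helper: reports whether this combination of options is
/// refused as invalid, as opposed to failing for some other reason.
fn is_rejected(options: &OpenOptions, path: &Path) -> bool {
    match options.open(path) {
        Ok(_) => false,
        Err(e) => e.kind() == io::ErrorKind::InvalidInput,
    }
}
```

For an existing file, is_rejected(OpenOptions::new().read(true).create(true), path) and is_rejected(OpenOptions::new().append(true).truncate(true), path) are expected to return true, while a combination from the table such as write with create and truncate should not be rejected.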
It is possible to bypass these restrictions by using system-specific options (as in this case you already have to take care of cross-platform support yourself). On Unix this is done by setting the creation mode using .custom_flags() with O_CREAT, O_TRUNC and/or O_EXCL. On Windows this can be done by manually specifying .access_mode() (see above).
Asynchronous IO
Out of scope.
Other options
Inheritance of file descriptors
Leaking file descriptors to child processes can cause problems and can be a security vulnerability. See this report by Python.
On Windows, child processes do not inherit file descriptors by default (but this can be changed). On Unix they always inherit, unless the close-on-exec flag is set.
The close-on-exec flag can be set atomically when opening the file, or later with fcntl. The O_CLOEXEC flag is in the relatively new POSIX-2008 standard, and all modern versions of Unix support it. The following table lists for which operating systems we can rely on the flag to be supported.
os | since version | oldest supported version |
---|---|---|
OS X | 10.6 | 10.7? |
Linux | 2.6.23 | 2.6.32 (supported by Rust) |
FreeBSD | 8.3 | 8.4 |
OpenBSD | 5.0 | 5.7 |
NetBSD | 6.0 | 5.0 |
Dragonfly BSD | 3.2 | ? (3.2 is not updated since 2012-12-14) |
Solaris | 11 | 10 |
This means we can always set the flag O_CLOEXEC, and do an additional fcntl if the os is NetBSD or Solaris.
Custom flags
.custom_flags()
Windows and the various flavours of Unix support flags that are not cross-platform, but that can be useful in some circumstances. On Unix they will be passed as the variable flags to open, on Windows as the dwFlagsAndAttributes parameter.
The cross-platform options of Rust can do magic: they can set any flag necessary to ensure it works as expected. For example, .append(true) on Unix not only sets the flag O_APPEND, but also automatically O_WRONLY or O_RDWR. This special treatment is not available for the custom flags.
Custom flags can only set flags, not remove flags set by Rust's options.
For the custom flags on Unix, the bits that define the access mode are masked out with O_ACCMODE, to ensure they do not interfere with the access mode set by Rust's options.
Windows:
bit | flag |
---|---|
31 | FILE_FLAG_WRITE_THROUGH |
30 | FILE_FLAG_OVERLAPPED |
29 | FILE_FLAG_NO_BUFFERING |
28 | FILE_FLAG_RANDOM_ACCESS |
27 | FILE_FLAG_SEQUENTIAL_SCAN |
26 | FILE_FLAG_DELETE_ON_CLOSE |
25 | FILE_FLAG_BACKUP_SEMANTICS |
24 | FILE_FLAG_POSIX_SEMANTICS |
23 | FILE_FLAG_SESSION_AWARE |
21 | FILE_FLAG_OPEN_REPARSE_POINT |
20 | FILE_FLAG_OPEN_NO_RECALL |
19 | FILE_FLAG_FIRST_PIPE_INSTANCE |
18 | FILE_FLAG_OPEN_REQUIRING_OPLOCK |
Unix:
POSIX | Linux | OS X | FreeBSD | OpenBSD | NetBSD | Dragonfly BSD | Solaris |
---|---|---|---|---|---|---|---|
O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC |
O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT |
O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL |
O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND |
O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC |
O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY |
O_NOCTTY | O_NOCTTY | O_NOCTTY | O_NOCTTY | O_NOCTTY | O_NOCTTY | ||
O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW |
O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK |
O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_FSYNC | O_SYNC |
O_DSYNC | O_DSYNC | O_DSYNC | O_DSYNC | O_DSYNC | |||
O_RSYNC | O_RSYNC | O_RSYNC | |||||
O_DIRECT | O_DIRECT | O_DIRECT | O_DIRECT | ||||
O_ASYNC | O_ASYNC | ||||||
O_NOATIME | |||||||
O_PATH | |||||||
O_TMPFILE | |||||||
O_SHLOCK | O_SHLOCK | O_SHLOCK | O_SHLOCK | O_SHLOCK | |||
O_EXLOCK | O_EXLOCK | O_EXLOCK | O_EXLOCK | O_EXLOCK | |||
O_SYMLINK | |||||||
O_EVTONLY | |||||||
O_NOSIGPIPE | |||||||
O_ALT_IO | |||||||
O_NOLINKS | |||||||
O_XATTR | |||||||
POSIX | Linux | OS X | FreeBSD | OpenBSD | NetBSD | Dragonfly BSD | Solaris |
Windows-specific flags and attributes
The following variables for CreateFile2 currently have no equivalent functions in Rust to set them:
DWORD dwSecurityQosFlags;
LPSECURITY_ATTRIBUTES lpSecurityAttributes;
HANDLE hTemplateFile;
Changes from current
Access mode
- Current: .append(true) requires .write(true) on Unix, but not on Windows. New: ignore .write() if .append(true) is specified.
- Current: when .append(true) is set, it is not possible to modify file attributes on Windows, but it is possible to change the file mode on Unix. New: allow file attributes to be modified on Windows in append-mode.
- Current: On Windows .read() and .write() set individual bit flags instead of generic flags. New: Set generic flags, as recommended by Microsoft, e.g. GENERIC_WRITE instead of FILE_GENERIC_WRITE and GENERIC_READ instead of FILE_GENERIC_READ. Currently truncate is broken on Windows; this fixes it.
- Current: when no access mode is set, this falls back to opening the file read-only on Unix, and opening with no access permissions on Windows. New: always fail to open if no access mode is set.
- Rename the Windows-specific .desired_access() to .access_mode().
Creation mode
- Implement .create_new().
- Do not allow .truncate(true) if the access mode is read-only and/or append.
- Do not allow .create(true) or .create_new(true) if the access mode is read-only.
- Remove the Windows-specific .creation_disposition(). It has no use, because all its options can be set in a cross-platform way.
- Split the Windows-specific .flags_and_attributes() into .custom_flags() and .attributes(). This is a form of future-proofing, as the new Windows 8 CreateFile2 also splits these attributes. This has the advantage of a clear separation between file attributes, which are somewhat similar to Unix mode bits, and the custom flags that modify the behaviour of the current file handle.
Other options
- Set the close-on-exec flag atomically on Unix if supported.
- Implement .custom_flags() on Windows and Unix to pass custom flags to the system.
Drawbacks
This adds a thin layer on top of the raw operating system calls. In this pull request the conclusion was: this seems like a good idea for a "high level" abstraction like OpenOptions.
This adds extra options that many applications can do without (otherwise they would already have been implemented).
Also this RFC is in line with the vision for IO in the IO-OS-redesign:
- [The APIs] should impose essentially zero cost over the underlying OS services; the core APIs should map down to a single syscall unless more are needed for cross-platform compatibility.
- The APIs should largely feel like part of "Rust" rather than part of any legacy, and they should enable truly portable code.
- Coverage. The std APIs should over time strive for full coverage of non-niche, cross-platform capabilities.
Alternatives
The first version of this RFC contained a proposal for options that control caching and file locking. They are out of scope for now, but included here for reference.
Sharing / locking
On Unix it is possible for multiple processes to read and write to the same file at the same time.
When you open a file on Windows, the system by default denies other processes read, write and delete access to the file. By setting the sharing mode, it is possible to allow other processes read, write and/or delete access. For cross-platform consistency, Rust imitates Unix by setting all sharing flags.
Unix has no equivalent to the kind of file locking that Windows has. It has two types of advisory locking, POSIX and BSD-style. Advisory means any process that does not use locking itself can happily ignore the locking of another process. As if that is not bad enough, they both have problems that make them close to unusable for modern multi-threaded programs. Linux may in some very rare cases support mandatory file locking, but it is just as broken as advisory.
Windows-specific: Share mode
.share_mode(FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE)
It is possible to set the individual share permissions with .share_mode().
The current philosophy of this function is that others should have no rights, unless explicitly granted. I think a better fit for Rust would be to give all others all rights, unless explicitly denied, e.g.: .share_mode(DENY_READ | DENY_WRITE | DENY_DELETE).
Controlling caching
When dealing with file systems and hard disks, there are several kinds of caches. Giving hints or controlling them may improve performance or data consistency.
- read-ahead (performance of reads and overwrites) Instead of requesting only the data necessary for a single read() call from a storage device, an operating system may request more data than necessary to have it already available for the next read.
- os cache (performance of reads and overwrites) The os may keep the data of previous reads and writes in memory to increase the performance of future reads and possibly writes.
- os staging area (convenience/performance of reads and writes) The size and alignment of data reads and writes to a disk should correspond to sectors on the storage device, usually 512 or 4096 bytes. The os makes sure a regular write() or read() doesn't have to care about this. For example a small write (say 100 bytes) has to rewrite a whole sector. The os often has the surrounding data in its cache and can efficiently combine it to write the whole sector.
- delayed writing (performance/correctness of writes) The os may delay writes to improve performance, for example by batching consecutive writes, and scheduling with reads to minimize seeking.
- on-disk write cache (performance/correctness of writes) Most hard disk / storage devices have a small RAM cache. It can speed up reads, and writes can return as soon as the data is written to the device's cache.
Read-ahead hint
.read_ahead_hint(enum ReadAheadHint)
enum ReadAheadHint {
Default,
Sequential,
Random,
}
If you read a file sequentially the read-ahead is beneficial, for completely random access it can become a penalty.
- Default uses the generally good heuristics of the operating system.
- Sequential indicates sequential but not necessarily consecutive access. With this the os may increase the amount of data that is read ahead.
- Random indicates mainly random access. The os may disable its read-ahead cache.
This option is treated as a hint. It is ignored if the os does not support it, or if the behaviour of the application proves it is set wrong.
Open flags / system calls:
- Windows: flags FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS
- Linux, FreeBSD, NetBSD: posix_fadvise() with the flags POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM
- OS X: fcntl() with F_RDAHEAD 0 for random (there is no special mode for sequential).
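As a rough sketch of how such a hint could be translated on Windows, the enum can be mapped onto the two flag bits listed in the custom flags table (bits 27 and 28). The function windows_flags is purely hypothetical and not part of any existing API; the enum is repeated here so the example is self-contained.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ReadAheadHint {
    Default,
    Sequential,
    Random,
}

// Bit positions as listed in the Windows custom flags table.
const FILE_FLAG_RANDOM_ACCESS: u32 = 1 << 28;
const FILE_FLAG_SEQUENTIAL_SCAN: u32 = 1 << 27;

/// Hypothetical mapping from the proposed hint to the Windows open flag.
fn windows_flags(hint: ReadAheadHint) -> u32 {
    match hint {
        // Leave the decision to the heuristics of the operating system.
        ReadAheadHint::Default => 0,
        ReadAheadHint::Sequential => FILE_FLAG_SEQUENTIAL_SCAN,
        ReadAheadHint::Random => FILE_FLAG_RANDOM_ACCESS,
    }
}
```

On the Unix systems listed above the hint would instead have to be applied after opening, since posix_fadvise() and fcntl() operate on an already open file descriptor.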
OS cache
used_once(true)
When reading many gigabytes of data a process may push useful data from other processes out of the os cache. To keep the performance of the whole system up, a process could indicate to the os whether data is only needed once, or not needed anymore. On Linux, FreeBSD and NetBSD this is possible with posix_fadvise() and POSIX_FADV_DONTNEED after a read or write with sync (or before close). On FreeBSD and NetBSD it is also possible to specify this up-front with posix_fadvise() and POSIX_FADV_NOREUSE, and on OS X with fcntl() and F_NOCACHE. Windows does not seem to provide an option for this.
This option may negatively affect the performance of writes smaller than the sector size, as cached data may not be available to the os staging area.
This control over the os cache is the main reason some applications use direct io, despite it being less convenient and disabling other useful caches.
Delayed writing and on-disk write cache
.sync_data(true) and .sync_all(true)
There can be two delays (by the os and by the disk cache) between when an application performs a write, and when the data is written to persistent storage. They increase performance, but increase the risk of data loss in case of a system crash or power outage.
When dealing with critical data, it may be useful to control these caches to make the chance of data loss smaller. The application should normally do so by calling Rust's stand-alone functions sync_data() or sync_all() at meaningful points (e.g. when the file is in a consistent state, or a state it can recover from).
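A minimal sketch of syncing at a meaningful point with the stand-alone functions. The function name write_durably and the split into header and body are assumptions made for this illustration.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes a record in two parts and only syncs once
/// the file is in a consistent state.
fn write_durably(path: &Path, header: &[u8], body: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(header)?;
    file.write_all(body)?;

    // The record is complete: ask the os to push the data to storage.
    // sync_data() may skip metadata that is not needed to read it back.
    file.sync_data()?;

    // sync_all() additionally waits for the remaining metadata.
    file.sync_all()
}
```

In real code one of the two calls is usually enough; both appear here only to show where they would go.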
However, .sync_data() and .sync_all() may also be given as an open option. This guarantees every write will not return before the data is written to disk. These options improve reliability, as you can never accidentally forget a sync.
Whether performance with these options is worse than with the stand-alone functions is hard to say. With these options the data may have to be synchronised more often. But the stand-alone functions often sync outstanding writes to all files, while the options possibly sync only the current file.
The difference between .sync_all(true) and .sync_data(true) is that .sync_data(true) does not update the less critical metadata such as the last modified timestamp (although it will be written eventually).
Open flags:
- Windows: FILE_FLAG_WRITE_THROUGH for .sync_all()
- Unix: O_SYNC for .sync_all() and O_DSYNC for .sync_data()
If a system does not support syncing only data, this option will fall back to syncing both data and metadata. If .sync_all(true) is specified, .sync_data() is ignored.
Direct access / no caching
Most operating systems offer a mode that reads data straight from disk to an application buffer, or that writes straight from a buffer to disk. This avoids the small cost of a memory copy. It has the side effect that the data is not available to the os to provide caching. Also, because this does not use the os staging area, all reads and writes have to take care of data sizes and alignment themselves.
Overview:
- os staging area: not used
- read-ahead: not used
- os cache: data may be used, but is not added
- delayed writing: no delay
- on-disk write cache: maybe
Open flags / system calls:
- Windows: flag FILE_FLAG_NO_BUFFERING
- Linux, FreeBSD, NetBSD, Dragonfly BSD: flag O_DIRECT
The other options offer a more fine-grained control over caching, and usually offer better performance or correctness guarantees. This option is sometimes used by applications as a crude way to control (disable) the os cache.
Rust should not currently expose this as an open option, because it should be used with an abstraction / external crate that handles the data size and alignment requirements. If it should be used at all.
Unresolved questions
None.