2017-11-15 03:35:15 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2017-01-25 02:58:06 +08:00
|
|
|
/*
|
2017-10-10 03:15:34 +08:00
|
|
|
* fscrypt.h: declarations for per-file encryption
|
|
|
|
*
|
2018-12-12 17:50:12 +08:00
|
|
|
* Filesystems that implement per-file encryption must include this header
|
|
|
|
* file.
|
2017-01-25 02:58:06 +08:00
|
|
|
*
|
|
|
|
* Copyright (C) 2015, Google, Inc.
|
|
|
|
*
|
|
|
|
* Written by Michael Halcrow, 2015.
|
|
|
|
* Modified by Jaegeuk Kim, 2015.
|
|
|
|
*/
|
2017-10-10 03:15:34 +08:00
|
|
|
#ifndef _LINUX_FSCRYPT_H
|
|
|
|
#define _LINUX_FSCRYPT_H
|
2017-01-25 02:58:06 +08:00
|
|
|
|
|
|
|
#include <linux/fs.h>
|
2018-12-12 17:50:12 +08:00
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/slab.h>
|
2019-08-05 10:35:43 +08:00
|
|
|
#include <uapi/linux/fscrypt.h>
|
2017-01-25 02:58:06 +08:00
|
|
|
|
|
|
|
#define FS_CRYPTO_BLOCK_SIZE 16
|
|
|
|
|
fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-17 12:11:35 +08:00
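The inheritance rule described in the commit message above (a real policy, a dummy policy, or no policy) can be sketched in userspace C. This is only an illustration under stated assumptions: `mock_policy`, `mock_sb`, `mock_dir`, and `mock_policy_to_inherit()` are hypothetical stand-ins, not the kernel's types or its fscrypt_policy_to_inherit() implementation, and error handling (for example, an encrypted directory whose key is unavailable) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct mock_policy { int version; };

struct mock_sb {
	/* NULL unless mounted with test_dummy_encryption */
	const struct mock_policy *dummy_policy;
};

struct mock_dir {
	bool encrypted;                    /* models IS_ENCRYPTED(dir) */
	const struct mock_policy *policy;  /* the directory's own policy, if any */
	const struct mock_sb *sb;
};

/*
 * Models the three outcomes named in the commit message: a real policy
 * (encrypted directory), a dummy policy (unencrypted directory on a
 * filesystem mounted with test_dummy_encryption), or no policy (NULL).
 */
static const struct mock_policy *
mock_policy_to_inherit(const struct mock_dir *dir)
{
	if (dir->encrypted)
		return dir->policy;
	return dir->sb->dummy_policy;
}
```

Note that no key is set up for the unencrypted directory in this model; only the policy pointer is consulted, which is the point of the change.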
|
|
|
union fscrypt_policy;
|
2017-01-25 02:58:06 +08:00
|
|
|
struct fscrypt_info;
|
fscrypt: support test_dummy_encryption=v2
v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.
Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.
To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.
To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.
To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)
Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.
Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-13 07:32:50 +08:00
|
|
|
struct seq_file;
|
2017-01-25 02:58:06 +08:00
|
|
|
|
|
|
|
struct fscrypt_str {
|
|
|
|
unsigned char *name;
|
|
|
|
u32 len;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct fscrypt_name {
|
|
|
|
const struct qstr *usr_fname;
|
|
|
|
struct fscrypt_str disk_name;
|
|
|
|
u32 hash;
|
|
|
|
u32 minor_hash;
|
|
|
|
struct fscrypt_str crypto_buf;
|
2020-09-24 12:26:23 +08:00
|
|
|
bool is_nokey_name;
|
2017-01-25 02:58:06 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
#define FSTR_INIT(n, l) { .name = n, .len = l }
|
|
|
|
#define FSTR_TO_QSTR(f) QSTR_INIT((f)->name, (f)->len)
|
|
|
|
#define fname_name(p) ((p)->disk_name.name)
|
|
|
|
#define fname_len(p) ((p)->disk_name.len)
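As a rough illustration of how struct fscrypt_str and FSTR_INIT() fit together, the following userspace sketch copies the two definitions from this header and adds a comparison helper. The `u32` typedef and `fstr_equal()` are assumptions made for the example; they are not part of this header.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32; /* userspace stand-in for the kernel's u32 */

struct fscrypt_str {
	unsigned char *name;
	u32 len;
};

#define FSTR_INIT(n, l)		{ .name = n, .len = l }

/*
 * Hypothetical helper: compare two fscrypt_str values byte for byte.
 * The explicit length is used because the name buffer is treated as
 * raw bytes rather than as a NUL-terminated string.
 */
static int fstr_equal(const struct fscrypt_str *a, const struct fscrypt_str *b)
{
	return a->len == b->len && memcmp(a->name, b->name, a->len) == 0;
}
```

A caller would initialize a value with `struct fscrypt_str s = FSTR_INIT(buf, buf_len);` and pass its address to the helper.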
|
|
|
|
|
2017-07-06 12:01:59 +08:00
|
|
|
/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
|
fscrypt: v2 encryption policy support
Add a new fscrypt policy version, "v2". It has the following changes
from the original policy version, which we call "v1" (*):
- Master keys (the user-provided encryption keys) are only ever used as
input to HKDF-SHA512. This is more flexible and less error-prone, and
it avoids the quirks and limitations of the AES-128-ECB based KDF.
Three classes of cryptographically isolated subkeys are defined:
- Per-file keys, like those used in v1 policies except for the new KDF.
- Per-mode keys. These implement the semantics of the DIRECT_KEY
flag, which for v1 policies made the master key be used directly.
These are also planned to be used for inline encryption when
support for it is added.
- Key identifiers (see below).
- Each master key is identified by a 16-byte master_key_identifier,
which is derived from the key itself using HKDF-SHA512. This prevents
users from associating the wrong key with an encrypted file or
directory. This was easily possible with v1 policies, which
identified the key by an arbitrary 8-byte master_key_descriptor.
- The key must be provided in the filesystem-level keyring, not in a
process-subscribed keyring.
The following UAPI additions are made:
- The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a
fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated
from fscrypt_policy/fscrypt_policy_v1 by the version code prefix.
- A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows
getting the v1 or v2 encryption policy of an encrypted file or
directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not
be used because it did not have a way for userspace to indicate which
policy structure is expected. The new ioctl includes a size field, so
it is extensible to future fscrypt policy versions.
- The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2
encryption policies. Such keys are kept logically separate from keys
for v1 encryption policies, and are identified by 'identifier' rather
than by 'descriptor'. The 'identifier' need not be provided when
adding a key, since the kernel will calculate it anyway.
This patch temporarily keeps adding/removing v2 policy keys behind the
same permission check done for adding/removing v1 policy keys:
capable(CAP_SYS_ADMIN). However, the next patch will carefully take
advantage of the cryptographically secure master_key_identifier to allow
non-root users to add/remove v2 policy keys, thus providing a full
replacement for v1 policies.
(*) Actually, in the API fscrypt_policy::version is 0 while on-disk
fscrypt_context::format is 1. But I believe it makes the most sense
to advance both to '2' to have them be in sync, and to consider the
numbering to start at 1 except for the API quirk.
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:47 +08:00
|
|
|
#define FSCRYPT_SET_CONTEXT_MAX_SIZE 40
|
2017-07-06 12:01:59 +08:00
|
|
|
|
2018-12-12 17:50:12 +08:00
|
|
|
#ifdef CONFIG_FS_ENCRYPTION
|
2021-07-29 12:37:28 +08:00
|
|
|
|
2018-12-12 17:50:12 +08:00
|
|
|
/*
|
2021-07-29 12:37:28 +08:00
|
|
|
* If set, the fscrypt bounce page pool won't be allocated (unless another
|
|
|
|
* filesystem needs it). Set this if the filesystem always uses its own bounce
|
|
|
|
* pages for writes and therefore won't need the fscrypt bounce page pool.
|
2018-12-12 17:50:12 +08:00
|
|
|
*/
|
|
|
|
#define FS_CFLG_OWN_PAGES (1U << 1)
|
|
|
|
|
2021-07-29 12:37:28 +08:00
|
|
|
/* Crypto operations for filesystems */
|
2018-12-12 17:50:12 +08:00
|
|
|
struct fscrypt_operations {
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/* Set of optional flags; see above for allowed flags */
|
2018-12-12 17:50:12 +08:00
|
|
|
unsigned int flags;
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If set, this is a filesystem-specific key description prefix that
|
|
|
|
* will be accepted for "logon" keys for v1 fscrypt policies, in
|
|
|
|
* addition to the generic prefix "fscrypt:". This functionality is
|
|
|
|
* deprecated, so new filesystems shouldn't set this field.
|
|
|
|
*/
|
2018-12-12 17:50:12 +08:00
|
|
|
const char *key_prefix;
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get the fscrypt context of the given inode.
|
|
|
|
*
|
|
|
|
* @inode: the inode whose context to get
|
|
|
|
* @ctx: the buffer into which to get the context
|
|
|
|
* @len: length of the @ctx buffer in bytes
|
|
|
|
*
|
|
|
|
* Return: On success, returns the length of the context in bytes; this
|
|
|
|
* may be less than @len. On failure, returns -ENODATA if the
|
|
|
|
* inode doesn't have a context, -ERANGE if the context is
|
|
|
|
* longer than @len, or another -errno code.
|
|
|
|
*/
|
2020-05-12 03:13:57 +08:00
|
|
|
int (*get_context)(struct inode *inode, void *ctx, size_t len);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set an fscrypt context on the given inode.
|
|
|
|
*
|
|
|
|
* @inode: the inode whose context to set. The inode won't already have
|
|
|
|
* an fscrypt context.
|
|
|
|
* @ctx: the context to set
|
|
|
|
* @len: length of @ctx in bytes (at most FSCRYPT_SET_CONTEXT_MAX_SIZE)
|
|
|
|
* @fs_data: If called from fscrypt_set_context(), this will be the
|
|
|
|
* value the filesystem passed to fscrypt_set_context().
|
|
|
|
* Otherwise (i.e. when called from
|
|
|
|
* FS_IOC_SET_ENCRYPTION_POLICY) this will be NULL.
|
|
|
|
*
|
|
|
|
* i_rwsem will be held for write.
|
|
|
|
*
|
|
|
|
* Return: 0 on success, -errno on failure.
|
|
|
|
*/
|
2020-05-12 03:13:57 +08:00
|
|
|
int (*set_context)(struct inode *inode, const void *ctx, size_t len,
|
|
|
|
void *fs_data);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get the dummy fscrypt policy in use on the filesystem (if any).
|
|
|
|
*
|
|
|
|
* Filesystems only need to implement this function if they support the
|
|
|
|
* test_dummy_encryption mount option.
|
|
|
|
*
|
|
|
|
* Return: A pointer to the dummy fscrypt policy, if the filesystem is
|
|
|
|
* mounted with test_dummy_encryption; otherwise NULL.
|
|
|
|
*/
|
2020-09-17 12:11:35 +08:00
|
|
|
const union fscrypt_policy *(*get_dummy_policy)(struct super_block *sb);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check whether a directory is empty. i_rwsem will be held for write.
|
|
|
|
*/
|
2020-05-12 03:13:57 +08:00
|
|
|
bool (*empty_dir)(struct inode *inode);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/* The filesystem's maximum ciphertext filename length, in bytes */
|
2018-12-12 17:50:12 +08:00
|
|
|
unsigned int max_namelen;
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check whether the filesystem's inode numbers and UUID are stable,
|
|
|
|
* meaning that they will never be changed even by offline operations
|
|
|
|
* such as filesystem shrinking and therefore can be used in the
|
|
|
|
* encryption without the possibility of files becoming unreadable.
|
|
|
|
*
|
|
|
|
* Filesystems only need to implement this function if they want to
|
|
|
|
* support the FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{32,64} flags. These
|
|
|
|
* flags are designed to work around the limitations of UFS and eMMC
|
|
|
|
* inline crypto hardware, and they shouldn't be used in scenarios where
|
|
|
|
* such hardware isn't being used.
|
|
|
|
*
|
|
|
|
* Leaving this NULL is equivalent to always returning false.
|
|
|
|
*/
|
fscrypt: add support for IV_INO_LBLK_64 policies
Inline encryption hardware compliant with the UFS v2.1 standard or with
the upcoming version of the eMMC standard has the following properties:
(1) Per I/O request, the encryption key is specified by a previously
loaded keyslot. There might be only a small number of keyslots.
(2) Per I/O request, the starting IV is specified by a 64-bit "data unit
number" (DUN). IV bits 64-127 are assumed to be 0. The hardware
automatically increments the DUN for each "data unit" of
configurable size in the request, e.g. for each filesystem block.
Property (1) makes it inefficient to use the traditional fscrypt
per-file keys. Property (2) precludes the use of the existing
DIRECT_KEY fscrypt policy flag, which needs at least 192 IV bits.
Therefore, add a new fscrypt policy flag IV_INO_LBLK_64 which causes the
encryption to be modified as follows:
- The encryption keys are derived from the master key, encryption mode
number, and filesystem UUID.
- The IVs are chosen as (inode_number << 32) | file_logical_block_num.
For filenames encryption, file_logical_block_num is 0.
Since the file nonces aren't used in the key derivation, many files may
share the same encryption key. This is much more efficient on the
target hardware. Including the inode number in the IVs and mixing the
filesystem UUID into the keys ensures that data in different files is
nevertheless still encrypted differently.
Additionally, limiting the inode and block numbers to 32 bits and
placing the block number in the low bits maintains compatibility with
the 64-bit DUN convention (property (2) above).
Since this scheme assumes that inode numbers are stable (which may
preclude filesystem shrinking) and that inode and file logical block
numbers are at most 32-bit, IV_INO_LBLK_64 will only be allowed on
filesystems that meet these constraints. These are acceptable
limitations for the cases where this format would actually be used.
Note that IV_INO_LBLK_64 is an on-disk format, not an implementation.
This patch just adds support for it using the existing filesystem layer
encryption. A later patch will add support for inline encryption.
Reviewed-by: Paul Crowley <paulcrowley@google.com>
Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-10-25 05:54:36 +08:00
|
|
|
bool (*has_stable_inodes)(struct super_block *sb);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get the number of bits that the filesystem uses to represent inode
|
|
|
|
* numbers and file logical block numbers.
|
|
|
|
*
|
|
|
|
* By default, both of these are assumed to be 64-bit. This function
|
|
|
|
* can be implemented to declare that either or both of these numbers is
|
|
|
|
* shorter, which may allow the use of the
|
|
|
|
* FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{32,64} flags and/or the use of
|
|
|
|
* inline crypto hardware whose maximum DUN length is less than 64 bits
|
|
|
|
* (e.g., eMMC v5.2 spec compliant hardware). This function only needs
|
|
|
|
* to be implemented if support for one of these features is needed.
|
|
|
|
*/
|
2019-10-25 05:54:36 +08:00
|
|
|
void (*get_ino_and_lblk_bits)(struct super_block *sb,
|
|
|
|
int *ino_bits_ret, int *lblk_bits_ret);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the number of block devices to which the filesystem may write
|
|
|
|
* encrypted file contents.
|
|
|
|
*
|
|
|
|
* If the filesystem can use multiple block devices (other than block
|
|
|
|
* devices that aren't used for encrypted file contents, such as
|
|
|
|
* external journal devices), and wants to support inline encryption,
|
|
|
|
* then it must implement this function. Otherwise it's not needed.
|
|
|
|
*/
|
2020-07-02 09:56:05 +08:00
|
|
|
int (*get_num_devices)(struct super_block *sb);
|
2021-07-29 12:37:28 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If ->get_num_devices() returns a value greater than 1, then this
|
|
|
|
* function is called to get the array of request_queues that the
|
|
|
|
* filesystem is using -- one per block device. (There may be duplicate
|
|
|
|
* entries in this array, as block devices can share a request_queue.)
|
|
|
|
*/
|
2020-07-02 09:56:05 +08:00
|
|
|
void (*get_devices)(struct super_block *sb,
|
|
|
|
struct request_queue **devs);
|
2018-12-12 17:50:12 +08:00
|
|
|
};
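The IV layout given in the IV_INO_LBLK_64 commit message, (inode_number << 32) | file_logical_block_num, can be sketched as a small standalone function. The name `iv_ino_lblk_64()` and its return convention are assumptions for this example, not kernel API; the 32-bit range check mirrors the stated constraint that inode and file logical block numbers are at most 32-bit.

```c
#include <stdint.h>

/*
 * Illustrative only: packs the inode number into the high 32 bits and
 * the file logical block number into the low 32 bits. Returns 0 on
 * success, or -1 if either input does not fit in 32 bits.
 */
static int iv_ino_lblk_64(uint64_t ino, uint64_t lblk, uint64_t *iv_out)
{
	if (ino > UINT32_MAX || lblk > UINT32_MAX)
		return -1;
	*iv_out = (ino << 32) | lblk;
	return 0;
}
```

For filenames encryption the commit message states that the block number is 0, so in this model the IV reduces to the inode number shifted into the high half.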
|
|
|
|
|
2020-07-22 06:59:19 +08:00
|
|
|
static inline struct fscrypt_info *fscrypt_get_info(const struct inode *inode)
|
2018-12-12 17:50:12 +08:00
|
|
|
{
|
2020-07-22 06:59:19 +08:00
|
|
|
/*
|
2020-12-03 10:20:40 +08:00
|
|
|
* Pairs with the cmpxchg_release() in fscrypt_setup_encryption_info().
|
2020-07-22 06:59:19 +08:00
|
|
|
* I.e., another task may publish ->i_crypt_info concurrently, executing
|
|
|
|
* a RELEASE barrier. We need to use smp_load_acquire() here to safely
|
|
|
|
* ACQUIRE the memory the other task published.
|
|
|
|
*/
|
|
|
|
return smp_load_acquire(&inode->i_crypt_info);
|
2018-12-12 17:50:12 +08:00
|
|
|
}
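The publish/acquire pairing described in the comment inside fscrypt_get_info() has a close userspace analogue in C11 atomics. The sketch below is an approximation under stated assumptions: `mock_info`, `mock_inode`, `mock_publish_info()`, and `mock_get_info()` are hypothetical, and C11 `memory_order_release`/`memory_order_acquire` are used in place of the kernel's cmpxchg_release() and smp_load_acquire().

```c
#include <stdatomic.h>
#include <stddef.h>

struct mock_info { int key_id; };

struct mock_inode {
	_Atomic(struct mock_info *) crypt_info;
};

/*
 * Publish @info only if nothing has been published yet. The RELEASE
 * ordering makes the initialized contents of *info visible to any
 * reader that later observes the pointer with ACQUIRE ordering.
 * Returns 1 if this call installed the pointer, 0 if another one won.
 */
static int mock_publish_info(struct mock_inode *inode, struct mock_info *info)
{
	struct mock_info *expected = NULL;

	return atomic_compare_exchange_strong_explicit(&inode->crypt_info,
						       &expected, info,
						       memory_order_release,
						       memory_order_relaxed);
}

/* Reader side: ACQUIRE load, pairing with the RELEASE in the publisher. */
static struct mock_info *mock_get_info(struct mock_inode *inode)
{
	return atomic_load_explicit(&inode->crypt_info, memory_order_acquire);
}
```

A caller that loses the race in this model would be expected to free its own unused `mock_info` and use the one already published.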
|
|
|
|
|
2019-12-10 04:50:21 +08:00
|
|
|
/**
|
|
|
|
* fscrypt_needs_contents_encryption() - check whether an inode needs
|
|
|
|
* contents encryption
|
2020-05-12 03:13:56 +08:00
|
|
|
* @inode: the inode to check
|
2019-12-10 04:50:21 +08:00
|
|
|
*
|
|
|
|
* Return: %true iff the inode is an encrypted regular file and the kernel was
|
|
|
|
* built with fscrypt support.
|
|
|
|
*
|
|
|
|
* If you need to know whether the encrypt bit is set even when the kernel was
|
|
|
|
* built without fscrypt support, you must use IS_ENCRYPTED() directly instead.
|
|
|
|
*/
|
|
|
|
static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
|
|
|
|
{
|
|
|
|
return IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode);
|
|
|
|
}
|
|
|
|
|
2019-03-21 02:39:11 +08:00
|
|
|
/*
|
2020-09-24 12:26:24 +08:00
|
|
|
* When d_splice_alias() moves a directory's no-key alias to its plaintext alias
|
|
|
|
* as a result of the encryption key being added, DCACHE_NOKEY_NAME must be
|
|
|
|
* cleared. Note that we don't have to support arbitrary moves of this flag
|
|
|
|
* because fscrypt doesn't allow no-key names to be the source or target of a
|
|
|
|
* rename().
|
2019-03-21 02:39:11 +08:00
|
|
|
*/
|
|
|
|
static inline void fscrypt_handle_d_move(struct dentry *dentry)
|
|
|
|
{
|
2020-09-24 12:26:24 +08:00
|
|
|
dentry->d_flags &= ~DCACHE_NOKEY_NAME;
|
2019-03-21 02:39:11 +08:00
|
|
|
}
|
|
|
|
|
fscrypt: add fscrypt_is_nokey_name()
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can look up the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-11-18 15:56:05 +08:00
|
|
|
/**
|
|
|
|
* fscrypt_is_nokey_name() - test whether a dentry is a no-key name
|
|
|
|
* @dentry: the dentry to check
|
|
|
|
*
|
|
|
|
* This returns true if the dentry is a no-key dentry. A no-key dentry is a
|
|
|
|
* dentry that was created in an encrypted directory that hasn't had its
|
|
|
|
* encryption key added yet. Such dentries may be either positive or negative.
|
|
|
|
*
|
|
|
|
* When a filesystem is asked to create a new filename in an encrypted directory
|
|
|
|
* and the new filename's dentry is a no-key dentry, it must fail the operation
|
|
|
|
* with ENOKEY. This includes ->create(), ->mkdir(), ->mknod(), ->symlink(),
|
|
|
|
* ->rename(), and ->link(). (However, ->rename() and ->link() are already
|
|
|
|
* handled by fscrypt_prepare_rename() and fscrypt_prepare_link().)
|
|
|
|
*
|
|
|
|
* This is necessary because creating a filename requires the directory's
|
|
|
|
* encryption key, but just checking for the key on the directory inode during
|
|
|
|
* the final filesystem operation doesn't guarantee that the key was available
|
|
|
|
* during the preceding dentry lookup. And the key must have already been
|
|
|
|
* available during the dentry lookup in order for it to have been checked
|
|
|
|
* whether the filename already exists in the directory and for the new file's
|
|
|
|
* dentry not to be invalidated due to it incorrectly having the no-key flag.
|
|
|
|
*
|
|
|
|
* Return: %true if the dentry is a no-key name
|
|
|
|
*/
|
|
|
|
static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
|
|
|
|
{
|
|
|
|
return dentry->d_flags & DCACHE_NOKEY_NAME;
|
|
|
|
}
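The rule stated in the kernel-doc comment above (fail creation on a no-key dentry with ENOKEY) might look as follows in a filesystem's ->create() path. This is a userspace mock, not kernel code: `mock_dentry`, `MOCK_DCACHE_NOKEY_NAME` (an arbitrary illustrative bit value), `mock_is_nokey_name()`, and `mock_create()` are all hypothetical names introduced for the example.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef ENOKEY
#define ENOKEY 126 /* Linux value; fallback for platforms that lack it */
#endif

/* Illustrative flag bit; not taken from the kernel's dcache.h. */
#define MOCK_DCACHE_NOKEY_NAME 0x02000000u

struct mock_dentry { unsigned int d_flags; };

static bool mock_is_nokey_name(const struct mock_dentry *dentry)
{
	return dentry->d_flags & MOCK_DCACHE_NOKEY_NAME;
}

/* Hypothetical ->create() prologue: refuse to create under a no-key name. */
static int mock_create(const struct mock_dentry *dentry)
{
	if (mock_is_nokey_name(dentry))
		return -ENOKEY;
	/* A real filesystem would go on to create the file here. */
	return 0;
}
```

The same check would apply to the ->mkdir(), ->mknod(), and ->symlink() paths; per the comment above, ->rename() and ->link() are already covered by fscrypt_prepare_rename() and fscrypt_prepare_link().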
|
|
|
|
|
2018-12-12 17:50:12 +08:00
|
|
|
/* crypto.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
void fscrypt_enqueue_decrypt_work(struct work_struct *);
|
|
|
|
|
|
|
|
struct page *fscrypt_encrypt_pagecache_blocks(struct page *page,
|
|
|
|
unsigned int len,
|
|
|
|
unsigned int offs,
|
|
|
|
gfp_t gfp_flags);
|
|
|
|
int fscrypt_encrypt_block_inplace(const struct inode *inode, struct page *page,
|
|
|
|
unsigned int len, unsigned int offs,
|
|
|
|
u64 lblk_num, gfp_t gfp_flags);
|
|
|
|
|
|
|
|
int fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len,
|
|
|
|
unsigned int offs);
|
|
|
|
int fscrypt_decrypt_block_inplace(const struct inode *inode, struct page *page,
|
|
|
|
unsigned int len, unsigned int offs,
|
|
|
|
u64 lblk_num);
|
2018-12-12 17:50:12 +08:00
|
|
|
|
2019-05-21 00:29:39 +08:00
|
|
|
static inline bool fscrypt_is_bounce_page(struct page *page)
|
2018-12-12 17:50:12 +08:00
|
|
|
{
|
2019-05-21 00:29:39 +08:00
|
|
|
return page->mapping == NULL;
|
2018-12-12 17:50:12 +08:00
|
|
|
}
|
|
|
|
|
2019-05-21 00:29:39 +08:00
|
|
|
static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
|
|
|
|
{
|
|
|
|
return (struct page *)page_private(bounce_page);
|
|
|
|
}
|
|
|
|
|
2020-05-12 03:13:58 +08:00
|
|
|
void fscrypt_free_bounce_page(struct page *bounce_page);
|
2018-12-12 17:50:12 +08:00
|
|
|
|
|
|
|
/* policy.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg);
|
|
|
|
int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_has_permitted_context(struct inode *parent, struct inode *child);
|
fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But
actually it isn't, since it uses functions like crypto_alloc_skcipher()
which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save().
Therefore it can deadlock when called from a context that needs
GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and
f2fs_unlock_op(). This happens when creating a new encrypted file.
We can't fix this by just not setting up the key for new inodes right
away, since new symlinks need their key to encrypt the symlink target.
So we need to set up the new inode's key before starting the
transaction. But just calling fscrypt_get_encryption_info() earlier
doesn't work, since it assumes the encryption context is already set,
and the encryption context can't be set until the transaction.
The recently proposed fscrypt support for the ceph filesystem
(https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u)
will have this same ordering problem too, since ceph will need to
encrypt new symlinks before setting their encryption context.
Finally, f2fs can deadlock when the filesystem is mounted with
'-o test_dummy_encryption' and a new file is created in an existing
unencrypted directory. Similarly, this is caused by holding too many
locks when calling fscrypt_get_encryption_info().
To solve all these problems, add new helper functions:
- fscrypt_prepare_new_inode() sets up a new inode's encryption key
(fscrypt_info), using the parent directory's encryption policy and a
new random nonce. It neither reads nor writes the encryption context.
- fscrypt_set_context() persists the encryption context of a new inode,
using the information from the fscrypt_info already in memory. This
replaces fscrypt_inherit_context().
Temporarily keep fscrypt_inherit_context() around until all filesystems
have been converted to use fscrypt_set_context().
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-17 12:11:24 +08:00
|
|
|
int fscrypt_set_context(struct inode *inode, void *fs_data);
|
2020-05-12 03:13:57 +08:00
|
|
|
|
fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
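The three outcomes listed for fscrypt_policy_to_inherit() (a real policy, a dummy policy, or no policy) can be illustrated with a simplified userspace model. The types and the function name below are hypothetical, and the sketch leaves out the error paths that a kernel implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's union fscrypt_policy or inode. */
struct model_policy {
	int version;
};

struct model_dir {
	bool encrypted;
	const struct model_policy *policy; /* only meaningful if encrypted */
};

/*
 * Return the policy that a new file created in 'dir' should inherit:
 *   - the directory's own policy if the directory is encrypted,
 *   - otherwise the dummy policy, if the mount has one,
 *   - otherwise NULL, meaning the new file is not encrypted.
 * No key is set up for the unencrypted directory in any case.
 */
static const struct model_policy *
model_policy_to_inherit(const struct model_dir *dir,
			const struct model_policy *dummy_policy)
{
	assert(dir != NULL);

	if (dir->encrypted)
		return dir->policy;
	return dummy_policy; /* NULL without test_dummy_encryption */
}
```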
2020-09-17 12:11:35 +08:00
|
|
|
struct fscrypt_dummy_policy {
|
|
|
|
const union fscrypt_policy *policy;
|
fscrypt: support test_dummy_encryption=v2
v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.
Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.
To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.
To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.
To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)
Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.
Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
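The extended option syntax described above can be sketched as a small parser. This is a userspace illustration, not the kernel's fscrypt_set_test_dummy_encryption(): the enum, the default macro, and parse_dummy_encryption_arg() are hypothetical names, and the default is assumed to be v1 as stated for this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the policy versions; not kernel constants. */
enum dummy_policy_version {
	DUMMY_POLICY_V1 = 1,
	DUMMY_POLICY_V2 = 2,
};

/* Assumed default for the bare option, as described for this commit. */
#define DUMMY_POLICY_DEFAULT DUMMY_POLICY_V1

/*
 * Parse the value of "test_dummy_encryption[=v1|v2]".
 * 'arg' is NULL when the option was given without an argument.
 * Returns 0 and stores the version, or -EINVAL for an unknown value.
 */
static int parse_dummy_encryption_arg(const char *arg,
				      enum dummy_policy_version *version)
{
	assert(version != NULL);

	if (arg == NULL)
		*version = DUMMY_POLICY_DEFAULT;
	else if (strcmp(arg, "v1") == 0)
		*version = DUMMY_POLICY_V1;
	else if (strcmp(arg, "v2") == 0)
		*version = DUMMY_POLICY_V2;
	else
		return -EINVAL;
	return 0;
}
```

Keeping the parsed result as a value rather than a mount flag is what leaves room for richer settings later, such as the "$contents_mode:$filenames_mode:v2" form mentioned in the message.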
2020-05-13 07:32:50 +08:00
|
|
|
};
|
|
|
|
|
2020-09-17 12:11:36 +08:00
|
|
|
int fscrypt_set_test_dummy_encryption(struct super_block *sb, const char *arg,
|
2020-09-17 12:11:35 +08:00
|
|
|
struct fscrypt_dummy_policy *dummy_policy);
|
2020-05-13 07:32:50 +08:00
|
|
|
void fscrypt_show_test_dummy_encryption(struct seq_file *seq, char sep,
					struct super_block *sb);
|
|
|
|
static inline void
|
2020-09-17 12:11:35 +08:00
|
|
|
fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy)
|
2020-05-13 07:32:50 +08:00
|
|
|
{
|
2020-09-17 12:11:35 +08:00
|
|
|
	kfree(dummy_policy->policy);
	dummy_policy->policy = NULL;
|
2020-05-13 07:32:50 +08:00
|
|
|
}
|
|
|
|
|
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".
Why we need this
~~~~~~~~~~~~~~~~
The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.
Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions. But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic. This is because it depends on
whether the files are currently present in the inode cache.
Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work. Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons. First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.
Second, almost all users of fscrypt actually do need the keys to be
global. The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process. This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.
On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2]. This is ugly and raises security concerns. Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).
By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.
Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:
- Supporting key removal with the semantics such that the secret is
removed immediately and any unused inodes using the key are evicted;
also, the eviction of any in-use inodes can be retried.
- Calculating a key-dependent cryptographic identifier and returning it
to userspace.
- Allowing keys to be added and removed by non-root users, but only keys
for v2 encryption policies; and to prevent denial-of-service attacks,
users can only remove keys they themselves have added, and a key is
only really removed after all users who added it have removed it.
Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.
However, to reuse code the implementation still uses the keyrings
service internally. Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.
References:
[1] https://github.com/google/fscrypt
[2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
[3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
[4] https://github.com/google/fscrypt/issues/116
[5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
[6] https://github.com/google/fscrypt/issues/128
[7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
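From userspace, adding a key to the filesystem might look roughly like the sketch below. It assumes the UAPI as found in the <linux/fscrypt.h> header of later kernels (struct fscrypt_add_key_arg, FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER), including the kernel-computed key identifier that this message says arrives in later patches. The helper name add_fscrypt_key() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fscrypt.h>

/*
 * Add a raw master key to the filesystem that 'fd' refers to.
 * On success, returns 0 and copies the key identifier computed by the
 * kernel into 'identifier'. On failure, returns -1 with errno set.
 */
static int add_fscrypt_key(int fd, const uint8_t *raw, uint32_t raw_size,
			   uint8_t identifier[FSCRYPT_KEY_IDENTIFIER_SIZE])
{
	struct fscrypt_add_key_arg *arg;
	int ret, saved_errno;

	assert(raw != NULL);

	/* The raw key bytes follow the fixed-size part of the argument. */
	arg = calloc(1, sizeof(*arg) + raw_size);
	if (arg == NULL) {
		errno = ENOMEM;
		return -1;
	}

	arg->key_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER;
	arg->raw_size = raw_size;
	memcpy(arg->raw, raw, raw_size);

	ret = ioctl(fd, FS_IOC_ADD_ENCRYPTION_KEY, arg);
	saved_errno = errno;
	if (ret == 0)
		memcpy(identifier, arg->key_spec.u.identifier,
		       FSCRYPT_KEY_IDENTIFIER_SIZE);

	/* Best-effort wipe; real code should use a wipe that cannot be elided. */
	memset(arg, 0, sizeof(*arg) + raw_size);
	free(arg);
	errno = saved_errno;
	return ret;
}
```

A caller would typically open the mount point or any directory on the target filesystem and pass that descriptor as 'fd'; because the key is attached to the filesystem rather than to a process keyring, files using it then appear unlocked to every process.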
2019-08-05 10:35:46 +08:00
|
|
|
/* keyring.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
void fscrypt_sb_free(struct super_block *sb);
|
|
|
|
int fscrypt_ioctl_add_key(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_ioctl_remove_key_all_users(struct file *filp, void __user *arg);
|
|
|
|
int fscrypt_ioctl_get_key_status(struct file *filp, void __user *arg);
|
2019-08-05 10:35:46 +08:00
|
|
|
|
2019-08-05 10:35:45 +08:00
|
|
|
/* keysetup.c */
|
2020-09-17 12:11:24 +08:00
|
|
|
int fscrypt_prepare_new_inode(struct inode *dir, struct inode *inode,
			      bool *encrypt_ret);
|
2020-05-12 03:13:58 +08:00
|
|
|
void fscrypt_put_encryption_info(struct inode *inode);
|
|
|
|
void fscrypt_free_inode(struct inode *inode);
|
|
|
|
int fscrypt_drop_inode(struct inode *inode);
|
2018-12-12 17:50:12 +08:00
|
|
|
|
|
|
|
/* fname.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
			   int lookup, struct fscrypt_name *fname);
|
2018-12-12 17:50:12 +08:00
|
|
|
|
|
|
|
static inline void fscrypt_free_filename(struct fscrypt_name *fname)
{
	kfree(fname->crypto_buf.name);
}
|
|
|
|
|
2020-08-10 22:21:39 +08:00
|
|
|
int fscrypt_fname_alloc_buffer(u32 max_encrypted_len,
|
2020-05-12 03:13:58 +08:00
|
|
|
struct fscrypt_str *crypto_str);
|
|
|
|
void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str);
|
|
|
|
int fscrypt_fname_disk_to_usr(const struct inode *inode,
			      u32 hash, u32 minor_hash,
			      const struct fscrypt_str *iname,
			      struct fscrypt_str *oname);
|
|
|
|
bool fscrypt_match_name(const struct fscrypt_name *fname,
			const u8 *de_name, u32 de_name_len);
|
|
|
|
u64 fscrypt_fname_siphash(const struct inode *dir, const struct qstr *name);
|
2020-09-24 13:47:21 +08:00
|
|
|
int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
|
fscrypt: derive dirhash key for casefolded directories
When we allow indexed directories to use both encryption and
casefolding, for the dirhash we can't just hash the ciphertext filenames
that are stored on-disk (as is done currently) because the dirhash must
be case insensitive, but the stored names are case-preserving. Nor can
we hash the plaintext names with an unkeyed hash (or a hash keyed with a
value stored on-disk like ext4's s_hash_seed), since that would leak
information about the names that encryption is meant to protect.
Instead, if we can accept a dirhash that's only computable when the
fscrypt key is available, we can hash the plaintext names with a keyed
hash using a secret key derived from the directory's fscrypt master key.
We'll use SipHash-2-4 for this purpose.
Prepare for this by deriving a SipHash key for each casefolded encrypted
directory. Make sure to handle deriving the key not only when setting
up the directory's fscrypt_info, but also in the case where the casefold
flag is enabled after the fscrypt_info was already set up. (We could
just always derive the key regardless of casefolding, but that would
introduce unnecessary overhead for people not using casefolding.)
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved commit message, updated fscrypt.rst, squashed with change
that avoids unnecessarily deriving the key, and many other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-21 06:31:57 +08:00
|
|
|
|
2018-12-12 17:50:12 +08:00
|
|
|
/* bio.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
void fscrypt_decrypt_bio(struct bio *bio);
|
|
|
|
int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
			  sector_t pblk, unsigned int len);
|
2018-12-12 17:50:12 +08:00
|
|
|
|
|
|
|
/* hooks.c */
|
2020-05-12 03:13:58 +08:00
|
|
|
int fscrypt_file_open(struct inode *inode, struct file *filp);
|
|
|
|
int __fscrypt_prepare_link(struct inode *inode, struct inode *dir,
			   struct dentry *dentry);
|
|
|
|
int __fscrypt_prepare_rename(struct inode *old_dir, struct dentry *old_dentry,
			     struct inode *new_dir, struct dentry *new_dentry,
			     unsigned int flags);
|
|
|
|
int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry,
			     struct fscrypt_name *fname);
|
2020-12-03 10:20:37 +08:00
|
|
|
int __fscrypt_prepare_readdir(struct inode *dir);
|
2020-12-03 10:20:38 +08:00
|
|
|
int __fscrypt_prepare_setattr(struct dentry *dentry, struct iattr *attr);
|
2020-05-12 03:13:58 +08:00
|
|
|
int fscrypt_prepare_setflags(struct inode *inode,
			     unsigned int oldflags, unsigned int flags);
|
2020-09-17 12:11:34 +08:00
|
|
|
int fscrypt_prepare_symlink(struct inode *dir, const char *target,
			    unsigned int len, unsigned int max_len,
			    struct fscrypt_str *disk_link);
|
2020-05-12 03:13:58 +08:00
|
|
|
int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
			      unsigned int len, struct fscrypt_str *disk_link);
|
|
|
|
const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
|
|
|
|
unsigned int max_size,
|
|
|
|
struct delayed_call *done);
|
fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
  https://github.com/rpm-software-management/rpm/issues/1682
  https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
  https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
  (on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
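As a rough userspace illustration of the property this commit is meant to restore, the sketch below compares the st_size that lstat() reports for a symlink with the length of the target that readlink() returns. The helper name symlink_size_is_consistent() is made up for this example and is not part of fscrypt. On an encrypted symlink without this fix, the two values would be expected to differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fscrypt): return 1 if lstat() reports an
 * st_size equal to the target length that readlink() returns, 0 if the two
 * differ, and -1 if @path cannot be examined or is not a symlink.
 */
static int symlink_size_is_consistent(const char *path)
{
	struct stat st;
	char target[PATH_MAX];
	ssize_t n;

	assert(path != NULL);
	if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
		return -1;
	n = readlink(path, target, sizeof(target));
	if (n < 0)
		return -1;
	return (off_t)n == st.st_size ? 1 : 0;
}
```

A tool that sizes its readlink() buffer from st_size, as the affected programs listed above appear to do, relies on this check returning 1.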
2021-07-02 14:53:46 +08:00
int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat);
2019-03-26 15:52:31 +08:00
static inline void fscrypt_set_ops(struct super_block *sb,
				   const struct fscrypt_operations *s_cop)
{
	sb->s_cop = s_cop;
}
2018-12-12 17:50:12 +08:00
#else /* !CONFIG_FS_ENCRYPTION */

2020-07-22 06:59:19 +08:00
static inline struct fscrypt_info *fscrypt_get_info(const struct inode *inode)
2018-12-12 17:50:12 +08:00
{
2020-07-22 06:59:19 +08:00
	return NULL;
2018-12-12 17:50:12 +08:00
}

2019-12-10 04:50:21 +08:00
static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
{
	return false;
}

2019-03-21 02:39:11 +08:00
static inline void fscrypt_handle_d_move(struct dentry *dentry)
{
}

fscrypt: add fscrypt_is_nokey_name()
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can look up the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
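The check that filesystems are expected to add can be modelled in userspace roughly as follows. This is only a sketch: struct mock_dentry, MOCK_DCACHE_NOKEY_NAME and mock_fs_create() are invented stand-ins, and the assumption that the helper tests a per-dentry flag set at lookup time is mine, not something this message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a kernel dentry flag; the value is an assumption. */
#define MOCK_DCACHE_NOKEY_NAME 0x1u

struct mock_dentry {
	unsigned int d_flags;	/* set at lookup time, before the key was added */
};

/* Assumed shape of the helper: test a flag recorded when the name was looked up. */
static bool mock_is_nokey_name(const struct mock_dentry *dentry)
{
	return (dentry->d_flags & MOCK_DCACHE_NOKEY_NAME) != 0;
}

/*
 * Model of a filesystem's ->create(): refuse to create a file through a
 * dentry that was looked up as a no-key name, even if the key has been added
 * since the lookup.
 */
static int mock_fs_create(const struct mock_dentry *dentry)
{
	assert(dentry != NULL);
	if (mock_is_nokey_name(dentry))
		return -ENOKEY;
	return 0;	/* a real filesystem would create the file here */
}
```

The point of keying the decision on the dentry rather than on the current key state is that the dentry remembers how the lookup was done, which closes the race described above.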
2020-11-18 15:56:05 +08:00
static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
{
	return false;
}

2018-12-12 17:50:12 +08:00
/* crypto.c */
static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
{
}

2019-05-21 00:29:44 +08:00
static inline struct page *fscrypt_encrypt_pagecache_blocks(struct page *page,
							    unsigned int len,
							    unsigned int offs,
							    gfp_t gfp_flags)
2018-12-12 17:50:12 +08:00
{
	return ERR_PTR(-EOPNOTSUPP);
}

2019-05-21 00:29:43 +08:00
static inline int fscrypt_encrypt_block_inplace(const struct inode *inode,
						struct page *page,
						unsigned int len,
						unsigned int offs, u64 lblk_num,
						gfp_t gfp_flags)
{
	return -EOPNOTSUPP;
}

2019-05-21 00:29:47 +08:00
static inline int fscrypt_decrypt_pagecache_blocks(struct page *page,
						   unsigned int len,
						   unsigned int offs)
2018-12-12 17:50:12 +08:00
{
	return -EOPNOTSUPP;
}

2019-05-21 00:29:46 +08:00
static inline int fscrypt_decrypt_block_inplace(const struct inode *inode,
						struct page *page,
						unsigned int len,
						unsigned int offs, u64 lblk_num)
{
	return -EOPNOTSUPP;
}

2019-05-21 00:29:39 +08:00
static inline bool fscrypt_is_bounce_page(struct page *page)
{
	return false;
}

static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
2018-12-12 17:50:12 +08:00
{
	WARN_ON_ONCE(1);
	return ERR_PTR(-EINVAL);
}

2019-05-21 00:29:39 +08:00
static inline void fscrypt_free_bounce_page(struct page *bounce_page)
2018-12-12 17:50:12 +08:00
{
}

/* policy.c */
static inline int fscrypt_ioctl_set_policy(struct file *filp,
					   const void __user *arg)
{
	return -EOPNOTSUPP;
}

static inline int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg)
{
	return -EOPNOTSUPP;
}

fscrypt: v2 encryption policy support

Add a new fscrypt policy version, "v2". It has the following changes
from the original policy version, which we call "v1" (*):

- Master keys (the user-provided encryption keys) are only ever used as
  input to HKDF-SHA512. This is more flexible and less error-prone, and
  it avoids the quirks and limitations of the AES-128-ECB based KDF.
  Three classes of cryptographically isolated subkeys are defined:

    - Per-file keys, like used in v1 policies except for the new KDF.

    - Per-mode keys. These implement the semantics of the DIRECT_KEY
      flag, which for v1 policies made the master key be used directly.
      These are also planned to be used for inline encryption when
      support for it is added.

    - Key identifiers (see below).

- Each master key is identified by a 16-byte master_key_identifier,
  which is derived from the key itself using HKDF-SHA512. This prevents
  users from associating the wrong key with an encrypted file or
  directory. This was easily possible with v1 policies, which
  identified the key by an arbitrary 8-byte master_key_descriptor.

- The key must be provided in the filesystem-level keyring, not in a
  process-subscribed keyring.

The following UAPI additions are made:

- The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a
  fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated
  from fscrypt_policy/fscrypt_policy_v1 by the version code prefix.

- A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows
  getting the v1 or v2 encryption policy of an encrypted file or
  directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not
  be used because it did not have a way for userspace to indicate which
  policy structure is expected. The new ioctl includes a size field, so
  it is extensible to future fscrypt policy versions.

- The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY,
  and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2
  encryption policies. Such keys are kept logically separate from keys
  for v1 encryption policies, and are identified by 'identifier' rather
  than by 'descriptor'. The 'identifier' need not be provided when
  adding a key, since the kernel will calculate it anyway.

This patch temporarily keeps adding/removing v2 policy keys behind the
same permission check done for adding/removing v1 policy keys:
capable(CAP_SYS_ADMIN). However, the next patch will carefully take
advantage of the cryptographically secure master_key_identifier to allow
non-root users to add/remove v2 policy keys, thus providing a full
replacement for v1 policies.

(*) Actually, in the API fscrypt_policy::version is 0 while on-disk
    fscrypt_context::format is 1. But I believe it makes the most sense
    to advance both to '2' to have them be in sync, and to consider the
    numbering to start at 1 except for the API quirk.

Reviewed-by: Paul Crowley <paulcrowley@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
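To make the "disambiguated by the version code prefix" point concrete, here is a small userspace sketch. The two structs are local mirrors written for this example (the field names follow the message above, but the exact layouts should be treated as assumptions), and policy_size_from_version() is a hypothetical helper rather than a kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Version codes as seen through the API; v1 uses 0 (the quirk noted above). */
#define POLICY_V1 0
#define POLICY_V2 2

/* Local mirror of a v1 policy: key named by an arbitrary 8-byte descriptor. */
struct policy_v1 {
	uint8_t version;
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t master_key_descriptor[8];
};

/* Local mirror of a v2 policy: key named by a 16-byte derived identifier. */
struct policy_v2 {
	uint8_t version;
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t reserved[4];
	uint8_t master_key_identifier[16];
};

/*
 * Given the first byte of a policy supplied by userspace, return how many
 * bytes the full policy structure occupies, or -EINVAL if the version code
 * is not recognized.
 */
static int policy_size_from_version(uint8_t version)
{
	switch (version) {
	case POLICY_V1:
		return (int)sizeof(struct policy_v1);
	case POLICY_V2:
		return (int)sizeof(struct policy_v2);
	default:
		return -EINVAL;
	}
}
```

Because both layouts begin with the same one-byte version field, a single ioctl entry point can read that byte first and only then decide how much more to copy.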
2019-08-05 10:35:47 +08:00
static inline int fscrypt_ioctl_get_policy_ex(struct file *filp,
					      void __user *arg)
{
	return -EOPNOTSUPP;
}

2020-03-15 04:50:49 +08:00
static inline int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg)
{
	return -EOPNOTSUPP;
}

2018-12-12 17:50:12 +08:00
static inline int fscrypt_has_permitted_context(struct inode *parent,
						struct inode *child)
{
	return 0;
}

fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But
actually it isn't, since it uses functions like crypto_alloc_skcipher()
which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save().
Therefore it can deadlock when called from a context that needs
GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and
f2fs_unlock_op(). This happens when creating a new encrypted file.
We can't fix this by just not setting up the key for new inodes right
away, since new symlinks need their key to encrypt the symlink target.
So we need to set up the new inode's key before starting the
transaction. But just calling fscrypt_get_encryption_info() earlier
doesn't work, since it assumes the encryption context is already set,
and the encryption context can't be set until the transaction.
The recently proposed fscrypt support for the ceph filesystem
(https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u)
will have this same ordering problem too, since ceph will need to
encrypt new symlinks before setting their encryption context.
Finally, f2fs can deadlock when the filesystem is mounted with
'-o test_dummy_encryption' and a new file is created in an existing
unencrypted directory. Similarly, this is caused by holding too many
locks when calling fscrypt_get_encryption_info().
To solve all these problems, add new helper functions:

- fscrypt_prepare_new_inode() sets up a new inode's encryption key
  (fscrypt_info), using the parent directory's encryption policy and a
  new random nonce. It neither reads nor writes the encryption context.

- fscrypt_set_context() persists the encryption context of a new inode,
  using the information from the fscrypt_info already in memory. This
  replaces fscrypt_inherit_context().

Temporarily keep fscrypt_inherit_context() around until all filesystems
have been converted to use fscrypt_set_context().
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
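The ordering that the two helpers make possible can be modelled in userspace as below. Everything here is a stand-in invented for illustration (struct mock_inode, the in_transaction flag, and the choice of error codes); it only shows the "set up the key before the transaction, persist the context inside it" split, not the real implementation.

```c
#include <errno.h>
#include <stdbool.h>

struct mock_inode {
	bool key_set_up;	/* stands in for the in-memory fscrypt_info */
	bool context_on_disk;	/* stands in for the persisted encryption context */
};

/* Models being inside a filesystem transaction, where only GFP_NOFS is safe. */
static bool in_transaction;

/* Models fscrypt_prepare_new_inode(): may allocate, so it must run first. */
static int mock_prepare_new_inode(struct mock_inode *inode)
{
	if (in_transaction)
		return -EDEADLK;	/* the situation the commit wants to avoid */
	inode->key_set_up = true;
	return 0;
}

/* Models fscrypt_set_context(): only writes out what is already in memory. */
static int mock_set_context(struct mock_inode *inode)
{
	if (!in_transaction || !inode->key_set_up)
		return -EINVAL;
	inode->context_on_disk = true;
	return 0;
}

/* Model of a filesystem's create path using the two-step split. */
static int mock_create_encrypted(struct mock_inode *inode)
{
	int err = mock_prepare_new_inode(inode);	/* before the transaction */

	if (err)
		return err;
	in_transaction = true;
	err = mock_set_context(inode);			/* inside the transaction */
	in_transaction = false;
	return err;
}
```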
2020-09-17 12:11:24 +08:00
static inline int fscrypt_set_context(struct inode *inode, void *fs_data)
{
	return -EOPNOTSUPP;
}

fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
  encryption policy to inherit from a directory. This can be a real
  policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
  with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
  of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
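The three outcomes listed for fscrypt_policy_to_inherit() (a real policy, a dummy policy, or no policy) can be sketched with a small userspace model. The mock_* types below are invented for this example, and the sketch deliberately leaves out key setup and error handling.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_policy {
	int version;
};

struct mock_sb {
	/* NULL unless the filesystem was mounted with test_dummy_encryption. */
	const struct mock_policy *dummy_policy;
};

struct mock_dir {
	bool encrypted;
	struct mock_policy policy;	/* meaningful only if @encrypted is true */
	const struct mock_sb *sb;
};

/* Return the policy that new files in @dir should inherit, or NULL for none. */
static const struct mock_policy *
mock_policy_to_inherit(const struct mock_dir *dir)
{
	if (dir->encrypted)
		return &dir->policy;	/* real policy */
	return dir->sb->dummy_policy;	/* dummy policy, or NULL */
}
```

Note that nothing in this model attaches key material to an unencrypted directory, which is the property the message above argues for.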
2020-09-17 12:11:35 +08:00
struct fscrypt_dummy_policy {
fscrypt: support test_dummy_encryption=v2
v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.
Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.
To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.
To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.
To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)
Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.
Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
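The extended option syntax described above can be sketched as a tiny parser. The function name parse_dummy_encryption_arg() and its default_version parameter are assumptions made for this example; the real parsing lives in each filesystem's mount option code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse the argument of "test_dummy_encryption[=ARG]".  @arg is the text after
 * '=', or NULL when the option was given without an argument.  Returns the
 * policy version (1 or 2), or -EINVAL for an unrecognized argument.
 */
static int parse_dummy_encryption_arg(const char *arg, int default_version)
{
	if (arg == NULL)
		return default_version;
	if (strcmp(arg, "v1") == 0)
		return 1;
	if (strcmp(arg, "v2") == 0)
		return 2;
	return -EINVAL;
}
```

Passing the default in as a parameter mirrors the message's note that the bare option selects "the default setting", which this commit leaves at v1 and a later patch changes to v2.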
2020-05-13 07:32:50 +08:00
};

static inline void fscrypt_show_test_dummy_encryption(struct seq_file *seq,
						      char sep,
						      struct super_block *sb)
{
}

static inline void
2020-09-17 12:11:35 +08:00
fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy)
2020-05-13 07:32:50 +08:00
{
}

fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".
Why we need this
~~~~~~~~~~~~~~~~
The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.
Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions. But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic. This is because it depends on
whether the files are currently present in the inode cache.
Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work. Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons. First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.
Second, almost all users of fscrypt actually do need the keys to be
global. The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process. This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.
On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2]. This is ugly and raises security concerns. Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).
By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.
Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:
- Supporting key removal with the semantics such that the secret is
  removed immediately and any unused inodes using the key are evicted;
  also, the eviction of any in-use inodes can be retried.
- Calculating a key-dependent cryptographic identifier and returning it
  to userspace.
- Allowing keys to be added and removed by non-root users, but only keys
  for v2 encryption policies; and to prevent denial-of-service attacks,
  users can only remove keys they themselves have added, and a key is
  only really removed after all users who added it have removed it.
Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.
However, to reuse code the implementation still uses the keyrings
service internally. Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.
References:
[1] https://github.com/google/fscrypt
[2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
[3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
[4] https://github.com/google/fscrypt/issues/116
[5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
[6] https://github.com/google/fscrypt/issues/128
[7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
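From userspace, adding a key with this ioctl might look roughly like the sketch below. It assumes a <linux/fscrypt.h> UAPI header recent enough to provide struct fscrypt_add_key_arg and FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER. The helper names build_add_key_arg() and add_key_to_filesystem() are invented for this example, and choosing the identifier key type (keys for v2 policies) is my assumption, not something this particular commit introduces.

```c
#include <linux/fscrypt.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Allocate and fill the variable-length argument for
 * FS_IOC_ADD_ENCRYPTION_KEY.  The caller frees the result.
 */
static struct fscrypt_add_key_arg *build_add_key_arg(const unsigned char *raw,
						     unsigned int raw_size)
{
	struct fscrypt_add_key_arg *arg = calloc(1, sizeof(*arg) + raw_size);

	if (arg == NULL)
		return NULL;
	/* Identifier-type key: the kernel computes the identifier itself. */
	arg->key_spec.type = FSCRYPT_KEY_SPEC_TYPE_IDENTIFIER;
	arg->raw_size = raw_size;
	memcpy(arg->raw, raw, raw_size);
	return arg;
}

/*
 * Add @raw to the filesystem that @fd lives on.  Returns the ioctl() result:
 * 0 on success, -1 on failure (or if the argument could not be allocated).
 */
static int add_key_to_filesystem(int fd, const unsigned char *raw,
				 unsigned int raw_size)
{
	struct fscrypt_add_key_arg *arg = build_add_key_arg(raw, raw_size);
	int ret;

	if (arg == NULL)
		return -1;
	ret = ioctl(fd, FS_IOC_ADD_ENCRYPTION_KEY, arg);
	free(arg);
	return ret;
}
```

Here @fd can be any open file or directory on the target filesystem, which reflects the message's point that the key is added to the filesystem rather than to a process keyring.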
2019-08-05 10:35:46 +08:00
/* keyring.c */
static inline void fscrypt_sb_free(struct super_block *sb)
{
}

static inline int fscrypt_ioctl_add_key(struct file *filp, void __user *arg)
{
	return -EOPNOTSUPP;
}

fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl
removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY.
It wipes the secret key itself, then "locks" the encrypted files and
directories that had been unlocked using that key -- implemented by
evicting the relevant dentries and inodes from the VFS caches.
The problem this solves is that many fscrypt users want the ability to
remove encryption keys, causing the corresponding encrypted directories
to appear "locked" (presented in ciphertext form) again. Moreover,
users want removing an encryption key to *really* remove it, in the
sense that the removed keys cannot be recovered even if kernel memory is
compromised, e.g. by the exploit of a kernel security vulnerability or
by a physical attack. This is desirable after a user logs out of the
system, for example. In many cases users even already assume this to be
the case and are surprised to hear when it's not.
It is not sufficient to simply unlink the master key from the keyring
(or to revoke or invalidate it), since the actual encryption transform
objects are still pinned in memory by their inodes. Therefore, to
really remove a key we must also evict the relevant inodes.
Currently one workaround is to run 'sync && echo 2 >
/proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the
system rather than just the inodes associated with the key being
removed, causing severe performance problems. Moreover, it requires
root privileges, so regular users can't "lock" their encrypted files.
Another workaround, used in Chromium OS kernels, is to add a new
VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of
drop_caches that operates on a single super_block. It does:
	shrink_dcache_sb(sb);
	invalidate_inodes(sb, false);
But it's still a hack. Yet, the major users of filesystem encryption
want this feature badly enough that they are actually using these hacks.
To properly solve the problem, start maintaining a list of the inodes
which have been "unlocked" using each master key. Originally this
wasn't possible because the kernel didn't keep track of in-use master
keys at all. But, with the ->s_master_keys keyring it is now possible.
Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified
master key in ->s_master_keys, then wipes the secret key itself, which
prevents any additional inodes from being unlocked with the key. Then,
it syncs the filesystem and evicts the inodes in the key's list. The
normal inode eviction code will free and wipe the per-file keys (in
->i_crypt_info). Note that freeing ->i_crypt_info without evicting the
inodes was also considered, but would have been racy.
Some inodes may still be in use when a master key is removed, and we
can't simply revoke random file descriptors, mmap's, etc. Thus, the
ioctl simply skips in-use inodes, and returns -EBUSY to indicate that
some inodes weren't evicted. The master key *secret* is still removed,
but the fscrypt_master_key struct remains to keep track of the remaining
inodes. Userspace can then retry the ioctl to evict the remaining
inodes. Alternatively, if userspace adds the key again, the refreshed
secret will be associated with the existing list of inodes so they
remain correctly tracked for future key removals.
The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a
kernel compromise some portions of plaintext file contents may still be
recoverable from memory. This can be solved by enabling page poisoning
system-wide, which security conscious users may choose to do. But it's
very difficult to solve otherwise, e.g. note that plaintext file
contents may have been read in other places than pagecache pages.
Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is
initially restricted to privileged users only. This is sufficient for
some use cases, but not all. A later patch will relax this restriction,
but it will require introducing key hashes, among other changes.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
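The removal flow described in this message (wipe the secret, evict what can be evicted, report busy inodes so the caller can retry) can be modelled in userspace as follows. The mock_* types are invented for illustration, and returning -EBUSY follows the wording of the message above rather than any interface I can vouch for.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mock_inode {
	bool in_use;	/* e.g. an open file descriptor still references it */
	bool evicted;
};

struct mock_master_key {
	unsigned char secret[64];
	bool secret_present;
	struct mock_inode *inodes;	/* inodes unlocked with this key */
	size_t num_inodes;
};

/*
 * Wipe the secret first, then evict every tracked inode that is not in use.
 * Returns -EBUSY if any inode had to be skipped, so the caller can retry.
 */
static int mock_remove_key(struct mock_master_key *mk)
{
	size_t i;
	bool busy = false;

	memset(mk->secret, 0, sizeof(mk->secret));
	mk->secret_present = false;

	for (i = 0; i < mk->num_inodes; i++) {
		if (mk->inodes[i].evicted)
			continue;
		if (mk->inodes[i].in_use)
			busy = true;
		else
			mk->inodes[i].evicted = true;
	}
	return busy ? -EBUSY : 0;
}
```

Keeping the inode list after a busy result is what lets a second call finish the job once the remaining files have been closed.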
2019-08-05 10:35:46 +08:00
static inline int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg)
2019-08-05 10:35:47 +08:00
{
	return -EOPNOTSUPP;
}

static inline int fscrypt_ioctl_remove_key_all_users(struct file *filp,
						     void __user *arg)
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl
Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl
removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY.
It wipes the secret key itself, then "locks" the encrypted files and
directories that had been unlocked using that key -- implemented by
evicting the relevant dentries and inodes from the VFS caches.
The problem this solves is that many fscrypt users want the ability to
remove encryption keys, causing the corresponding encrypted directories
to appear "locked" (presented in ciphertext form) again. Moreover,
users want removing an encryption key to *really* remove it, in the
sense that the removed keys cannot be recovered even if kernel memory is
compromised, e.g. by the exploit of a kernel security vulnerability or
by a physical attack. This is desirable after a user logs out of the
system, for example. In many cases users even already assume this to be
the case and are surprised to hear when it's not.
It is not sufficient to simply unlink the master key from the keyring
(or to revoke or invalidate it), since the actual encryption transform
objects are still pinned in memory by their inodes. Therefore, to
really remove a key we must also evict the relevant inodes.
Currently one workaround is to run 'sync && echo 2 >
/proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the
system rather than just the inodes associated with the key being
removed, causing severe performance problems. Moreover, it requires
root privileges, so regular users can't "lock" their encrypted files.
Another workaround, used in Chromium OS kernels, is to add a new
VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of
drop_caches that operates on a single super_block. It does:
shrink_dcache_sb(sb);
invalidate_inodes(sb, false);
But it's still a hack. Yet, the major users of filesystem encryption
want this feature badly enough that they are actually using these hacks.
To properly solve the problem, start maintaining a list of the inodes
which have been "unlocked" using each master key. Originally this
wasn't possible because the kernel didn't keep track of in-use master
keys at all. But, with the ->s_master_keys keyring it is now possible.
Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified
master key in ->s_master_keys, then wipes the secret key itself, which
prevents any additional inodes from being unlocked with the key. Then,
it syncs the filesystem and evicts the inodes in the key's list. The
normal inode eviction code will free and wipe the per-file keys (in
->i_crypt_info). Note that freeing ->i_crypt_info without evicting the
inodes was also considered, but would have been racy.
Some inodes may still be in use when a master key is removed, and we
can't simply revoke random file descriptors, mmap's, etc. Thus, the
ioctl simply skips in-use inodes, and returns -EBUSY to indicate that
some inodes weren't evicted. The master key *secret* is still removed,
but the fscrypt_master_key struct remains to keep track of the remaining
inodes. Userspace can then retry the ioctl to evict the remaining
inodes. Alternatively, if userspace adds the key again, the refreshed
secret will be associated with the existing list of inodes so they
remain correctly tracked for future key removals.
The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a
kernel compromise some portions of plaintext file contents may still be
recoverable from memory. This can be solved by enabling page poisoning
system-wide, which security conscious users may choose to do. But it's
very difficult to solve otherwise, e.g. note that plaintext file
contents may have been read in other places than pagecache pages.
Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is
initially restricted to privileged users only. This is sufficient for
some use cases, but not all. A later patch will relax this restriction,
but it will require introducing key hashes, among other changes.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
{
	return -EOPNOTSUPP;
}

fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl
Add a new fscrypt ioctl, FS_IOC_GET_ENCRYPTION_KEY_STATUS. Given a key
specified by 'struct fscrypt_key_specifier' (the same way a key is
specified for the other fscrypt key management ioctls), it returns
status information in a 'struct fscrypt_get_key_status_arg'.
The main motivation for this is that applications need to be able to
check whether an encrypted directory is "unlocked" or not, so that they
can add the key if it is not, and avoid adding the key (which may
involve prompting the user for a passphrase) if it already is.
It's possible to use some workarounds such as checking whether opening a
regular file fails with ENOKEY, or checking whether the filenames "look
like gibberish" or not. However, no workaround is usable in all cases.
Like the other key management ioctls, the keyrings syscalls may seem at
first to be a good fit for this. Unfortunately, they are not. Even if
we exposed the keyring ID of the ->s_master_keys keyring and gave
everyone Search permission on it (note: currently the keyrings
permission system would also allow everyone to "invalidate" the keyring
too), the fscrypt keys have an additional state that doesn't map cleanly
to the keyrings API: the secret can be removed, but we can still be
tracking the files that were using the key, and the removal can be
re-attempted or the secret added again.
After later patches, some applications will also need a way to determine
whether a key was added by the current user vs. by some other user.
Reserved fields are included in fscrypt_get_key_status_arg for this and
other future extensions.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
static inline int fscrypt_ioctl_get_key_status(struct file *filp,
					       void __user *arg)
{
	return -EOPNOTSUPP;
}

2019-08-05 10:35:45 +08:00
/* keysetup.c */
2018-12-12 17:50:12 +08:00

fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But
actually it isn't, since it uses functions like crypto_alloc_skcipher()
which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save().
Therefore it can deadlock when called from a context that needs
GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and
f2fs_unlock_op(). This happens when creating a new encrypted file.
We can't fix this by just not setting up the key for new inodes right
away, since new symlinks need their key to encrypt the symlink target.
So we need to set up the new inode's key before starting the
transaction. But just calling fscrypt_get_encryption_info() earlier
doesn't work, since it assumes the encryption context is already set,
and the encryption context can't be set until the transaction.
The recently proposed fscrypt support for the ceph filesystem
(https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u)
will have this same ordering problem too, since ceph will need to
encrypt new symlinks before setting their encryption context.
Finally, f2fs can deadlock when the filesystem is mounted with
'-o test_dummy_encryption' and a new file is created in an existing
unencrypted directory. Similarly, this is caused by holding too many
locks when calling fscrypt_get_encryption_info().
To solve all these problems, add new helper functions:
- fscrypt_prepare_new_inode() sets up a new inode's encryption key
(fscrypt_info), using the parent directory's encryption policy and a
new random nonce. It neither reads nor writes the encryption context.
- fscrypt_set_context() persists the encryption context of a new inode,
using the information from the fscrypt_info already in memory. This
replaces fscrypt_inherit_context().
Temporarily keep fscrypt_inherit_context() around until all filesystems
have been converted to use fscrypt_set_context().
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-17 12:11:24 +08:00
static inline int fscrypt_prepare_new_inode(struct inode *dir,
					    struct inode *inode,
					    bool *encrypt_ret)
{
	if (IS_ENCRYPTED(dir))
		return -EOPNOTSUPP;
	return 0;
}

2018-12-12 17:50:12 +08:00
static inline void fscrypt_put_encryption_info(struct inode *inode)
{
	return;
}

2019-04-11 04:21:15 +08:00
static inline void fscrypt_free_inode(struct inode *inode)
{
}

2019-08-05 10:35:46 +08:00
static inline int fscrypt_drop_inode(struct inode *inode)
{
	return 0;
}

2018-12-12 17:50:12 +08:00
/* fname.c */
static inline int fscrypt_setup_filename(struct inode *dir,
					 const struct qstr *iname,
					 int lookup, struct fscrypt_name *fname)
{
	if (IS_ENCRYPTED(dir))
		return -EOPNOTSUPP;

fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
->lookup() in an encrypted directory begins as follows:
1. fscrypt_prepare_lookup():
a. Try to load the directory's encryption key.
b. If the key is unavailable, mark the dentry as a ciphertext name
via d_flags.
2. fscrypt_setup_filename():
a. Try to load the directory's encryption key.
b. If the key is available, encrypt the name (treated as a plaintext
name) to get the on-disk name. Otherwise decode the name
(treated as a ciphertext name) to get the on-disk name.
But if the key is concurrently added, it may be found at (2a) but not at
(1a). In this case, the dentry will be wrongly marked as a ciphertext
name even though it was actually treated as plaintext.
This will cause the dentry to be wrongly invalidated on the next lookup,
potentially causing problems. For example, if the racy ->lookup() was
part of sys_mount(), then the new mount will be detached when anything
tries to access it. This is despite the mountpoint having a plaintext
path, which should remain valid now that the key was added.
Of course, this is only possible if there's a userspace race. Still,
the additional kernel-side race is confusing and unexpected.
Close the kernel-side race by changing fscrypt_prepare_lookup() to also
set the on-disk filename (step 2b), consistent with the d_flags update.
Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
	memset(fname, 0, sizeof(*fname));
2018-12-12 17:50:12 +08:00
	fname->usr_fname = iname;
	fname->disk_name.name = (unsigned char *)iname->name;
	fname->disk_name.len = iname->len;
	return 0;
}

static inline void fscrypt_free_filename(struct fscrypt_name *fname)
{
	return;
}

2020-08-10 22:21:39 +08:00
static inline int fscrypt_fname_alloc_buffer(u32 max_encrypted_len,
2018-12-12 17:50:12 +08:00
					     struct fscrypt_str *crypto_str)
{
	return -EOPNOTSUPP;
}

static inline void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str)
{
	return;
}

2019-12-16 05:39:47 +08:00
static inline int fscrypt_fname_disk_to_usr(const struct inode *inode,
2018-12-12 17:50:12 +08:00
					    u32 hash, u32 minor_hash,
					    const struct fscrypt_str *iname,
					    struct fscrypt_str *oname)
{
	return -EOPNOTSUPP;
}

static inline bool fscrypt_match_name(const struct fscrypt_name *fname,
				      const u8 *de_name, u32 de_name_len)
{
	/* Encryption support disabled; use standard comparison */
	if (de_name_len != fname->disk_name.len)
		return false;
	return !memcmp(de_name, fname->disk_name.name, fname->disk_name.len);
}

fscrypt: derive dirhash key for casefolded directories
When we allow indexed directories to use both encryption and
casefolding, for the dirhash we can't just hash the ciphertext filenames
that are stored on-disk (as is done currently) because the dirhash must
be case insensitive, but the stored names are case-preserving. Nor can
we hash the plaintext names with an unkeyed hash (or a hash keyed with a
value stored on-disk like ext4's s_hash_seed), since that would leak
information about the names that encryption is meant to protect.
Instead, if we can accept a dirhash that's only computable when the
fscrypt key is available, we can hash the plaintext names with a keyed
hash using a secret key derived from the directory's fscrypt master key.
We'll use SipHash-2-4 for this purpose.
Prepare for this by deriving a SipHash key for each casefolded encrypted
directory. Make sure to handle deriving the key not only when setting
up the directory's fscrypt_info, but also in the case where the casefold
flag is enabled after the fscrypt_info was already set up. (We could
just always derive the key regardless of casefolding, but that would
introduce unnecessary overhead for people not using casefolding.)
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[EB: improved commit message, updated fscrypt.rst, squashed with change
that avoids unnecessarily deriving the key, and many other cleanups]
Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
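
The keyed-dirhash idea described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `siphash24()` is a plain SipHash-2-4 routine written for this sketch, `casefolded_dirhash()` is a hypothetical helper name, ASCII `tolower()` stands in for real Unicode casefolding, and the key is passed in directly rather than derived from an fscrypt master key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* SipHash-2-4 of len bytes at in, keyed with the 128-bit key (k0, k1). */
static uint64_t siphash24(const unsigned char *in, size_t len,
			  uint64_t k0, uint64_t k1)
{
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t b = (uint64_t)len << 56;	/* final block carries the length */
	size_t full = len - (len % 8);
	size_t i, j;

	assert(in != NULL || len == 0);
	/* Compression: two rounds per 8-byte little-endian word. */
	for (i = 0; i < full; i += 8) {
		uint64_t m = 0;

		for (j = 0; j < 8; j++)
			m |= (uint64_t)in[i + j] << (8 * j);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	/* Remaining 0..7 bytes go into the low bytes of the final block. */
	for (j = 0; i + j < len; j++)
		b |= (uint64_t)in[i + j] << (8 * j);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	/* Finalization: four rounds. */
	v2 ^= 0xff;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical helper: fold case (ASCII only here), then apply the keyed hash. */
static uint64_t casefolded_dirhash(const char *name, size_t len,
				   uint64_t k0, uint64_t k1)
{
	unsigned char folded[256];
	size_t i;

	assert(name != NULL && len <= sizeof(folded));
	for (i = 0; i < len; i++)
		folded[i] = (unsigned char)tolower((unsigned char)name[i]);
	return siphash24(folded, len, k0, k1);
}
```

Because the hash is computed over the folded plaintext name, names that differ only in case land in the same hash bucket, while anyone without the key cannot compute or check the hash of a guessed name.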
2020-01-21 06:31:57 +08:00
static inline u64 fscrypt_fname_siphash(const struct inode *dir,
					const struct qstr *name)
{
	WARN_ON_ONCE(1);
	return 0;
}

2020-09-24 13:47:21 +08:00
static inline int fscrypt_d_revalidate(struct dentry *dentry,
				       unsigned int flags)
{
	return 1;
}

2018-12-12 17:50:12 +08:00
/* bio.c */
static inline void fscrypt_decrypt_bio(struct bio *bio)
{
}

static inline int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
					sector_t pblk, unsigned int len)
{
	return -EOPNOTSUPP;
}

/* hooks.c */

static inline int fscrypt_file_open(struct inode *inode, struct file *filp)
{
	if (IS_ENCRYPTED(inode))
		return -EOPNOTSUPP;
	return 0;
}

2019-03-21 02:39:10 +08:00
static inline int __fscrypt_prepare_link(struct inode *inode, struct inode *dir,
					 struct dentry *dentry)
2018-12-12 17:50:12 +08:00
{
	return -EOPNOTSUPP;
}

static inline int __fscrypt_prepare_rename(struct inode *old_dir,
					   struct dentry *old_dentry,
					   struct inode *new_dir,
					   struct dentry *new_dentry,
					   unsigned int flags)
{
	return -EOPNOTSUPP;
}

static inline int __fscrypt_prepare_lookup(struct inode *dir,
2019-03-21 02:39:13 +08:00
					   struct dentry *dentry,
					   struct fscrypt_name *fname)
2018-12-12 17:50:12 +08:00
{
	return -EOPNOTSUPP;
}

2020-12-03 10:20:37 +08:00
static inline int __fscrypt_prepare_readdir(struct inode *dir)
{
	return -EOPNOTSUPP;
}

2020-12-03 10:20:38 +08:00
static inline int __fscrypt_prepare_setattr(struct dentry *dentry,
					    struct iattr *attr)
{
	return -EOPNOTSUPP;
}

2020-01-21 06:31:56 +08:00
static inline int fscrypt_prepare_setflags(struct inode *inode,
					   unsigned int oldflags,
					   unsigned int flags)
{
	return 0;
}

2020-09-17 12:11:34 +08:00
static inline int fscrypt_prepare_symlink(struct inode *dir,
					  const char *target,
					  unsigned int len,
					  unsigned int max_len,
					  struct fscrypt_str *disk_link)
2018-12-12 17:50:12 +08:00
{
2020-09-17 12:11:34 +08:00
	if (IS_ENCRYPTED(dir))
		return -EOPNOTSUPP;
	disk_link->name = (unsigned char *)target;
	disk_link->len = len + 1;
	if (disk_link->len > max_len)
		return -ENAMETOOLONG;
	return 0;
2018-12-12 17:50:12 +08:00
}

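
The length arithmetic in the disabled-encryption `fscrypt_prepare_symlink()` stub above can be tried out in userspace with a small sketch. `struct disk_str` and `plain_prepare_symlink()` are hypothetical stand-ins for `struct fscrypt_str` and the stub, not kernel API, and the `IS_ENCRYPTED()` check is left out because there is no inode here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct fscrypt_str. */
struct disk_str {
	const unsigned char *name;
	unsigned int len;
};

/*
 * Mirror of the unencrypted path: point at the caller's target and
 * count one extra byte for the trailing NUL before checking the limit.
 */
static int plain_prepare_symlink(const char *target, unsigned int len,
				 unsigned int max_len, struct disk_str *out)
{
	assert(target != NULL && out != NULL);
	out->name = (const unsigned char *)target;
	out->len = len + 1;
	if (out->len > max_len)
		return -ENAMETOOLONG;
	return 0;
}
```

With a three-byte target, a limit of 4 succeeds and a limit of 3 fails with `-ENAMETOOLONG`, since the on-disk length includes the terminator.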
static inline int __fscrypt_encrypt_symlink(struct inode *inode,
					    const char *target,
					    unsigned int len,
					    struct fscrypt_str *disk_link)
{
	return -EOPNOTSUPP;
}

static inline const char *fscrypt_get_symlink(struct inode *inode,
					      const void *caddr,
					      unsigned int max_size,
					      struct delayed_call *done)
{
	return ERR_PTR(-EOPNOTSUPP);
}
2019-03-26 15:52:31 +08:00

fscrypt: add fscrypt_symlink_getattr() for computing st_size
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple of things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682
https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
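
The st_size expectation described in this commit message can be checked from userspace with a sketch like the one below. The helper name `symlink_size_matches()` and the `/tmp` template are assumptions made for illustration. On an unencrypted filesystem the check is expected to pass; the commit message describes encrypted symlinks as the case where the two lengths used to disagree.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a symlink to target in a fresh temporary directory and return
 * 1 if lstat()'s st_size equals the length readlink() reports, 0 if it
 * does not, or -1 on error. The link and directory are removed again.
 */
static int symlink_size_matches(const char *target)
{
	char dir[] = "/tmp/symsizeXXXXXX";
	char link[64];
	char buf[4096];
	struct stat st;
	ssize_t n;
	int ret = -1;

	assert(target != NULL);
	if (!mkdtemp(dir))
		return -1;
	snprintf(link, sizeof(link), "%s/link", dir);
	if (symlink(target, link) == 0) {
		if (lstat(link, &st) == 0) {
			n = readlink(link, buf, sizeof(buf));
			if (n >= 0)
				ret = (st.st_size == (off_t)n);
		}
		unlink(link);
	}
	rmdir(dir);
	return ret;
}
```

Tools that size their readlink() buffer from st_size rely on this equality, which is why the mismatch broke the programs listed above.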
2021-07-02 14:53:46 +08:00
static inline int fscrypt_symlink_getattr(const struct path *path,
					  struct kstat *stat)
{
	return -EOPNOTSUPP;
}

2019-03-26 15:52:31 +08:00
static inline void fscrypt_set_ops(struct super_block *sb,
				   const struct fscrypt_operations *s_cop)
{
}

2018-12-12 17:50:12 +08:00
#endif /* !CONFIG_FS_ENCRYPTION */
2017-10-10 03:15:34 +08:00

2020-07-02 09:56:05 +08:00
/* inline_crypt.c */
#ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT

bool __fscrypt_inode_uses_inline_crypto(const struct inode *inode);

void fscrypt_set_bio_crypt_ctx(struct bio *bio,
			       const struct inode *inode, u64 first_lblk,
			       gfp_t gfp_mask);

void fscrypt_set_bio_crypt_ctx_bh(struct bio *bio,
				  const struct buffer_head *first_bh,
				  gfp_t gfp_mask);

bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode,
			   u64 next_lblk);

bool fscrypt_mergeable_bio_bh(struct bio *bio,
			      const struct buffer_head *next_bh);

#else /* CONFIG_FS_ENCRYPTION_INLINE_CRYPT */

static inline bool __fscrypt_inode_uses_inline_crypto(const struct inode *inode)
{
	return false;
}

static inline void fscrypt_set_bio_crypt_ctx(struct bio *bio,
					     const struct inode *inode,
					     u64 first_lblk, gfp_t gfp_mask) { }

static inline void fscrypt_set_bio_crypt_ctx_bh(
					 struct bio *bio,
					 const struct buffer_head *first_bh,
					 gfp_t gfp_mask) { }

static inline bool fscrypt_mergeable_bio(struct bio *bio,
					 const struct inode *inode,
					 u64 next_lblk)
{
	return true;
}

static inline bool fscrypt_mergeable_bio_bh(struct bio *bio,
					    const struct buffer_head *next_bh)
{
	return true;
}
#endif /* !CONFIG_FS_ENCRYPTION_INLINE_CRYPT */

/**
 * fscrypt_inode_uses_inline_crypto() - test whether an inode uses inline
 *					encryption
 * @inode: an inode. If encrypted, its key must be set up.
 *
 * Return: true if the inode requires file contents encryption and if the
 *	   encryption should be done in the block layer via blk-crypto rather
 *	   than in the filesystem layer.
 */
static inline bool fscrypt_inode_uses_inline_crypto(const struct inode *inode)
{
	return fscrypt_needs_contents_encryption(inode) &&
	       __fscrypt_inode_uses_inline_crypto(inode);
}

/**
 * fscrypt_inode_uses_fs_layer_crypto() - test whether an inode uses fs-layer
 *					  encryption
 * @inode: an inode. If encrypted, its key must be set up.
 *
 * Return: true if the inode requires file contents encryption and if the
 *	   encryption should be done in the filesystem layer rather than in the
 *	   block layer via blk-crypto.
 */
static inline bool fscrypt_inode_uses_fs_layer_crypto(const struct inode *inode)
{
	return fscrypt_needs_contents_encryption(inode) &&
	       !__fscrypt_inode_uses_inline_crypto(inode);
}

2020-07-22 06:59:19 +08:00
/**
 * fscrypt_has_encryption_key() - check whether an inode has had its key set up
 * @inode: the inode to check
 *
 * Return: %true if the inode has had its encryption key set up, else %false.
 *
 * Usually this should be preceded by fscrypt_get_encryption_info() to try to
 * set up the key first.
 */
static inline bool fscrypt_has_encryption_key(const struct inode *inode)
{
	return fscrypt_get_info(inode) != NULL;
}

2017-10-10 03:15:41 +08:00
/**
2020-05-12 03:13:56 +08:00
 * fscrypt_prepare_link() - prepare to link an inode into a possibly-encrypted
 *			    directory
2017-10-10 03:15:41 +08:00
 * @old_dentry: an existing dentry for the inode being linked
 * @dir: the target directory
 * @dentry: negative dentry for the target filename
 *
 * A new link can only be added to an encrypted directory if the directory's
 * encryption key is available --- since otherwise we'd have no way to encrypt
2020-11-18 15:56:09 +08:00
 * the filename.
2017-10-10 03:15:41 +08:00
 *
 * We also verify that the link will not violate the constraint that all files
 * in an encrypted directory tree use the same encryption policy.
 *
 * Return: 0 on success, -ENOKEY if the directory's encryption key is missing,
fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
Currently, trying to rename or link a regular file, directory, or
symlink into an encrypted directory fails with EPERM when the source
file is unencrypted or is encrypted with a different encryption policy,
and is on the same mountpoint. It is correct for the operation to fail,
but the choice of EPERM breaks tools like 'mv' that know to copy rather
than rename if they see EXDEV, but don't know what to do with EPERM.
Our original motivation for EPERM was to encourage users to securely
handle their data. Encrypting files by "moving" them into an encrypted
directory can be insecure because the unencrypted data may remain in
free space on disk, where it can later be recovered by an attacker.
It's much better to encrypt the data from the start, or at least try to
securely delete the source data e.g. using the 'shred' program.
However, the current behavior hasn't been effective at achieving its
goal because users tend to be confused, hack around it, and complain;
see e.g. https://github.com/google/fscrypt/issues/76. And in some cases
it's actually inconsistent or unnecessary. For example, 'mv'-ing files
between differently encrypted directories doesn't work even in cases
where it can be secure, such as when in userspace the same passphrase
protects both directories. Yet, you *can* already 'mv' unencrypted
files into an encrypted directory if the source files are on a different
mountpoint, even though doing so is often insecure.
There are probably better ways to teach users to securely handle their
files. For example, the 'fscrypt' userspace tool could provide a
command that migrates unencrypted files into an encrypted directory,
acting like 'shred' on the source files and providing appropriate
warnings depending on the type of the source filesystem and disk.
Receiving errors on unimportant files might also force some users to
disable encryption, thus making the behavior counterproductive. It's
desirable to make encryption as unobtrusive as possible.
Therefore, change the error code from EPERM to EXDEV so that tools
looking for EXDEV will fall back to a copy.
This, of course, doesn't prevent users from still doing the right things
to securely manage their files. Note that this also matches the
behavior when a file is renamed between two project quota hierarchies;
so there's precedent for using EXDEV for things other than mountpoints.
xfstests generic/398 will require an update with this change.
[Rewritten from an earlier patch series by Michael Halcrow.]
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Joe Richey <joerichey@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-23 08:20:21 +08:00
 * -EXDEV if the link would result in an inconsistent encryption policy, or
2017-10-10 03:15:41 +08:00
 * another -errno code.
 */
static inline int fscrypt_prepare_link(struct dentry *old_dentry,
				       struct inode *dir,
				       struct dentry *dentry)
{
	if (IS_ENCRYPTED(dir))
2019-03-21 02:39:10 +08:00
		return __fscrypt_prepare_link(d_inode(old_dentry), dir, dentry);
2017-10-10 03:15:41 +08:00
	return 0;
}

2017-10-10 03:15:42 +08:00
|
|
|
/**
|
2020-05-12 03:13:56 +08:00
|
|
|
* fscrypt_prepare_rename() - prepare for a rename between possibly-encrypted
|
|
|
|
* directories
|
2017-10-10 03:15:42 +08:00
|
|
|
* @old_dir: source directory
|
|
|
|
* @old_dentry: dentry for source file
|
|
|
|
* @new_dir: target directory
|
|
|
|
* @new_dentry: dentry for target location (may be negative unless exchanging)
|
|
|
|
* @flags: rename flags (we care at least about %RENAME_EXCHANGE)
|
|
|
|
*
|
|
|
|
* Prepare for ->rename() where the source and/or target directories may be
|
|
|
|
* encrypted. A new link can only be added to an encrypted directory if the
|
|
|
|
* directory's encryption key is available --- since otherwise we'd have no way
|
|
|
|
* to encrypt the filename. A rename to an existing name, on the other hand,
|
|
|
|
* *is* cryptographically possible without the key. However, we take the more
|
|
|
|
* conservative approach and just forbid all no-key renames.
|
|
|
|
*
|
|
|
|
* We also verify that the rename will not violate the constraint that all files
|
|
|
|
* in an encrypted directory tree use the same encryption policy.
|
|
|
|
*
|
 * Return: 0 on success, -ENOKEY if an encryption key is missing, -EXDEV if the
 * rename would cause inconsistent encryption policies, or another -errno code.
 */
static inline int fscrypt_prepare_rename(struct inode *old_dir,
					 struct dentry *old_dentry,
					 struct inode *new_dir,
					 struct dentry *new_dentry,
					 unsigned int flags)
{
	if (IS_ENCRYPTED(old_dir) || IS_ENCRYPTED(new_dir))
		return __fscrypt_prepare_rename(old_dir, old_dentry,
						new_dir, new_dentry, flags);
	return 0;
}

/**
 * fscrypt_prepare_lookup() - prepare to lookup a name in a possibly-encrypted
 *			      directory
 * @dir: directory being searched
 * @dentry: filename being looked up
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
->lookup() in an encrypted directory begins as follows:
1. fscrypt_prepare_lookup():
a. Try to load the directory's encryption key.
b. If the key is unavailable, mark the dentry as a ciphertext name
via d_flags.
2. fscrypt_setup_filename():
a. Try to load the directory's encryption key.
b. If the key is available, encrypt the name (treated as a plaintext
name) to get the on-disk name. Otherwise decode the name
(treated as a ciphertext name) to get the on-disk name.
But if the key is concurrently added, it may be found at (2a) but not at
(1a). In this case, the dentry will be wrongly marked as a ciphertext
name even though it was actually treated as plaintext.
This will cause the dentry to be wrongly invalidated on the next lookup,
potentially causing problems. For example, if the racy ->lookup() was
part of sys_mount(), then the new mount will be detached when anything
tries to access it. This is despite the mountpoint having a plaintext
path, which should remain valid now that the key was added.
Of course, this is only possible if there's a userspace race. Still,
the additional kernel-side race is confusing and unexpected.
Close the kernel-side race by changing fscrypt_prepare_lookup() to also
set the on-disk filename (step 2b), consistent with the d_flags update.
Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
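The race described above comes down to sampling the key state twice. A toy userspace model, assuming nothing about the real kernel structures, makes the difference visible: every name below (toy_dir, toy_lookup, toy_lookup_racy(), toy_lookup_fixed(), toy_add_key()) is hypothetical, and the "between" hook merely stands in for a key being added concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a directory inode: only tracks whether its key is loaded. */
struct toy_dir {
	bool key_available;
};

/* What a lookup decided about the dentry and about the name it searched for. */
struct toy_lookup {
	bool nokey_flag;		/* dentry marked as a no-key (ciphertext) name */
	bool treated_as_plaintext;	/* name was encrypted rather than decoded */
};

/* Simulates userspace adding the key while a lookup is in flight. */
void toy_add_key(struct toy_dir *dir)
{
	dir->key_available = true;
}

/* Old shape: steps 1 and 2 each check the key, so they can disagree. */
void toy_lookup_racy(struct toy_dir *dir, struct toy_lookup *res,
		     void (*between)(struct toy_dir *))
{
	assert(dir && res);
	res->nokey_flag = !dir->key_available;		/* step 1b */
	if (between)
		between(dir);				/* concurrent key add */
	res->treated_as_plaintext = dir->key_available;	/* step 2b */
}

/* Fixed shape: one check drives both the flag and the on-disk name. */
void toy_lookup_fixed(struct toy_dir *dir, struct toy_lookup *res,
		      void (*between)(struct toy_dir *))
{
	bool have_key;

	assert(dir && res);
	have_key = dir->key_available;			/* sampled once */
	if (between)
		between(dir);				/* no longer matters */
	res->nokey_flag = !have_key;
	res->treated_as_plaintext = have_key;
}
```

In the racy variant a key added between the two steps leaves a dentry that is flagged as a no-key name yet was searched for as plaintext; the fixed variant always produces a consistent pair.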
 * @fname: (output) the name to use to search the on-disk directory
 *
 * Prepare for ->lookup() in a directory which may be encrypted by determining
 * the name that will actually be used to search the directory on-disk. If the
fscrypt: allow deleting files with unsupported encryption policy
Currently it's impossible to delete files that use an unsupported
encryption policy, as the kernel will just return an error when
performing any operation on the top-level encrypted directory, even just
a path lookup into the directory or opening the directory for readdir.
More specifically, this occurs in any of the following cases:
- The encryption context has an unrecognized version number. Current
kernels know about v1 and v2, but there could be more versions in the
future.
- The encryption context has unrecognized encryption modes
(FSCRYPT_MODE_*) or flags (FSCRYPT_POLICY_FLAG_*), an unrecognized
combination of modes, or reserved bits set.
- The encryption key has been added and the encryption modes are
recognized but aren't available in the crypto API -- for example, a
directory is encrypted with FSCRYPT_MODE_ADIANTUM but the kernel
doesn't have CONFIG_CRYPTO_ADIANTUM enabled.
It's desirable to return errors for most operations on files that use an
unsupported encryption policy, but the current behavior is too strict.
We need to allow enough to delete files, so that people can't be stuck
with undeletable files when downgrading kernel versions. That includes
allowing directories to be listed and allowing dentries to be looked up.
Fix this by modifying the key setup logic to treat an unsupported
encryption policy in the same way as "key unavailable" in the cases that
are required for a recursive delete to work: preparing for a readdir or
a dentry lookup, revalidating a dentry, or checking whether an inode has
the same encryption policy as its parent directory.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201203022041.230976-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-12-03 10:20:41 +08:00
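The policy change described above can be modelled as a single extra flag on key setup. The sketch below is a simplified userspace model, not the kernel's key setup code: toy_policy_state, toy_setup_key(), and the choice of -EINVAL as the error are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what key setup can find for an encrypted inode. */
enum toy_policy_state {
	TOY_KEY_PRESENT,	/* policy supported and key added */
	TOY_KEY_ABSENT,		/* policy supported but key not added */
	TOY_POLICY_UNSUPPORTED,	/* unknown version, mode, flag, or algorithm */
};

/*
 * Toy key setup.  Callers on the recursive-delete path (readdir, lookup,
 * dentry revalidation, parent-policy check) pass allow_unsupported = true,
 * so an unsupported policy is reported like "key unavailable" rather than
 * as an error.  The -EINVAL value is an assumption of this sketch.
 */
int toy_setup_key(enum toy_policy_state state, bool allow_unsupported,
		  bool *have_key)
{
	assert(have_key != NULL);
	*have_key = false;

	switch (state) {
	case TOY_KEY_PRESENT:
		*have_key = true;
		return 0;
	case TOY_KEY_ABSENT:
		return 0;
	case TOY_POLICY_UNSUPPORTED:
		return allow_unsupported ? 0 : -EINVAL;
	}
	return -EINVAL;
}
```

A readdir-style caller would pass true and list no-key names when have_key comes back false, while an open-for-read caller would pass false and still see the error.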
 * directory's encryption policy is supported by this kernel and its encryption
 * key is available, then the lookup is assumed to be by plaintext name;
 * otherwise, it is assumed to be by no-key name.
 *
 * This will set DCACHE_NOKEY_NAME on the dentry if the lookup is by no-key
 * name. In this case the filesystem must assign the dentry a dentry_operations
 * which contains fscrypt_d_revalidate (or contains a d_revalidate method that
 * calls fscrypt_d_revalidate), so that the dentry will be invalidated if the
 * directory's encryption key is later added.
 *
 * Return: 0 on success; -ENOENT if the directory's key is unavailable but the
 * filename isn't a valid no-key name, so a negative dentry should be created;
 * or another -errno code.
 */
static inline int fscrypt_prepare_lookup(struct inode *dir,
					 struct dentry *dentry,
					 struct fscrypt_name *fname)
{
	if (IS_ENCRYPTED(dir))
		return __fscrypt_prepare_lookup(dir, dentry, fname);

	memset(fname, 0, sizeof(*fname));
	fname->usr_fname = &dentry->d_name;
	fname->disk_name.name = (unsigned char *)dentry->d_name.name;
	fname->disk_name.len = dentry->d_name.len;
	return 0;
}

/**
 * fscrypt_prepare_readdir() - prepare to read a possibly-encrypted directory
 * @dir: the directory inode
 *
 * If the directory is encrypted and it doesn't already have its encryption key
 * set up, try to set it up so that the filenames will be listed in plaintext
 * form rather than in no-key form.
 *
 * Return: 0 on success; -errno on error. Note that the encryption key being
 * unavailable is not considered an error. It is also not an error if
 * the encryption policy is unsupported by this kernel; that is treated
 * like the key being unavailable, so that files can still be deleted.
 */
static inline int fscrypt_prepare_readdir(struct inode *dir)
{
	if (IS_ENCRYPTED(dir))
		return __fscrypt_prepare_readdir(dir);
	return 0;
}

/**
 * fscrypt_prepare_setattr() - prepare to change a possibly-encrypted inode's
 *			       attributes
 * @dentry: dentry through which the inode is being changed
 * @attr: attributes to change
 *
 * Prepare for ->setattr() on a possibly-encrypted inode. On an encrypted file,
 * most attribute changes are allowed even without the encryption key. However,
 * without the encryption key we do have to forbid truncates. This is needed
 * because the size being truncated to may not be a multiple of the filesystem
 * block size, and in that case we'd have to decrypt the final block, zero the
 * portion past i_size, and re-encrypt it. (We *could* allow truncating to a
 * filesystem block boundary, but it's simpler to just forbid all truncates ---
 * and we already forbid all other contents modifications without the key.)
 *
 * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
 * if a problem occurred while setting up the encryption key.
 */
static inline int fscrypt_prepare_setattr(struct dentry *dentry,
					  struct iattr *attr)
{
	if (IS_ENCRYPTED(d_inode(dentry)))
		return __fscrypt_prepare_setattr(dentry, attr);
	return 0;
}

fscrypt: new helper functions for ->symlink()
Currently, filesystems supporting fscrypt need to implement some tricky
logic when creating encrypted symlinks, including handling a peculiar
on-disk format (struct fscrypt_symlink_data) and correctly calculating
the size of the encrypted symlink. Introduce helper functions to make
things a bit easier:
- fscrypt_prepare_symlink() computes and validates the size the symlink
target will require on-disk.
- fscrypt_encrypt_symlink() creates the encrypted target if needed.
The new helpers actually fix some subtle bugs. First, when checking
whether the symlink target was too long, filesystems didn't account for
the fact that the NUL padding is meant to be truncated if it would cause
the maximum length to be exceeded, as is done for filenames in
directories. Consequently users would receive ENAMETOOLONG when
creating symlinks close to what is supposed to be the maximum length.
For example, with EXT4 with a 4K block size, the maximum symlink target
length in an encrypted directory is supposed to be 4093 bytes (in
comparison to 4095 in an unencrypted directory), but in
FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted.
Second, symlink targets of "." and ".." were not being encrypted, even
though they should be, as these names are special in *directory entries*
but not in symlink targets. Fortunately, we can fix this simply by
starting to encrypt them, as old kernels already accept them in
encrypted form.
Third, the output string length the filesystems were providing when
doing the actual encryption was incorrect, as it was forgotten to
exclude 'sizeof(struct fscrypt_symlink_data)'. Fortunately though, this
bug didn't make a difference.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-06 02:45:01 +08:00
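The length arithmetic behind the first bug above can be reproduced in a few lines. This is a sketch under stated assumptions, not the kernel's implementation: toy_symlink_disk_len() is a hypothetical helper, the 3 bytes of per-symlink overhead are inferred from the 4096 vs. 4093 figures quoted above, and the 16-byte minimum matches FS_CRYPTO_BLOCK_SIZE from this header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SYMLINK_OVERHEAD	3u	/* assumed: inferred from 4096 - 4093 */
#define TOY_MIN_CIPHERTEXT	16u	/* FS_CRYPTO_BLOCK_SIZE */

/* Round n up to the next multiple of mult (mult must be nonzero). */
static size_t toy_round_up(size_t n, size_t mult)
{
	return (n + mult - 1) / mult * mult;
}

/*
 * Compute the on-disk size of an encrypted symlink target of 'len' bytes.
 * 'padding' is the policy's padding (e.g. 32) and 'max_len' the most the
 * filesystem can store (e.g. 4096).  With truncate_padding the padding is
 * cut off at the maximum, as is done for filenames; without it, a padded
 * size past the maximum is rejected, which models the old behavior.
 * Returns false where the caller would report ENAMETOOLONG.
 */
bool toy_symlink_disk_len(size_t len, size_t padding, size_t max_len,
			  bool truncate_padding, size_t *disk_len)
{
	size_t max_ciphertext;
	size_t ciphertext;

	assert(disk_len && padding > 0 && max_len > TOY_SYMLINK_OVERHEAD);
	max_ciphertext = max_len - TOY_SYMLINK_OVERHEAD;

	if (len > max_ciphertext)
		return false;
	ciphertext = toy_round_up(len < TOY_MIN_CIPHERTEXT ?
				  TOY_MIN_CIPHERTEXT : len, padding);
	if (ciphertext > max_ciphertext) {
		if (!truncate_padding)
			return false;
		ciphertext = max_ciphertext;
	}
	*disk_len = ciphertext + TOY_SYMLINK_OVERHEAD;
	return true;
}
```

With padding 32 and a 4096-byte limit, the truncating variant accepts targets up to 4093 bytes, while the non-truncating variant stops at 4064 bytes, matching the numbers given above.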
/**
 * fscrypt_encrypt_symlink() - encrypt the symlink target if needed
 * @inode: symlink inode
 * @target: plaintext symlink target
 * @len: length of @target excluding null terminator
 * @disk_link: (in/out) the on-disk symlink target being prepared
 *
 * If the symlink target needs to be encrypted, then this function encrypts it
 * into @disk_link->name. fscrypt_prepare_symlink() must have been called
 * previously to compute @disk_link->len. If the filesystem did not allocate a
 * buffer for @disk_link->name after calling fscrypt_prepare_symlink(), then one
 * will be kmalloc()'ed and the filesystem will be responsible for freeing it.
 *
 * Return: 0 on success, -errno on failure
 */
static inline int fscrypt_encrypt_symlink(struct inode *inode,
					  const char *target,
					  unsigned int len,
					  struct fscrypt_str *disk_link)
{
	if (IS_ENCRYPTED(inode))
		return __fscrypt_encrypt_symlink(inode, target, len, disk_link);
	return 0;
}

/* If *pagep is a bounce page, free it and set *pagep to the pagecache page */
static inline void fscrypt_finalize_bounce_page(struct page **pagep)
{
	struct page *page = *pagep;

	if (fscrypt_is_bounce_page(page)) {
		*pagep = fscrypt_pagecache_page(page);
		fscrypt_free_bounce_page(page);
	}
}

#endif /* _LINUX_FSCRYPT_H */