OpenCloudOS-Kernel/include/linux/fscrypt.h

735 lines
22 KiB
C
Raw Normal View History

fscrypt: lots of cleanups, mostly courtesy by Eric Biggers -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAloI8AUACgkQ8vlZVpUN gaMdjgf8CCW7UhPjoZYwF8sUNtAaX9+JZT1maOcXUhpJ3vRQiRn+AzRH6yBYMm79 +NZBwVlk4dlEe55Wh4yFIStMAstqzCrke4C9CSbExjgHNsJdU4znyYuLRMbLfyO0 6c4NObiAIKJdW1/te1aN90keGC6min8pBZot+FqZsRr+Kq2+IOtM43JAv7efOLev v3LCjUf9JKxatoB8tgw4AJRa1p18p7D2APWTG05VlFq63TjhVIYNvvwcQlizLwGY cuEq3X59FbFdX06fJnucujU3WP3ES4/3rhufBK4NNaec5e5dbnH2KlAx7J5SyMIZ 0qUFB/dmXDSb3gsfScSGo1F71Ad0CA== =asAm -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Lots of cleanups, mostly courtesy by Eric Biggers" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: lock mutex before checking for bounce page pool fscrypt: add a documentation file for filesystem-level encryption ext4: switch to fscrypt_prepare_setattr() ext4: switch to fscrypt_prepare_lookup() ext4: switch to fscrypt_prepare_rename() ext4: switch to fscrypt_prepare_link() ext4: switch to fscrypt_file_open() fscrypt: new helper function - fscrypt_prepare_setattr() fscrypt: new helper function - fscrypt_prepare_lookup() fscrypt: new helper function - fscrypt_prepare_rename() fscrypt: new helper function - fscrypt_prepare_link() fscrypt: new helper function - fscrypt_file_open() fscrypt: new helper function - fscrypt_require_key() fscrypt: remove unneeded empty fscrypt_operations structs fscrypt: remove ->is_encrypted() fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED() fs, fscrypt: add an S_ENCRYPTED inode flag fscrypt: clean up include file mess
2017-11-15 03:35:15 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
/*
* fscrypt.h: declarations for per-file encryption
*
* Filesystems that implement per-file encryption must include this header
* file.
*
* Copyright (C) 2015, Google, Inc.
*
* Written by Michael Halcrow, 2015.
* Modified by Jaegeuk Kim, 2015.
*/
#ifndef _LINUX_FSCRYPT_H
#define _LINUX_FSCRYPT_H
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <uapi/linux/fscrypt.h>
#define FS_CRYPTO_BLOCK_SIZE 16
struct fscrypt_info;
struct fscrypt_str {
unsigned char *name;
u32 len;
};
struct fscrypt_name {
const struct qstr *usr_fname;
struct fscrypt_str disk_name;
u32 hash;
u32 minor_hash;
struct fscrypt_str crypto_buf;
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
bool is_ciphertext_name;
};
#define FSTR_INIT(n, l) { .name = n, .len = l }
#define FSTR_TO_QSTR(f) QSTR_INIT((f)->name, (f)->len)
#define fname_name(p) ((p)->disk_name.name)
#define fname_len(p) ((p)->disk_name.len)
/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
fscrypt: v2 encryption policy support Add a new fscrypt policy version, "v2". It has the following changes from the original policy version, which we call "v1" (*): - Master keys (the user-provided encryption keys) are only ever used as input to HKDF-SHA512. This is more flexible and less error-prone, and it avoids the quirks and limitations of the AES-128-ECB based KDF. Three classes of cryptographically isolated subkeys are defined: - Per-file keys, like used in v1 policies except for the new KDF. - Per-mode keys. These implement the semantics of the DIRECT_KEY flag, which for v1 policies made the master key be used directly. These are also planned to be used for inline encryption when support for it is added. - Key identifiers (see below). - Each master key is identified by a 16-byte master_key_identifier, which is derived from the key itself using HKDF-SHA512. This prevents users from associating the wrong key with an encrypted file or directory. This was easily possible with v1 policies, which identified the key by an arbitrary 8-byte master_key_descriptor. - The key must be provided in the filesystem-level keyring, not in a process-subscribed keyring. The following UAPI additions are made: - The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated from fscrypt_policy/fscrypt_policy_v1 by the version code prefix. - A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows getting the v1 or v2 encryption policy of an encrypted file or directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not be used because it did not have a way for userspace to indicate which policy structure is expected. The new ioctl includes a size field, so it is extensible to future fscrypt policy versions. - The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY, and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2 encryption policies. Such keys are kept logically separate from keys for v1 encryption policies, and are identified by 'identifier' rather than by 'descriptor'. The 'identifier' need not be provided when adding a key, since the kernel will calculate it anyway. This patch temporarily keeps adding/removing v2 policy keys behind the same permission check done for adding/removing v1 policy keys: capable(CAP_SYS_ADMIN). However, the next patch will carefully take advantage of the cryptographically secure master_key_identifier to allow non-root users to add/remove v2 policy keys, thus providing a full replacement for v1 policies. (*) Actually, in the API fscrypt_policy::version is 0 while on-disk fscrypt_context::format is 1. But I believe it makes the most sense to advance both to '2' to have them be in sync, and to consider the numbering to start at 1 except for the API quirk. Reviewed-by: Paul Crowley <paulcrowley@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:47 +08:00
#define FSCRYPT_SET_CONTEXT_MAX_SIZE 40
#ifdef CONFIG_FS_ENCRYPTION
/*
* fscrypt superblock flags
*/
#define FS_CFLG_OWN_PAGES (1U << 1)
/*
* crypto operations for filesystems
*/
struct fscrypt_operations {
unsigned int flags;
const char *key_prefix;
int (*get_context)(struct inode *inode, void *ctx, size_t len);
int (*set_context)(struct inode *inode, const void *ctx, size_t len,
void *fs_data);
bool (*dummy_context)(struct inode *inode);
bool (*empty_dir)(struct inode *inode);
unsigned int max_namelen;
fscrypt: add support for IV_INO_LBLK_64 policies Inline encryption hardware compliant with the UFS v2.1 standard or with the upcoming version of the eMMC standard has the following properties: (1) Per I/O request, the encryption key is specified by a previously loaded keyslot. There might be only a small number of keyslots. (2) Per I/O request, the starting IV is specified by a 64-bit "data unit number" (DUN). IV bits 64-127 are assumed to be 0. The hardware automatically increments the DUN for each "data unit" of configurable size in the request, e.g. for each filesystem block. Property (1) makes it inefficient to use the traditional fscrypt per-file keys. Property (2) precludes the use of the existing DIRECT_KEY fscrypt policy flag, which needs at least 192 IV bits. Therefore, add a new fscrypt policy flag IV_INO_LBLK_64 which causes the encryption to modified as follows: - The encryption keys are derived from the master key, encryption mode number, and filesystem UUID. - The IVs are chosen as (inode_number << 32) | file_logical_block_num. For filenames encryption, file_logical_block_num is 0. Since the file nonces aren't used in the key derivation, many files may share the same encryption key. This is much more efficient on the target hardware. Including the inode number in the IVs and mixing the filesystem UUID into the keys ensures that data in different files is nevertheless still encrypted differently. Additionally, limiting the inode and block numbers to 32 bits and placing the block number in the low bits maintains compatibility with the 64-bit DUN convention (property (2) above). Since this scheme assumes that inode numbers are stable (which may preclude filesystem shrinking) and that inode and file logical block numbers are at most 32-bit, IV_INO_LBLK_64 will only be allowed on filesystems that meet these constraints. These are acceptable limitations for the cases where this format would actually be used. Note that IV_INO_LBLK_64 is an on-disk format, not an implementation. This patch just adds support for it using the existing filesystem layer encryption. A later patch will add support for inline encryption. Reviewed-by: Paul Crowley <paulcrowley@google.com> Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-10-25 05:54:36 +08:00
bool (*has_stable_inodes)(struct super_block *sb);
void (*get_ino_and_lblk_bits)(struct super_block *sb,
int *ino_bits_ret, int *lblk_bits_ret);
};
static inline bool fscrypt_has_encryption_key(const struct inode *inode)
{
/* pairs with cmpxchg_release() in fscrypt_get_encryption_info() */
return READ_ONCE(inode->i_crypt_info) != NULL;
}
/**
* fscrypt_needs_contents_encryption() - check whether an inode needs
* contents encryption
* @inode: the inode to check
*
* Return: %true iff the inode is an encrypted regular file and the kernel was
* built with fscrypt support.
*
* If you need to know whether the encrypt bit is set even when the kernel was
* built without fscrypt support, you must use IS_ENCRYPTED() directly instead.
*/
static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
{
return IS_ENCRYPTED(inode) && S_ISREG(inode->i_mode);
}
static inline bool fscrypt_dummy_context_enabled(struct inode *inode)
{
return inode->i_sb->s_cop->dummy_context &&
inode->i_sb->s_cop->dummy_context(inode);
}
/*
* When d_splice_alias() moves a directory's encrypted alias to its decrypted
* alias as a result of the encryption key being added, DCACHE_ENCRYPTED_NAME
* must be cleared. Note that we don't have to support arbitrary moves of this
* flag because fscrypt doesn't allow encrypted aliases to be the source or
* target of a rename().
*/
static inline void fscrypt_handle_d_move(struct dentry *dentry)
{
dentry->d_flags &= ~DCACHE_ENCRYPTED_NAME;
}
/* crypto.c */
extern void fscrypt_enqueue_decrypt_work(struct work_struct *);
extern struct page *fscrypt_encrypt_pagecache_blocks(struct page *page,
unsigned int len,
unsigned int offs,
gfp_t gfp_flags);
extern int fscrypt_encrypt_block_inplace(const struct inode *inode,
struct page *page, unsigned int len,
unsigned int offs, u64 lblk_num,
gfp_t gfp_flags);
extern int fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len,
unsigned int offs);
extern int fscrypt_decrypt_block_inplace(const struct inode *inode,
struct page *page, unsigned int len,
unsigned int offs, u64 lblk_num);
static inline bool fscrypt_is_bounce_page(struct page *page)
{
return page->mapping == NULL;
}
static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
{
return (struct page *)page_private(bounce_page);
}
extern void fscrypt_free_bounce_page(struct page *bounce_page);
/* policy.c */
extern int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg);
extern int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg);
extern int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg);
extern int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg);
extern int fscrypt_has_permitted_context(struct inode *parent,
struct inode *child);
extern int fscrypt_inherit_context(struct inode *parent, struct inode *child,
void *fs_data, bool preload);
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an encryption key to the filesystem's fscrypt keyring ->s_master_keys, making any files encrypted with that key appear "unlocked". Why we need this ~~~~~~~~~~~~~~~~ The main problem is that the "locked/unlocked" (ciphertext/plaintext) status of encrypted files is global, but the fscrypt keys are not. fscrypt only looks for keys in the keyring(s) the process accessing the filesystem is subscribed to: the thread keyring, process keyring, and session keyring, where the session keyring may contain the user keyring. Therefore, userspace has to put fscrypt keys in the keyrings for individual users or sessions. But this means that when a process with a different keyring tries to access encrypted files, whether they appear "unlocked" or not is nondeterministic. This is because it depends on whether the files are currently present in the inode cache. Fixing this by consistently providing each process its own view of the filesystem depending on whether it has the key or not isn't feasible due to how the VFS caches work. Furthermore, while sometimes users expect this behavior, it is misguided for two reasons. First, it would be an OS-level access control mechanism largely redundant with existing access control mechanisms such as UNIX file permissions, ACLs, LSMs, etc. Encryption is actually for protecting the data at rest. Second, almost all users of fscrypt actually do need the keys to be global. The largest users of fscrypt, Android and Chromium OS, achieve this by having PID 1 create a "session keyring" that is inherited by every process. This works, but it isn't scalable because it prevents session keyrings from being used for any other purpose. On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't similarly abuse the session keyring, so to make 'sudo' work on all systems it has to link all the user keyrings into root's user keyring [2]. This is ugly and raises security concerns. Moreover it can't make the keys available to system services, such as sshd trying to access the user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to read certificates from the user's home directory (see [5]); or to Docker containers (see [6], [7]). By having an API to add a key to the *filesystem* we'll be able to fix the above bugs, remove userspace workarounds, and clearly express the intended semantics: the locked/unlocked status of an encrypted directory is global, and encryption is orthogonal to OS-level access control. Why not use the add_key() syscall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use an ioctl for this API rather than the existing add_key() system call because the ioctl gives us the flexibility needed to implement fscrypt-specific semantics that will be introduced in later patches: - Supporting key removal with the semantics such that the secret is removed immediately and any unused inodes using the key are evicted; also, the eviction of any in-use inodes can be retried. - Calculating a key-dependent cryptographic identifier and returning it to userspace. - Allowing keys to be added and removed by non-root users, but only keys for v2 encryption policies; and to prevent denial-of-service attacks, users can only remove keys they themselves have added, and a key is only really removed after all users who added it have removed it. Trying to shoehorn these semantics into the keyrings syscalls would be very difficult, whereas the ioctls make things much easier. However, to reuse code the implementation still uses the keyrings service internally. Thus we get lockless RCU-mode key lookups without having to re-implement it, and the keys automatically show up in /proc/keys for debugging purposes. References: [1] https://github.com/google/fscrypt [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939 [4] https://github.com/google/fscrypt/issues/116 [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715 [6] https://github.com/google/fscrypt/issues/128 [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
/* keyring.c */
extern void fscrypt_sb_free(struct super_block *sb);
extern int fscrypt_ioctl_add_key(struct file *filp, void __user *arg);
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to *really* remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key *secret* is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
extern int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg);
extern int fscrypt_ioctl_remove_key_all_users(struct file *filp,
void __user *arg);
fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl Add a new fscrypt ioctl, FS_IOC_GET_ENCRYPTION_KEY_STATUS. Given a key specified by 'struct fscrypt_key_specifier' (the same way a key is specified for the other fscrypt key management ioctls), it returns status information in a 'struct fscrypt_get_key_status_arg'. The main motivation for this is that applications need to be able to check whether an encrypted directory is "unlocked" or not, so that they can add the key if it is not, and avoid adding the key (which may involve prompting the user for a passphrase) if it already is. It's possible to use some workarounds such as checking whether opening a regular file fails with ENOKEY, or checking whether the filenames "look like gibberish" or not. However, no workaround is usable in all cases. Like the other key management ioctls, the keyrings syscalls may seem at first to be a good fit for this. Unfortunately, they are not. Even if we exposed the keyring ID of the ->s_master_keys keyring and gave everyone Search permission on it (note: currently the keyrings permission system would also allow everyone to "invalidate" the keyring too), the fscrypt keys have an additional state that doesn't map cleanly to the keyrings API: the secret can be removed, but we can be still tracking the files that were using the key, and the removal can be re-attempted or the secret added again. After later patches, some applications will also need a way to determine whether a key was added by the current user vs. by some other user. Reserved fields are included in fscrypt_get_key_status_arg for this and other future extensions. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
extern int fscrypt_ioctl_get_key_status(struct file *filp, void __user *arg);
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an encryption key to the filesystem's fscrypt keyring ->s_master_keys, making any files encrypted with that key appear "unlocked". Why we need this ~~~~~~~~~~~~~~~~ The main problem is that the "locked/unlocked" (ciphertext/plaintext) status of encrypted files is global, but the fscrypt keys are not. fscrypt only looks for keys in the keyring(s) the process accessing the filesystem is subscribed to: the thread keyring, process keyring, and session keyring, where the session keyring may contain the user keyring. Therefore, userspace has to put fscrypt keys in the keyrings for individual users or sessions. But this means that when a process with a different keyring tries to access encrypted files, whether they appear "unlocked" or not is nondeterministic. This is because it depends on whether the files are currently present in the inode cache. Fixing this by consistently providing each process its own view of the filesystem depending on whether it has the key or not isn't feasible due to how the VFS caches work. Furthermore, while sometimes users expect this behavior, it is misguided for two reasons. First, it would be an OS-level access control mechanism largely redundant with existing access control mechanisms such as UNIX file permissions, ACLs, LSMs, etc. Encryption is actually for protecting the data at rest. Second, almost all users of fscrypt actually do need the keys to be global. The largest users of fscrypt, Android and Chromium OS, achieve this by having PID 1 create a "session keyring" that is inherited by every process. This works, but it isn't scalable because it prevents session keyrings from being used for any other purpose. On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't similarly abuse the session keyring, so to make 'sudo' work on all systems it has to link all the user keyrings into root's user keyring [2]. This is ugly and raises security concerns. Moreover it can't make the keys available to system services, such as sshd trying to access the user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to read certificates from the user's home directory (see [5]); or to Docker containers (see [6], [7]). By having an API to add a key to the *filesystem* we'll be able to fix the above bugs, remove userspace workarounds, and clearly express the intended semantics: the locked/unlocked status of an encrypted directory is global, and encryption is orthogonal to OS-level access control. Why not use the add_key() syscall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use an ioctl for this API rather than the existing add_key() system call because the ioctl gives us the flexibility needed to implement fscrypt-specific semantics that will be introduced in later patches: - Supporting key removal with the semantics such that the secret is removed immediately and any unused inodes using the key are evicted; also, the eviction of any in-use inodes can be retried. - Calculating a key-dependent cryptographic identifier and returning it to userspace. - Allowing keys to be added and removed by non-root users, but only keys for v2 encryption policies; and to prevent denial-of-service attacks, users can only remove keys they themselves have added, and a key is only really removed after all users who added it have removed it. Trying to shoehorn these semantics into the keyrings syscalls would be very difficult, whereas the ioctls make things much easier. However, to reuse code the implementation still uses the keyrings service internally. Thus we get lockless RCU-mode key lookups without having to re-implement it, and the keys automatically show up in /proc/keys for debugging purposes. References: [1] https://github.com/google/fscrypt [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939 [4] https://github.com/google/fscrypt/issues/116 [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715 [6] https://github.com/google/fscrypt/issues/128 [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
/* keysetup.c */
extern int fscrypt_get_encryption_info(struct inode *inode);
extern void fscrypt_put_encryption_info(struct inode *inode);
extern void fscrypt_free_inode(struct inode *inode);
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to *really* remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key *secret* is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
extern int fscrypt_drop_inode(struct inode *inode);
/* fname.c */
extern int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname,
int lookup, struct fscrypt_name *fname);
static inline void fscrypt_free_filename(struct fscrypt_name *fname)
{
kfree(fname->crypto_buf.name);
}
extern int fscrypt_fname_alloc_buffer(const struct inode *inode,
u32 max_encrypted_len,
struct fscrypt_str *crypto_str);
extern void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str);
extern int fscrypt_fname_disk_to_usr(const struct inode *inode,
u32 hash, u32 minor_hash,
const struct fscrypt_str *iname,
struct fscrypt_str *oname);
fscrypt: improve format of no-key names When an encrypted directory is listed without the key, the filesystem must show "no-key names" that uniquely identify directory entries, are at most 255 (NAME_MAX) bytes long, and don't contain '/' or '\0'. Currently, for short names the no-key name is the base64 encoding of the ciphertext filename, while for long names it's the base64 encoding of the ciphertext filename's dirhash and second-to-last 16-byte block. This format has the following problems: - Since it doesn't always include the dirhash, it's incompatible with directories that will use a secret-keyed dirhash over the plaintext filenames. In this case, the dirhash won't be computable from the ciphertext name without the key, so it instead must be retrieved from the directory entry and always included in the no-key name. Casefolded encrypted directories will use this type of dirhash. - It's ambiguous: it's possible to craft two filenames that map to the same no-key name, since the method used to abbreviate long filenames doesn't use a proper cryptographic hash function. Solve both these problems by switching to a new no-key name format that is the base64 encoding of a variable-length structure that contains the dirhash, up to 149 bytes of the ciphertext filename, and (if any bytes remain) the SHA-256 of the remaining bytes of the ciphertext filename. This ensures that each no-key name contains everything needed to find the directory entry again, contains only legal characters, doesn't exceed NAME_MAX, is unambiguous unless there's a SHA-256 collision, and that we only take the performance hit of SHA-256 on very long filenames. Note: this change does *not* address the existing issue where users can modify the 'dirhash' part of a no-key name and the filesystem may still accept the name. Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved comments and commit message, fixed checking return value of base64_decode(), check for SHA-256 error, continue to set disk_name for short names to keep matching simpler, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-21 06:32:01 +08:00
extern bool fscrypt_match_name(const struct fscrypt_name *fname,
const u8 *de_name, u32 de_name_len);
fscrypt: derive dirhash key for casefolded directories When we allow indexed directories to use both encryption and casefolding, for the dirhash we can't just hash the ciphertext filenames that are stored on-disk (as is done currently) because the dirhash must be case insensitive, but the stored names are case-preserving. Nor can we hash the plaintext names with an unkeyed hash (or a hash keyed with a value stored on-disk like ext4's s_hash_seed), since that would leak information about the names that encryption is meant to protect. Instead, if we can accept a dirhash that's only computable when the fscrypt key is available, we can hash the plaintext names with a keyed hash using a secret key derived from the directory's fscrypt master key. We'll use SipHash-2-4 for this purpose. Prepare for this by deriving a SipHash key for each casefolded encrypted directory. Make sure to handle deriving the key not only when setting up the directory's fscrypt_info, but also in the case where the casefold flag is enabled after the fscrypt_info was already set up. (We could just always derive the key regardless of casefolding, but that would introduce unnecessary overhead for people not using casefolding.) Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved commit message, updated fscrypt.rst, squashed with change that avoids unnecessarily deriving the key, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-21 06:31:57 +08:00
extern u64 fscrypt_fname_siphash(const struct inode *dir,
const struct qstr *name);
/* bio.c */
extern void fscrypt_decrypt_bio(struct bio *bio);
extern int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
sector_t pblk, unsigned int len);
/* hooks.c */
extern int fscrypt_file_open(struct inode *inode, struct file *filp);
extern int __fscrypt_prepare_link(struct inode *inode, struct inode *dir,
struct dentry *dentry);
extern int __fscrypt_prepare_rename(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
struct dentry *new_dentry,
unsigned int flags);
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
extern int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry,
struct fscrypt_name *fname);
extern int fscrypt_prepare_setflags(struct inode *inode,
unsigned int oldflags, unsigned int flags);
extern int __fscrypt_prepare_symlink(struct inode *dir, unsigned int len,
unsigned int max_len,
struct fscrypt_str *disk_link);
extern int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
unsigned int len,
struct fscrypt_str *disk_link);
extern const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
unsigned int max_size,
struct delayed_call *done);
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
sb->s_cop = s_cop;
}
#else /* !CONFIG_FS_ENCRYPTION */
static inline bool fscrypt_has_encryption_key(const struct inode *inode)
{
return false;
}
static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
{
return false;
}
static inline bool fscrypt_dummy_context_enabled(struct inode *inode)
{
return false;
}
static inline void fscrypt_handle_d_move(struct dentry *dentry)
{
}
/* crypto.c */
static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
{
}
static inline struct page *fscrypt_encrypt_pagecache_blocks(struct page *page,
unsigned int len,
unsigned int offs,
gfp_t gfp_flags)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline int fscrypt_encrypt_block_inplace(const struct inode *inode,
struct page *page,
unsigned int len,
unsigned int offs, u64 lblk_num,
gfp_t gfp_flags)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_decrypt_pagecache_blocks(struct page *page,
unsigned int len,
unsigned int offs)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_decrypt_block_inplace(const struct inode *inode,
struct page *page,
unsigned int len,
unsigned int offs, u64 lblk_num)
{
return -EOPNOTSUPP;
}
static inline bool fscrypt_is_bounce_page(struct page *page)
{
return false;
}
static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
{
WARN_ON_ONCE(1);
return ERR_PTR(-EINVAL);
}
static inline void fscrypt_free_bounce_page(struct page *bounce_page)
{
}
/* policy.c */
static inline int fscrypt_ioctl_set_policy(struct file *filp,
const void __user *arg)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg)
{
return -EOPNOTSUPP;
}
fscrypt: v2 encryption policy support Add a new fscrypt policy version, "v2". It has the following changes from the original policy version, which we call "v1" (*): - Master keys (the user-provided encryption keys) are only ever used as input to HKDF-SHA512. This is more flexible and less error-prone, and it avoids the quirks and limitations of the AES-128-ECB based KDF. Three classes of cryptographically isolated subkeys are defined: - Per-file keys, like used in v1 policies except for the new KDF. - Per-mode keys. These implement the semantics of the DIRECT_KEY flag, which for v1 policies made the master key be used directly. These are also planned to be used for inline encryption when support for it is added. - Key identifiers (see below). - Each master key is identified by a 16-byte master_key_identifier, which is derived from the key itself using HKDF-SHA512. This prevents users from associating the wrong key with an encrypted file or directory. This was easily possible with v1 policies, which identified the key by an arbitrary 8-byte master_key_descriptor. - The key must be provided in the filesystem-level keyring, not in a process-subscribed keyring. The following UAPI additions are made: - The existing ioctl FS_IOC_SET_ENCRYPTION_POLICY can now be passed a fscrypt_policy_v2 to set a v2 encryption policy. It's disambiguated from fscrypt_policy/fscrypt_policy_v1 by the version code prefix. - A new ioctl FS_IOC_GET_ENCRYPTION_POLICY_EX is added. It allows getting the v1 or v2 encryption policy of an encrypted file or directory. The existing FS_IOC_GET_ENCRYPTION_POLICY ioctl could not be used because it did not have a way for userspace to indicate which policy structure is expected. The new ioctl includes a size field, so it is extensible to future fscrypt policy versions. - The ioctls FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY, and FS_IOC_GET_ENCRYPTION_KEY_STATUS now support managing keys for v2 encryption policies. Such keys are kept logically separate from keys for v1 encryption policies, and are identified by 'identifier' rather than by 'descriptor'. The 'identifier' need not be provided when adding a key, since the kernel will calculate it anyway. This patch temporarily keeps adding/removing v2 policy keys behind the same permission check done for adding/removing v1 policy keys: capable(CAP_SYS_ADMIN). However, the next patch will carefully take advantage of the cryptographically secure master_key_identifier to allow non-root users to add/remove v2 policy keys, thus providing a full replacement for v1 policies. (*) Actually, in the API fscrypt_policy::version is 0 while on-disk fscrypt_context::format is 1. But I believe it makes the most sense to advance both to '2' to have them be in sync, and to consider the numbering to start at 1 except for the API quirk. Reviewed-by: Paul Crowley <paulcrowley@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:47 +08:00
static inline int fscrypt_ioctl_get_policy_ex(struct file *filp,
void __user *arg)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_has_permitted_context(struct inode *parent,
struct inode *child)
{
return 0;
}
static inline int fscrypt_inherit_context(struct inode *parent,
struct inode *child,
void *fs_data, bool preload)
{
return -EOPNOTSUPP;
}
fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an encryption key to the filesystem's fscrypt keyring ->s_master_keys, making any files encrypted with that key appear "unlocked". Why we need this ~~~~~~~~~~~~~~~~ The main problem is that the "locked/unlocked" (ciphertext/plaintext) status of encrypted files is global, but the fscrypt keys are not. fscrypt only looks for keys in the keyring(s) the process accessing the filesystem is subscribed to: the thread keyring, process keyring, and session keyring, where the session keyring may contain the user keyring. Therefore, userspace has to put fscrypt keys in the keyrings for individual users or sessions. But this means that when a process with a different keyring tries to access encrypted files, whether they appear "unlocked" or not is nondeterministic. This is because it depends on whether the files are currently present in the inode cache. Fixing this by consistently providing each process its own view of the filesystem depending on whether it has the key or not isn't feasible due to how the VFS caches work. Furthermore, while sometimes users expect this behavior, it is misguided for two reasons. First, it would be an OS-level access control mechanism largely redundant with existing access control mechanisms such as UNIX file permissions, ACLs, LSMs, etc. Encryption is actually for protecting the data at rest. Second, almost all users of fscrypt actually do need the keys to be global. The largest users of fscrypt, Android and Chromium OS, achieve this by having PID 1 create a "session keyring" that is inherited by every process. This works, but it isn't scalable because it prevents session keyrings from being used for any other purpose. On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't similarly abuse the session keyring, so to make 'sudo' work on all systems it has to link all the user keyrings into root's user keyring [2]. This is ugly and raises security concerns. Moreover it can't make the keys available to system services, such as sshd trying to access the user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to read certificates from the user's home directory (see [5]); or to Docker containers (see [6], [7]). By having an API to add a key to the *filesystem* we'll be able to fix the above bugs, remove userspace workarounds, and clearly express the intended semantics: the locked/unlocked status of an encrypted directory is global, and encryption is orthogonal to OS-level access control. Why not use the add_key() syscall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use an ioctl for this API rather than the existing add_key() system call because the ioctl gives us the flexibility needed to implement fscrypt-specific semantics that will be introduced in later patches: - Supporting key removal with the semantics such that the secret is removed immediately and any unused inodes using the key are evicted; also, the eviction of any in-use inodes can be retried. - Calculating a key-dependent cryptographic identifier and returning it to userspace. - Allowing keys to be added and removed by non-root users, but only keys for v2 encryption policies; and to prevent denial-of-service attacks, users can only remove keys they themselves have added, and a key is only really removed after all users who added it have removed it. Trying to shoehorn these semantics into the keyrings syscalls would be very difficult, whereas the ioctls make things much easier. However, to reuse code the implementation still uses the keyrings service internally. Thus we get lockless RCU-mode key lookups without having to re-implement it, and the keys automatically show up in /proc/keys for debugging purposes. References: [1] https://github.com/google/fscrypt [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939 [4] https://github.com/google/fscrypt/issues/116 [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715 [6] https://github.com/google/fscrypt/issues/128 [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
/* keyring.c */
static inline void fscrypt_sb_free(struct super_block *sb)
{
}
static inline int fscrypt_ioctl_add_key(struct file *filp, void __user *arg)
{
return -EOPNOTSUPP;
}
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to *really* remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key *secret* is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
static inline int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_ioctl_remove_key_all_users(struct file *filp,
void __user *arg)
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to *really* remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key *secret* is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
{
return -EOPNOTSUPP;
}
fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl Add a new fscrypt ioctl, FS_IOC_GET_ENCRYPTION_KEY_STATUS. Given a key specified by 'struct fscrypt_key_specifier' (the same way a key is specified for the other fscrypt key management ioctls), it returns status information in a 'struct fscrypt_get_key_status_arg'. The main motivation for this is that applications need to be able to check whether an encrypted directory is "unlocked" or not, so that they can add the key if it is not, and avoid adding the key (which may involve prompting the user for a passphrase) if it already is. It's possible to use some workarounds such as checking whether opening a regular file fails with ENOKEY, or checking whether the filenames "look like gibberish" or not. However, no workaround is usable in all cases. Like the other key management ioctls, the keyrings syscalls may seem at first to be a good fit for this. Unfortunately, they are not. Even if we exposed the keyring ID of the ->s_master_keys keyring and gave everyone Search permission on it (note: currently the keyrings permission system would also allow everyone to "invalidate" the keyring too), the fscrypt keys have an additional state that doesn't map cleanly to the keyrings API: the secret can be removed, but we can be still tracking the files that were using the key, and the removal can be re-attempted or the secret added again. After later patches, some applications will also need a way to determine whether a key was added by the current user vs. by some other user. Reserved fields are included in fscrypt_get_key_status_arg for this and other future extensions. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
static inline int fscrypt_ioctl_get_key_status(struct file *filp,
void __user *arg)
{
return -EOPNOTSUPP;
}
/* keysetup.c */
static inline int fscrypt_get_encryption_info(struct inode *inode)
{
return -EOPNOTSUPP;
}
static inline void fscrypt_put_encryption_info(struct inode *inode)
{
return;
}
static inline void fscrypt_free_inode(struct inode *inode)
{
}
fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl Add a new fscrypt ioctl, FS_IOC_REMOVE_ENCRYPTION_KEY. This ioctl removes an encryption key that was added by FS_IOC_ADD_ENCRYPTION_KEY. It wipes the secret key itself, then "locks" the encrypted files and directories that had been unlocked using that key -- implemented by evicting the relevant dentries and inodes from the VFS caches. The problem this solves is that many fscrypt users want the ability to remove encryption keys, causing the corresponding encrypted directories to appear "locked" (presented in ciphertext form) again. Moreover, users want removing an encryption key to *really* remove it, in the sense that the removed keys cannot be recovered even if kernel memory is compromised, e.g. by the exploit of a kernel security vulnerability or by a physical attack. This is desirable after a user logs out of the system, for example. In many cases users even already assume this to be the case and are surprised to hear when it's not. It is not sufficient to simply unlink the master key from the keyring (or to revoke or invalidate it), since the actual encryption transform objects are still pinned in memory by their inodes. Therefore, to really remove a key we must also evict the relevant inodes. Currently one workaround is to run 'sync && echo 2 > /proc/sys/vm/drop_caches'. But, that evicts all unused inodes in the system rather than just the inodes associated with the key being removed, causing severe performance problems. Moreover, it requires root privileges, so regular users can't "lock" their encrypted files. Another workaround, used in Chromium OS kernels, is to add a new VFS-level ioctl FS_IOC_DROP_CACHE which is a more restricted version of drop_caches that operates on a single super_block. It does: shrink_dcache_sb(sb); invalidate_inodes(sb, false); But it's still a hack. Yet, the major users of filesystem encryption want this feature badly enough that they are actually using these hacks. To properly solve the problem, start maintaining a list of the inodes which have been "unlocked" using each master key. Originally this wasn't possible because the kernel didn't keep track of in-use master keys at all. But, with the ->s_master_keys keyring it is now possible. Then, add an ioctl FS_IOC_REMOVE_ENCRYPTION_KEY. It finds the specified master key in ->s_master_keys, then wipes the secret key itself, which prevents any additional inodes from being unlocked with the key. Then, it syncs the filesystem and evicts the inodes in the key's list. The normal inode eviction code will free and wipe the per-file keys (in ->i_crypt_info). Note that freeing ->i_crypt_info without evicting the inodes was also considered, but would have been racy. Some inodes may still be in use when a master key is removed, and we can't simply revoke random file descriptors, mmap's, etc. Thus, the ioctl simply skips in-use inodes, and returns -EBUSY to indicate that some inodes weren't evicted. The master key *secret* is still removed, but the fscrypt_master_key struct remains to keep track of the remaining inodes. Userspace can then retry the ioctl to evict the remaining inodes. Alternatively, if userspace adds the key again, the refreshed secret will be associated with the existing list of inodes so they remain correctly tracked for future key removals. The ioctl doesn't wipe pagecache pages. Thus, we tolerate that after a kernel compromise some portions of plaintext file contents may still be recoverable from memory. This can be solved by enabling page poisoning system-wide, which security conscious users may choose to do. But it's very difficult to solve otherwise, e.g. note that plaintext file contents may have been read in other places than pagecache pages. Like FS_IOC_ADD_ENCRYPTION_KEY, FS_IOC_REMOVE_ENCRYPTION_KEY is initially restricted to privileged users only. This is sufficient for some use cases, but not all. A later patch will relax this restriction, but it will require introducing key hashes, among other changes. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-05 10:35:46 +08:00
static inline int fscrypt_drop_inode(struct inode *inode)
{
return 0;
}
/* fname.c */
static inline int fscrypt_setup_filename(struct inode *dir,
const struct qstr *iname,
int lookup, struct fscrypt_name *fname)
{
if (IS_ENCRYPTED(dir))
return -EOPNOTSUPP;
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
memset(fname, 0, sizeof(*fname));
fname->usr_fname = iname;
fname->disk_name.name = (unsigned char *)iname->name;
fname->disk_name.len = iname->len;
return 0;
}
static inline void fscrypt_free_filename(struct fscrypt_name *fname)
{
return;
}
static inline int fscrypt_fname_alloc_buffer(const struct inode *inode,
u32 max_encrypted_len,
struct fscrypt_str *crypto_str)
{
return -EOPNOTSUPP;
}
static inline void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str)
{
return;
}
static inline int fscrypt_fname_disk_to_usr(const struct inode *inode,
u32 hash, u32 minor_hash,
const struct fscrypt_str *iname,
struct fscrypt_str *oname)
{
return -EOPNOTSUPP;
}
static inline bool fscrypt_match_name(const struct fscrypt_name *fname,
const u8 *de_name, u32 de_name_len)
{
/* Encryption support disabled; use standard comparison */
if (de_name_len != fname->disk_name.len)
return false;
return !memcmp(de_name, fname->disk_name.name, fname->disk_name.len);
}
fscrypt: derive dirhash key for casefolded directories When we allow indexed directories to use both encryption and casefolding, for the dirhash we can't just hash the ciphertext filenames that are stored on-disk (as is done currently) because the dirhash must be case insensitive, but the stored names are case-preserving. Nor can we hash the plaintext names with an unkeyed hash (or a hash keyed with a value stored on-disk like ext4's s_hash_seed), since that would leak information about the names that encryption is meant to protect. Instead, if we can accept a dirhash that's only computable when the fscrypt key is available, we can hash the plaintext names with a keyed hash using a secret key derived from the directory's fscrypt master key. We'll use SipHash-2-4 for this purpose. Prepare for this by deriving a SipHash key for each casefolded encrypted directory. Make sure to handle deriving the key not only when setting up the directory's fscrypt_info, but also in the case where the casefold flag is enabled after the fscrypt_info was already set up. (We could just always derive the key regardless of casefolding, but that would introduce unnecessary overhead for people not using casefolding.) Signed-off-by: Daniel Rosenberg <drosen@google.com> [EB: improved commit message, updated fscrypt.rst, squashed with change that avoids unnecessarily deriving the key, and many other cleanups] Link: https://lore.kernel.org/r/20200120223201.241390-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-21 06:31:57 +08:00
static inline u64 fscrypt_fname_siphash(const struct inode *dir,
const struct qstr *name)
{
WARN_ON_ONCE(1);
return 0;
}
/* bio.c */
static inline void fscrypt_decrypt_bio(struct bio *bio)
{
}
static inline int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk,
sector_t pblk, unsigned int len)
{
return -EOPNOTSUPP;
}
/* hooks.c */
static inline int fscrypt_file_open(struct inode *inode, struct file *filp)
{
if (IS_ENCRYPTED(inode))
return -EOPNOTSUPP;
return 0;
}
static inline int __fscrypt_prepare_link(struct inode *inode, struct inode *dir,
struct dentry *dentry)
{
return -EOPNOTSUPP;
}
static inline int __fscrypt_prepare_rename(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
struct dentry *new_dentry,
unsigned int flags)
{
return -EOPNOTSUPP;
}
static inline int __fscrypt_prepare_lookup(struct inode *dir,
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
struct dentry *dentry,
struct fscrypt_name *fname)
{
return -EOPNOTSUPP;
}
static inline int fscrypt_prepare_setflags(struct inode *inode,
unsigned int oldflags,
unsigned int flags)
{
return 0;
}
static inline int __fscrypt_prepare_symlink(struct inode *dir,
unsigned int len,
unsigned int max_len,
struct fscrypt_str *disk_link)
{
return -EOPNOTSUPP;
}
static inline int __fscrypt_encrypt_symlink(struct inode *inode,
const char *target,
unsigned int len,
struct fscrypt_str *disk_link)
{
return -EOPNOTSUPP;
}
static inline const char *fscrypt_get_symlink(struct inode *inode,
const void *caddr,
unsigned int max_size,
struct delayed_call *done)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline void fscrypt_set_ops(struct super_block *sb,
const struct fscrypt_operations *s_cop)
{
}
#endif /* !CONFIG_FS_ENCRYPTION */
/**
* fscrypt_require_key() - require an inode's encryption key
* @inode: the inode we need the key for
*
* If the inode is encrypted, set up its encryption key if not already done.
* Then require that the key be present and return -ENOKEY otherwise.
*
* No locks are needed, and the key will live as long as the struct inode --- so
* it won't go away from under you.
*
* Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
* if a problem occurred while setting up the encryption key.
*/
static inline int fscrypt_require_key(struct inode *inode)
{
if (IS_ENCRYPTED(inode)) {
int err = fscrypt_get_encryption_info(inode);
if (err)
return err;
if (!fscrypt_has_encryption_key(inode))
return -ENOKEY;
}
return 0;
}
/**
* fscrypt_prepare_link() - prepare to link an inode into a possibly-encrypted
* directory
* @old_dentry: an existing dentry for the inode being linked
* @dir: the target directory
* @dentry: negative dentry for the target filename
*
* A new link can only be added to an encrypted directory if the directory's
* encryption key is available --- since otherwise we'd have no way to encrypt
* the filename. Therefore, we first set up the directory's encryption key (if
* not already done) and return an error if it's unavailable.
*
* We also verify that the link will not violate the constraint that all files
* in an encrypted directory tree use the same encryption policy.
*
* Return: 0 on success, -ENOKEY if the directory's encryption key is missing,
fscrypt: return -EXDEV for incompatible rename or link into encrypted dir Currently, trying to rename or link a regular file, directory, or symlink into an encrypted directory fails with EPERM when the source file is unencrypted or is encrypted with a different encryption policy, and is on the same mountpoint. It is correct for the operation to fail, but the choice of EPERM breaks tools like 'mv' that know to copy rather than rename if they see EXDEV, but don't know what to do with EPERM. Our original motivation for EPERM was to encourage users to securely handle their data. Encrypting files by "moving" them into an encrypted directory can be insecure because the unencrypted data may remain in free space on disk, where it can later be recovered by an attacker. It's much better to encrypt the data from the start, or at least try to securely delete the source data e.g. using the 'shred' program. However, the current behavior hasn't been effective at achieving its goal because users tend to be confused, hack around it, and complain; see e.g. https://github.com/google/fscrypt/issues/76. And in some cases it's actually inconsistent or unnecessary. For example, 'mv'-ing files between differently encrypted directories doesn't work even in cases where it can be secure, such as when in userspace the same passphrase protects both directories. Yet, you *can* already 'mv' unencrypted files into an encrypted directory if the source files are on a different mountpoint, even though doing so is often insecure. There are probably better ways to teach users to securely handle their files. For example, the 'fscrypt' userspace tool could provide a command that migrates unencrypted files into an encrypted directory, acting like 'shred' on the source files and providing appropriate warnings depending on the type of the source filesystem and disk. Receiving errors on unimportant files might also force some users to disable encryption, thus making the behavior counterproductive. It's desirable to make encryption as unobtrusive as possible. Therefore, change the error code from EPERM to EXDEV so that tools looking for EXDEV will fall back to a copy. This, of course, doesn't prevent users from still doing the right things to securely manage their files. Note that this also matches the behavior when a file is renamed between two project quota hierarchies; so there's precedent for using EXDEV for things other than mountpoints. xfstests generic/398 will require an update with this change. [Rewritten from an earlier patch series by Michael Halcrow.] Cc: Michael Halcrow <mhalcrow@google.com> Cc: Joe Richey <joerichey@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-23 08:20:21 +08:00
* -EXDEV if the link would result in an inconsistent encryption policy, or
* another -errno code.
*/
static inline int fscrypt_prepare_link(struct dentry *old_dentry,
struct inode *dir,
struct dentry *dentry)
{
if (IS_ENCRYPTED(dir))
return __fscrypt_prepare_link(d_inode(old_dentry), dir, dentry);
return 0;
}
/**
* fscrypt_prepare_rename() - prepare for a rename between possibly-encrypted
* directories
* @old_dir: source directory
* @old_dentry: dentry for source file
* @new_dir: target directory
* @new_dentry: dentry for target location (may be negative unless exchanging)
* @flags: rename flags (we care at least about %RENAME_EXCHANGE)
*
* Prepare for ->rename() where the source and/or target directories may be
* encrypted. A new link can only be added to an encrypted directory if the
* directory's encryption key is available --- since otherwise we'd have no way
* to encrypt the filename. A rename to an existing name, on the other hand,
* *is* cryptographically possible without the key. However, we take the more
* conservative approach and just forbid all no-key renames.
*
* We also verify that the rename will not violate the constraint that all files
* in an encrypted directory tree use the same encryption policy.
*
fscrypt: return -EXDEV for incompatible rename or link into encrypted dir Currently, trying to rename or link a regular file, directory, or symlink into an encrypted directory fails with EPERM when the source file is unencrypted or is encrypted with a different encryption policy, and is on the same mountpoint. It is correct for the operation to fail, but the choice of EPERM breaks tools like 'mv' that know to copy rather than rename if they see EXDEV, but don't know what to do with EPERM. Our original motivation for EPERM was to encourage users to securely handle their data. Encrypting files by "moving" them into an encrypted directory can be insecure because the unencrypted data may remain in free space on disk, where it can later be recovered by an attacker. It's much better to encrypt the data from the start, or at least try to securely delete the source data e.g. using the 'shred' program. However, the current behavior hasn't been effective at achieving its goal because users tend to be confused, hack around it, and complain; see e.g. https://github.com/google/fscrypt/issues/76. And in some cases it's actually inconsistent or unnecessary. For example, 'mv'-ing files between differently encrypted directories doesn't work even in cases where it can be secure, such as when in userspace the same passphrase protects both directories. Yet, you *can* already 'mv' unencrypted files into an encrypted directory if the source files are on a different mountpoint, even though doing so is often insecure. There are probably better ways to teach users to securely handle their files. For example, the 'fscrypt' userspace tool could provide a command that migrates unencrypted files into an encrypted directory, acting like 'shred' on the source files and providing appropriate warnings depending on the type of the source filesystem and disk. Receiving errors on unimportant files might also force some users to disable encryption, thus making the behavior counterproductive. It's desirable to make encryption as unobtrusive as possible. Therefore, change the error code from EPERM to EXDEV so that tools looking for EXDEV will fall back to a copy. This, of course, doesn't prevent users from still doing the right things to securely manage their files. Note that this also matches the behavior when a file is renamed between two project quota hierarchies; so there's precedent for using EXDEV for things other than mountpoints. xfstests generic/398 will require an update with this change. [Rewritten from an earlier patch series by Michael Halcrow.] Cc: Michael Halcrow <mhalcrow@google.com> Cc: Joe Richey <joerichey@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-23 08:20:21 +08:00
* Return: 0 on success, -ENOKEY if an encryption key is missing, -EXDEV if the
* rename would cause inconsistent encryption policies, or another -errno code.
*/
static inline int fscrypt_prepare_rename(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
struct dentry *new_dentry,
unsigned int flags)
{
if (IS_ENCRYPTED(old_dir) || IS_ENCRYPTED(new_dir))
return __fscrypt_prepare_rename(old_dir, old_dentry,
new_dir, new_dentry, flags);
return 0;
}
/**
* fscrypt_prepare_lookup() - prepare to lookup a name in a possibly-encrypted
* directory
* @dir: directory being searched
* @dentry: filename being looked up
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
* @fname: (output) the name to use to search the on-disk directory
*
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
* Prepare for ->lookup() in a directory which may be encrypted by determining
* the name that will actually be used to search the directory on-disk. Lookups
* can be done with or without the directory's encryption key; without the key,
* filenames are presented in encrypted form. Therefore, we'll try to set up
* the directory's encryption key, but even without it the lookup can continue.
*
* This also installs a custom ->d_revalidate() method which will invalidate the
* dentry if it was created without the key and the key is later added.
*
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
* Return: 0 on success; -ENOENT if key is unavailable but the filename isn't a
* correctly formed encoded ciphertext name, so a negative dentry should be
* created; or another -errno code.
*/
static inline int fscrypt_prepare_lookup(struct inode *dir,
struct dentry *dentry,
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
struct fscrypt_name *fname)
{
if (IS_ENCRYPTED(dir))
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext ->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-21 02:39:13 +08:00
return __fscrypt_prepare_lookup(dir, dentry, fname);
memset(fname, 0, sizeof(*fname));
fname->usr_fname = &dentry->d_name;
fname->disk_name.name = (unsigned char *)dentry->d_name.name;
fname->disk_name.len = dentry->d_name.len;
return 0;
}
/**
* fscrypt_prepare_setattr() - prepare to change a possibly-encrypted inode's
* attributes
* @dentry: dentry through which the inode is being changed
* @attr: attributes to change
*
* Prepare for ->setattr() on a possibly-encrypted inode. On an encrypted file,
* most attribute changes are allowed even without the encryption key. However,
* without the encryption key we do have to forbid truncates. This is needed
* because the size being truncated to may not be a multiple of the filesystem
* block size, and in that case we'd have to decrypt the final block, zero the
* portion past i_size, and re-encrypt it. (We *could* allow truncating to a
* filesystem block boundary, but it's simpler to just forbid all truncates ---
* and we already forbid all other contents modifications without the key.)
*
* Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
* if a problem occurred while setting up the encryption key.
*/
static inline int fscrypt_prepare_setattr(struct dentry *dentry,
struct iattr *attr)
{
if (attr->ia_valid & ATTR_SIZE)
return fscrypt_require_key(d_inode(dentry));
return 0;
}
fscrypt: new helper functions for ->symlink() Currently, filesystems supporting fscrypt need to implement some tricky logic when creating encrypted symlinks, including handling a peculiar on-disk format (struct fscrypt_symlink_data) and correctly calculating the size of the encrypted symlink. Introduce helper functions to make things a bit easier: - fscrypt_prepare_symlink() computes and validates the size the symlink target will require on-disk. - fscrypt_encrypt_symlink() creates the encrypted target if needed. The new helpers actually fix some subtle bugs. First, when checking whether the symlink target was too long, filesystems didn't account for the fact that the NUL padding is meant to be truncated if it would cause the maximum length to be exceeded, as is done for filenames in directories. Consequently users would receive ENAMETOOLONG when creating symlinks close to what is supposed to be the maximum length. For example, with EXT4 with a 4K block size, the maximum symlink target length in an encrypted directory is supposed to be 4093 bytes (in comparison to 4095 in an unencrypted directory), but in FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted. Second, symlink targets of "." and ".." were not being encrypted, even though they should be, as these names are special in *directory entries* but not in symlink targets. Fortunately, we can fix this simply by starting to encrypt them, as old kernels already accept them in encrypted form. Third, the output string length the filesystems were providing when doing the actual encryption was incorrect, as it was forgotten to exclude 'sizeof(struct fscrypt_symlink_data)'. Fortunately though, this bug didn't make a difference. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-06 02:45:01 +08:00
/**
* fscrypt_prepare_symlink() - prepare to create a possibly-encrypted symlink
fscrypt: new helper functions for ->symlink() Currently, filesystems supporting fscrypt need to implement some tricky logic when creating encrypted symlinks, including handling a peculiar on-disk format (struct fscrypt_symlink_data) and correctly calculating the size of the encrypted symlink. Introduce helper functions to make things a bit easier: - fscrypt_prepare_symlink() computes and validates the size the symlink target will require on-disk. - fscrypt_encrypt_symlink() creates the encrypted target if needed. The new helpers actually fix some subtle bugs. First, when checking whether the symlink target was too long, filesystems didn't account for the fact that the NUL padding is meant to be truncated if it would cause the maximum length to be exceeded, as is done for filenames in directories. Consequently users would receive ENAMETOOLONG when creating symlinks close to what is supposed to be the maximum length. For example, with EXT4 with a 4K block size, the maximum symlink target length in an encrypted directory is supposed to be 4093 bytes (in comparison to 4095 in an unencrypted directory), but in FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted. Second, symlink targets of "." and ".." were not being encrypted, even though they should be, as these names are special in *directory entries* but not in symlink targets. Fortunately, we can fix this simply by starting to encrypt them, as old kernels already accept them in encrypted form. Third, the output string length the filesystems were providing when doing the actual encryption was incorrect, as it was forgotten to exclude 'sizeof(struct fscrypt_symlink_data)'. Fortunately though, this bug didn't make a difference. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-06 02:45:01 +08:00
* @dir: directory in which the symlink is being created
* @target: plaintext symlink target
* @len: length of @target excluding null terminator
* @max_len: space the filesystem has available to store the symlink target
* @disk_link: (out) the on-disk symlink target being prepared
*
* This function computes the size the symlink target will require on-disk,
* stores it in @disk_link->len, and validates it against @max_len. An
* encrypted symlink may be longer than the original.
*
* Additionally, @disk_link->name is set to @target if the symlink will be
* unencrypted, but left NULL if the symlink will be encrypted. For encrypted
* symlinks, the filesystem must call fscrypt_encrypt_symlink() to create the
* on-disk target later. (The reason for the two-step process is that some
* filesystems need to know the size of the symlink target before creating the
* inode, e.g. to determine whether it will be a "fast" or "slow" symlink.)
*
* Return: 0 on success, -ENAMETOOLONG if the symlink target is too long,
* -ENOKEY if the encryption key is missing, or another -errno code if a problem
* occurred while setting up the encryption key.
*/
static inline int fscrypt_prepare_symlink(struct inode *dir,
const char *target,
unsigned int len,
unsigned int max_len,
struct fscrypt_str *disk_link)
{
if (IS_ENCRYPTED(dir) || fscrypt_dummy_context_enabled(dir))
return __fscrypt_prepare_symlink(dir, len, max_len, disk_link);
disk_link->name = (unsigned char *)target;
disk_link->len = len + 1;
if (disk_link->len > max_len)
return -ENAMETOOLONG;
return 0;
}
/**
* fscrypt_encrypt_symlink() - encrypt the symlink target if needed
fscrypt: new helper functions for ->symlink() Currently, filesystems supporting fscrypt need to implement some tricky logic when creating encrypted symlinks, including handling a peculiar on-disk format (struct fscrypt_symlink_data) and correctly calculating the size of the encrypted symlink. Introduce helper functions to make things a bit easier: - fscrypt_prepare_symlink() computes and validates the size the symlink target will require on-disk. - fscrypt_encrypt_symlink() creates the encrypted target if needed. The new helpers actually fix some subtle bugs. First, when checking whether the symlink target was too long, filesystems didn't account for the fact that the NUL padding is meant to be truncated if it would cause the maximum length to be exceeded, as is done for filenames in directories. Consequently users would receive ENAMETOOLONG when creating symlinks close to what is supposed to be the maximum length. For example, with EXT4 with a 4K block size, the maximum symlink target length in an encrypted directory is supposed to be 4093 bytes (in comparison to 4095 in an unencrypted directory), but in FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted. Second, symlink targets of "." and ".." were not being encrypted, even though they should be, as these names are special in *directory entries* but not in symlink targets. Fortunately, we can fix this simply by starting to encrypt them, as old kernels already accept them in encrypted form. Third, the output string length the filesystems were providing when doing the actual encryption was incorrect, as it was forgotten to exclude 'sizeof(struct fscrypt_symlink_data)'. Fortunately though, this bug didn't make a difference. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-01-06 02:45:01 +08:00
* @inode: symlink inode
* @target: plaintext symlink target
* @len: length of @target excluding null terminator
* @disk_link: (in/out) the on-disk symlink target being prepared
*
* If the symlink target needs to be encrypted, then this function encrypts it
* into @disk_link->name. fscrypt_prepare_symlink() must have been called
* previously to compute @disk_link->len. If the filesystem did not allocate a
* buffer for @disk_link->name after calling fscrypt_prepare_link(), then one
* will be kmalloc()'ed and the filesystem will be responsible for freeing it.
*
* Return: 0 on success, -errno on failure
*/
static inline int fscrypt_encrypt_symlink(struct inode *inode,
const char *target,
unsigned int len,
struct fscrypt_str *disk_link)
{
if (IS_ENCRYPTED(inode))
return __fscrypt_encrypt_symlink(inode, target, len, disk_link);
return 0;
}
/* If *pagep is a bounce page, free it and set *pagep to the pagecache page */
static inline void fscrypt_finalize_bounce_page(struct page **pagep)
{
struct page *page = *pagep;
if (fscrypt_is_bounce_page(page)) {
*pagep = fscrypt_pagecache_page(page);
fscrypt_free_bounce_page(page);
}
}
#endif /* _LINUX_FSCRYPT_H */