Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "In addition to bug fixes and cleanups there are two new features from
  Amir:

   - Consistent inode number support for the case when layers are not
     all on the same filesystem (feature is dubbed "xino").

   - Optimize overlayfs file handle decoding. This one touches the
     exportfs interface to allow detecting the disconnected directory
     case"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update documentation w.r.t "xino" feature
  ovl: add support for "xino" mount and config options
  ovl: consistent d_ino for non-samefs with xino
  ovl: consistent i_ino for non-samefs with xino
  ovl: constant st_ino for non-samefs with xino
  ovl: allocate anon bdev per unique lower fs
  ovl: factor out ovl_map_dev_ino() helper
  ovl: cleanup ovl_update_time()
  ovl: add WARN_ON() for non-dir redirect cases
  ovl: cleanup setting OVL_INDEX
  ovl: set d->is_dir and d->opaque for last path element
  ovl: Do not check for redirect if this is last layer
  ovl: lookup in inode cache first when decoding lower file handle
  ovl: do not try to reconnect a disconnected origin dentry
  ovl: disambiguate ovl_encode_fh()
  ovl: set lower layer st_dev only if setting lower st_ino
  ovl: fix lookup with middle layer opaque dir and absolute path redirects
  ovl: Set d->last properly during lookup
  ovl: set i_ino to the value of st_ino for NFS export
This commit is contained in:
Linus Torvalds 2018-04-13 16:55:41 -07:00
commit 48023102b7
12 changed files with 510 additions and 172 deletions

View File

@ -14,9 +14,13 @@ The result will inevitably fail to look exactly like a normal
filesystem for various technical reasons. The expectation is that filesystem for various technical reasons. The expectation is that
many use cases will be able to ignore these differences. many use cases will be able to ignore these differences.
This approach is 'hybrid' because the objects that appear in the
filesystem do not all appear to belong to that filesystem. In many Overlay objects
cases an object accessed in the union will be indistinguishable ---------------
The overlay filesystem approach is 'hybrid', because the objects that
appear in the filesystem do not always appear to belong to that filesystem.
In many cases, an object accessed in the union will be indistinguishable
from accessing the corresponding object from the original filesystem. from accessing the corresponding object from the original filesystem.
This is most obvious from the 'st_dev' field returned by stat(2). This is most obvious from the 'st_dev' field returned by stat(2).
@ -34,6 +38,19 @@ make the overlay mount more compliant with filesystem scanners and
overlay objects will be distinguishable from the corresponding overlay objects will be distinguishable from the corresponding
objects in the original filesystem. objects in the original filesystem.
On 64bit systems, even if all overlay layers are not on the same
underlying filesystem, the same compliant behavior could be achieved
with the "xino" feature. The "xino" feature composes a unique object
identifier from the real object st_ino and an underlying fsid index.
If all underlying filesystems support NFS file handles and export file
handles with 32bit inode number encoding (e.g. ext4), overlay filesystem
will use the high inode number bits for fsid. Even when the underlying
filesystem uses 64bit inode numbers, users can still enable the "xino"
feature with the "-o xino=on" overlay mount option. That is useful for the
case of underlying filesystems like xfs and tmpfs, which use 64bit inode
numbers, but are very unlikely to use the high inode number bit.
Upper and Lower Upper and Lower
--------------- ---------------
@ -290,10 +307,19 @@ Non-standard behavior
--------------------- ---------------------
The copy_up operation essentially creates a new, identical file and The copy_up operation essentially creates a new, identical file and
moves it over to the old name. The new file may be on a different moves it over to the old name. Any open files referring to this inode
filesystem, so both st_dev and st_ino of the file may change. will access the old data.
Any open files referring to this inode will access the old data. The new file may be on a different filesystem, so both st_dev and st_ino
of the real file may change. The values of st_dev and st_ino returned by
stat(2) on an overlay object are often not the same as the real file
stat(2) values to prevent the values from changing on copy_up.
Unless "xino" feature is enabled, when overlay layers are not all on the
same underlying filesystem, the value of st_dev may be different for two
non-directory objects in the same overlay filesystem and the value of
st_ino for directory objects may be non persistent and could change even
while the overlay filesystem is still mounted.
Unless "inode index" feature is enabled, if a file with multiple hard Unless "inode index" feature is enabled, if a file with multiple hard
links is copied up, then this will "break" the link. Changes will not be links is copied up, then this will "break" the link. Changes will not be
@ -302,6 +328,7 @@ propagated to other names referring to the same inode.
Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
directory will fail with EXDEV. directory will fail with EXDEV.
Changes to underlying filesystems Changes to underlying filesystems
--------------------------------- ---------------------------------

View File

@ -435,6 +435,15 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
if (IS_ERR_OR_NULL(result)) if (IS_ERR_OR_NULL(result))
return ERR_PTR(-ESTALE); return ERR_PTR(-ESTALE);
/*
* If no acceptance criteria was specified by caller, a disconnected
* dentry is also accepatable. Callers may use this mode to query if
* file handle is stale or to get a reference to an inode without
* risking the high overhead caused by directory reconnect.
*/
if (!acceptable)
return result;
if (d_is_dir(result)) { if (d_is_dir(result)) {
/* /*
* This request is for a directory. * This request is for a directory.

View File

@ -86,3 +86,20 @@ config OVERLAY_FS_NFS_EXPORT
case basis with the "nfs_export=on" mount option. case basis with the "nfs_export=on" mount option.
Say N unless you fully understand the consequences. Say N unless you fully understand the consequences.
config OVERLAY_FS_XINO_AUTO
bool "Overlayfs: auto enable inode number mapping"
default n
depends on OVERLAY_FS
help
If this config option is enabled then overlay filesystems will use
unused high bits in undelying filesystem inode numbers to map all
inodes to a unified address space. The mapped 64bit inode numbers
might not be compatible with applications that expect 32bit inodes.
If compatibility with applications that expect 32bit inodes is not an
issue, then it is safe and recommended to say Y here.
For more information, see Documentation/filesystems/overlayfs.txt
If unsure, say N.

View File

@ -232,7 +232,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat)
return err; return err;
} }
struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper) struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
{ {
struct ovl_fh *fh; struct ovl_fh *fh;
int fh_type, fh_len, dwords; int fh_type, fh_len, dwords;
@ -300,7 +300,7 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
* up and a pure upper inode. * up and a pure upper inode.
*/ */
if (ovl_can_decode_fh(lower->d_sb)) { if (ovl_can_decode_fh(lower->d_sb)) {
fh = ovl_encode_fh(lower, false); fh = ovl_encode_real_fh(lower, false);
if (IS_ERR(fh)) if (IS_ERR(fh))
return PTR_ERR(fh); return PTR_ERR(fh);
} }
@ -321,7 +321,7 @@ static int ovl_set_upper_fh(struct dentry *upper, struct dentry *index)
const struct ovl_fh *fh; const struct ovl_fh *fh;
int err; int err;
fh = ovl_encode_fh(upper, true); fh = ovl_encode_real_fh(upper, true);
if (IS_ERR(fh)) if (IS_ERR(fh))
return PTR_ERR(fh); return PTR_ERR(fh);

View File

@ -228,8 +228,8 @@ static int ovl_d_to_fh(struct dentry *dentry, char *buf, int buflen)
goto fail; goto fail;
/* Encode an upper or lower file handle */ /* Encode an upper or lower file handle */
fh = ovl_encode_fh(enc_lower ? ovl_dentry_lower(dentry) : fh = ovl_encode_real_fh(enc_lower ? ovl_dentry_lower(dentry) :
ovl_dentry_upper(dentry), !enc_lower); ovl_dentry_upper(dentry), !enc_lower);
err = PTR_ERR(fh); err = PTR_ERR(fh);
if (IS_ERR(fh)) if (IS_ERR(fh))
goto fail; goto fail;
@ -267,8 +267,8 @@ static int ovl_dentry_to_fh(struct dentry *dentry, u32 *fid, int *max_len)
return OVL_FILEID; return OVL_FILEID;
} }
static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len, static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
struct inode *parent) struct inode *parent)
{ {
struct dentry *dentry; struct dentry *dentry;
int type; int type;
@ -305,15 +305,12 @@ static struct dentry *ovl_obtain_alias(struct super_block *sb,
if (d_is_dir(upper ?: lower)) if (d_is_dir(upper ?: lower))
return ERR_PTR(-EIO); return ERR_PTR(-EIO);
inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower); inode = ovl_get_inode(sb, dget(upper), lowerpath, index, !!lower);
if (IS_ERR(inode)) { if (IS_ERR(inode)) {
dput(upper); dput(upper);
return ERR_CAST(inode); return ERR_CAST(inode);
} }
if (index)
ovl_set_flag(OVL_INDEX, inode);
dentry = d_find_any_alias(inode); dentry = d_find_any_alias(inode);
if (!dentry) { if (!dentry) {
dentry = d_alloc_anon(inode->i_sb); dentry = d_alloc_anon(inode->i_sb);
@ -685,7 +682,7 @@ static struct dentry *ovl_upper_fh_to_d(struct super_block *sb,
if (!ofs->upper_mnt) if (!ofs->upper_mnt)
return ERR_PTR(-EACCES); return ERR_PTR(-EACCES);
upper = ovl_decode_fh(fh, ofs->upper_mnt); upper = ovl_decode_real_fh(fh, ofs->upper_mnt, true);
if (IS_ERR_OR_NULL(upper)) if (IS_ERR_OR_NULL(upper))
return upper; return upper;
@ -703,25 +700,39 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
struct ovl_path *stack = &origin; struct ovl_path *stack = &origin;
struct dentry *dentry = NULL; struct dentry *dentry = NULL;
struct dentry *index = NULL; struct dentry *index = NULL;
struct inode *inode = NULL; struct inode *inode;
bool is_deleted = false;
int err; int err;
/* First lookup indexed upper by fh */ /* First lookup overlay inode in inode cache by origin fh */
err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
if (err)
return ERR_PTR(err);
if (!d_is_dir(origin.dentry) ||
!(origin.dentry->d_flags & DCACHE_DISCONNECTED)) {
inode = ovl_lookup_inode(sb, origin.dentry, false);
err = PTR_ERR(inode);
if (IS_ERR(inode))
goto out_err;
if (inode) {
dentry = d_find_any_alias(inode);
iput(inode);
if (dentry)
goto out;
}
}
/* Then lookup indexed upper/whiteout by origin fh */
if (ofs->indexdir) { if (ofs->indexdir) {
index = ovl_get_index_fh(ofs, fh); index = ovl_get_index_fh(ofs, fh);
err = PTR_ERR(index); err = PTR_ERR(index);
if (IS_ERR(index)) { if (IS_ERR(index)) {
if (err != -ESTALE)
return ERR_PTR(err);
/* Found a whiteout index - treat as deleted inode */
is_deleted = true;
index = NULL; index = NULL;
goto out_err;
} }
} }
/* Then try to get upper dir by index */ /* Then try to get a connected upper dir by index */
if (index && d_is_dir(index)) { if (index && d_is_dir(index)) {
struct dentry *upper = ovl_index_upper(ofs, index); struct dentry *upper = ovl_index_upper(ofs, index);
@ -734,24 +745,19 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
goto out; goto out;
} }
/* Then lookup origin by fh */ /* Otherwise, get a connected non-upper dir or disconnected non-dir */
err = ovl_check_origin_fh(ofs, fh, NULL, &stack); if (d_is_dir(origin.dentry) &&
if (err) { (origin.dentry->d_flags & DCACHE_DISCONNECTED)) {
goto out_err; dput(origin.dentry);
} else if (index) { origin.dentry = NULL;
err = ovl_check_origin_fh(ofs, fh, true, NULL, &stack);
if (err)
goto out_err;
}
if (index) {
err = ovl_verify_origin(index, origin.dentry, false); err = ovl_verify_origin(index, origin.dentry, false);
if (err) if (err)
goto out_err; goto out_err;
} else if (is_deleted) {
/* Lookup deleted non-dir by origin inode */
if (!d_is_dir(origin.dentry))
inode = ovl_lookup_inode(sb, origin.dentry, false);
err = -ESTALE;
if (!inode || atomic_read(&inode->i_count) == 1)
goto out_err;
/* Deleted but still open? */
index = dget(ovl_i_dentry_upper(inode));
} }
dentry = ovl_get_dentry(sb, NULL, &origin, index); dentry = ovl_get_dentry(sb, NULL, &origin, index);
@ -759,7 +765,6 @@ static struct dentry *ovl_lower_fh_to_d(struct super_block *sb,
out: out:
dput(origin.dentry); dput(origin.dentry);
dput(index); dput(index);
iput(inode);
return dentry; return dentry;
out_err: out_err:
@ -829,7 +834,7 @@ static struct dentry *ovl_get_parent(struct dentry *dentry)
} }
const struct export_operations ovl_export_operations = { const struct export_operations ovl_export_operations = {
.encode_fh = ovl_encode_inode_fh, .encode_fh = ovl_encode_fh,
.fh_to_dentry = ovl_fh_to_dentry, .fh_to_dentry = ovl_fh_to_dentry,
.fh_to_parent = ovl_fh_to_parent, .fh_to_parent = ovl_fh_to_parent,
.get_name = ovl_get_name, .get_name = ovl_get_name,

View File

@ -16,13 +16,6 @@
#include "overlayfs.h" #include "overlayfs.h"
static dev_t ovl_get_pseudo_dev(struct dentry *dentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
return oe->lowerstack[0].layer->pseudo_dev;
}
int ovl_setattr(struct dentry *dentry, struct iattr *attr) int ovl_setattr(struct dentry *dentry, struct iattr *attr)
{ {
int err; int err;
@ -66,6 +59,69 @@ out:
return err; return err;
} }
static int ovl_map_dev_ino(struct dentry *dentry, struct kstat *stat,
struct ovl_layer *lower_layer)
{
bool samefs = ovl_same_sb(dentry->d_sb);
unsigned int xinobits = ovl_xino_bits(dentry->d_sb);
if (samefs) {
/*
* When all layers are on the same fs, all real inode
* number are unique, so we use the overlay st_dev,
* which is friendly to du -x.
*/
stat->dev = dentry->d_sb->s_dev;
return 0;
} else if (xinobits) {
unsigned int shift = 64 - xinobits;
/*
* All inode numbers of underlying fs should not be using the
* high xinobits, so we use high xinobits to partition the
* overlay st_ino address space. The high bits holds the fsid
* (upper fsid is 0). This way overlay inode numbers are unique
* and all inodes use overlay st_dev. Inode numbers are also
* persistent for a given layer configuration.
*/
if (stat->ino >> shift) {
pr_warn_ratelimited("overlayfs: inode number too big (%pd2, ino=%llu, xinobits=%d)\n",
dentry, stat->ino, xinobits);
} else {
if (lower_layer)
stat->ino |= ((u64)lower_layer->fsid) << shift;
stat->dev = dentry->d_sb->s_dev;
return 0;
}
}
/* The inode could not be mapped to a unified st_ino address space */
if (S_ISDIR(dentry->d_inode->i_mode)) {
/*
* Always use the overlay st_dev for directories, so 'find
* -xdev' will scan the entire overlay mount and won't cross the
* overlay mount boundaries.
*
* If not all layers are on the same fs the pair {real st_ino;
* overlay st_dev} is not unique, so use the non persistent
* overlay st_ino for directories.
*/
stat->dev = dentry->d_sb->s_dev;
stat->ino = dentry->d_inode->i_ino;
} else if (lower_layer && lower_layer->fsid) {
/*
* For non-samefs setup, if we cannot map all layers st_ino
* to a unified address space, we need to make sure that st_dev
* is unique per lower fs. Upper layer uses real st_dev and
* lower layers use the unique anonymous bdev assigned to the
* lower fs.
*/
stat->dev = lower_layer->fs->pseudo_dev;
}
return 0;
}
int ovl_getattr(const struct path *path, struct kstat *stat, int ovl_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags) u32 request_mask, unsigned int flags)
{ {
@ -75,6 +131,7 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
const struct cred *old_cred; const struct cred *old_cred;
bool is_dir = S_ISDIR(dentry->d_inode->i_mode); bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
bool samefs = ovl_same_sb(dentry->d_sb); bool samefs = ovl_same_sb(dentry->d_sb);
struct ovl_layer *lower_layer = NULL;
int err; int err;
type = ovl_path_real(dentry, &realpath); type = ovl_path_real(dentry, &realpath);
@ -84,14 +141,18 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
goto out; goto out;
/* /*
* For non-dir or same fs, we use st_ino of the copy up origin, if we * For non-dir or same fs, we use st_ino of the copy up origin.
* know it. This guaranties constant st_dev/st_ino across copy up. * This guaranties constant st_dev/st_ino across copy up.
* With xino feature and non-samefs, we use st_ino of the copy up
* origin masked with high bits that represent the layer id.
* *
* If filesystem supports NFS export ops, this also guaranties * If lower filesystem supports NFS file handles, this also guaranties
* persistent st_ino across mount cycle. * persistent st_ino across mount cycle.
*/ */
if (!is_dir || samefs) { if (!is_dir || samefs || ovl_xino_bits(dentry->d_sb)) {
if (OVL_TYPE_ORIGIN(type)) { if (!OVL_TYPE_UPPER(type)) {
lower_layer = ovl_layer_lower(dentry);
} else if (OVL_TYPE_ORIGIN(type)) {
struct kstat lowerstat; struct kstat lowerstat;
u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0); u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
@ -118,43 +179,17 @@ int ovl_getattr(const struct path *path, struct kstat *stat,
*/ */
if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) || if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) ||
(!ovl_verify_lower(dentry->d_sb) && (!ovl_verify_lower(dentry->d_sb) &&
(is_dir || lowerstat.nlink == 1))) (is_dir || lowerstat.nlink == 1))) {
stat->ino = lowerstat.ino; stat->ino = lowerstat.ino;
lower_layer = ovl_layer_lower(dentry);
if (samefs) }
WARN_ON_ONCE(stat->dev != lowerstat.dev);
else
stat->dev = ovl_get_pseudo_dev(dentry);
} }
if (samefs) {
/*
* When all layers are on the same fs, all real inode
* number are unique, so we use the overlay st_dev,
* which is friendly to du -x.
*/
stat->dev = dentry->d_sb->s_dev;
} else if (!OVL_TYPE_UPPER(type)) {
/*
* For non-samefs setup, to make sure that st_dev/st_ino
* pair is unique across the system, we use a unique
* anonymous st_dev for lower layer inode.
*/
stat->dev = ovl_get_pseudo_dev(dentry);
}
} else {
/*
* Always use the overlay st_dev for directories, so 'find
* -xdev' will scan the entire overlay mount and won't cross the
* overlay mount boundaries.
*
* If not all layers are on the same fs the pair {real st_ino;
* overlay st_dev} is not unique, so use the non persistent
* overlay st_ino for directories.
*/
stat->dev = dentry->d_sb->s_dev;
stat->ino = dentry->d_inode->i_ino;
} }
err = ovl_map_dev_ino(dentry, stat, lower_layer);
if (err)
goto out;
/* /*
* It's probably not worth it to count subdirs to get the * It's probably not worth it to count subdirs to get the
* correct link count. nlink=1 seems to pacify 'find' and * correct link count. nlink=1 seems to pacify 'find' and
@ -383,24 +418,18 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags)
int ovl_update_time(struct inode *inode, struct timespec *ts, int flags) int ovl_update_time(struct inode *inode, struct timespec *ts, int flags)
{ {
struct dentry *alias; if (flags & S_ATIME) {
struct path upperpath; struct ovl_fs *ofs = inode->i_sb->s_fs_info;
struct path upperpath = {
.mnt = ofs->upper_mnt,
.dentry = ovl_upperdentry_dereference(OVL_I(inode)),
};
if (!(flags & S_ATIME)) if (upperpath.dentry) {
return 0; touch_atime(&upperpath);
inode->i_atime = d_inode(upperpath.dentry)->i_atime;
alias = d_find_any_alias(inode); }
if (!alias)
return 0;
ovl_path_upper(alias, &upperpath);
if (upperpath.dentry) {
touch_atime(&upperpath);
inode->i_atime = d_inode(upperpath.dentry)->i_atime;
} }
dput(alias);
return 0; return 0;
} }
@ -459,9 +488,27 @@ static inline void ovl_lockdep_annotate_inode_mutex_key(struct inode *inode)
#endif #endif
} }
static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev) static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
unsigned long ino, int fsid)
{ {
inode->i_ino = get_next_ino(); int xinobits = ovl_xino_bits(inode->i_sb);
/*
* When NFS export is enabled and d_ino is consistent with st_ino
* (samefs or i_ino has enough bits to encode layer), set the same
* value used for d_ino to i_ino, because nfsd readdirplus compares
* d_ino values to i_ino values of child entries. When called from
* ovl_new_inode(), ino arg is 0, so i_ino will be updated to real
* upper inode i_ino on ovl_inode_init() or ovl_inode_update().
*/
if (inode->i_sb->s_export_op &&
(ovl_same_sb(inode->i_sb) || xinobits)) {
inode->i_ino = ino;
if (xinobits && fsid && !(ino >> (64 - xinobits)))
inode->i_ino |= (unsigned long)fsid << (64 - xinobits);
} else {
inode->i_ino = get_next_ino();
}
inode->i_mode = mode; inode->i_mode = mode;
inode->i_flags |= S_NOCMTIME; inode->i_flags |= S_NOCMTIME;
#ifdef CONFIG_FS_POSIX_ACL #ifdef CONFIG_FS_POSIX_ACL
@ -597,7 +644,7 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev)
inode = new_inode(sb); inode = new_inode(sb);
if (inode) if (inode)
ovl_fill_inode(inode, mode, rdev); ovl_fill_inode(inode, mode, rdev, 0, 0);
return inode; return inode;
} }
@ -703,13 +750,16 @@ static bool ovl_hash_bylower(struct super_block *sb, struct dentry *upper,
} }
struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry, struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
struct dentry *lowerdentry, struct dentry *index, struct ovl_path *lowerpath, struct dentry *index,
unsigned int numlower) unsigned int numlower)
{ {
struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL; struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
struct inode *inode; struct inode *inode;
struct dentry *lowerdentry = lowerpath ? lowerpath->dentry : NULL;
bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index); bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
int fsid = bylower ? lowerpath->layer->fsid : 0;
bool is_dir; bool is_dir;
unsigned long ino = 0;
if (!realinode) if (!realinode)
realinode = d_inode(lowerdentry); realinode = d_inode(lowerdentry);
@ -748,18 +798,22 @@ struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
if (!is_dir) if (!is_dir)
nlink = ovl_get_nlink(lowerdentry, upperdentry, nlink); nlink = ovl_get_nlink(lowerdentry, upperdentry, nlink);
set_nlink(inode, nlink); set_nlink(inode, nlink);
ino = key->i_ino;
} else { } else {
/* Lower hardlink that will be broken on copy up */ /* Lower hardlink that will be broken on copy up */
inode = new_inode(sb); inode = new_inode(sb);
if (!inode) if (!inode)
goto out_nomem; goto out_nomem;
} }
ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev); ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev, ino, fsid);
ovl_inode_init(inode, upperdentry, lowerdentry); ovl_inode_init(inode, upperdentry, lowerdentry);
if (upperdentry && ovl_is_impuredir(upperdentry)) if (upperdentry && ovl_is_impuredir(upperdentry))
ovl_set_flag(OVL_IMPURE, inode); ovl_set_flag(OVL_IMPURE, inode);
if (index)
ovl_set_flag(OVL_INDEX, inode);
/* Check for non-merge dir that may have whiteouts */ /* Check for non-merge dir that may have whiteouts */
if (is_dir) { if (is_dir) {
if (((upperdentry && lowerdentry) || numlower > 1) || if (((upperdentry && lowerdentry) || numlower > 1) ||

View File

@ -56,6 +56,15 @@ static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
if (s == next) if (s == next)
goto invalid; goto invalid;
} }
/*
* One of the ancestor path elements in an absolute path
* lookup in ovl_lookup_layer() could have been opaque and
* that will stop further lookup in lower layers (d->stop=true)
* But we have found an absolute redirect in decendant path
* element and that should force continue lookup in lower
* layers (reset d->stop).
*/
d->stop = false;
} else { } else {
if (strchr(buf, '/') != NULL) if (strchr(buf, '/') != NULL)
goto invalid; goto invalid;
@ -171,7 +180,8 @@ invalid:
goto out; goto out;
} }
struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt) struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
bool connected)
{ {
struct dentry *real; struct dentry *real;
int bytes; int bytes;
@ -186,7 +196,7 @@ struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
bytes = (fh->len - offsetof(struct ovl_fh, fid)); bytes = (fh->len - offsetof(struct ovl_fh, fid));
real = exportfs_decode_fh(mnt, (struct fid *)fh->fid, real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
bytes >> 2, (int)fh->type, bytes >> 2, (int)fh->type,
ovl_acceptable, mnt); connected ? ovl_acceptable : NULL, mnt);
if (IS_ERR(real)) { if (IS_ERR(real)) {
/* /*
* Treat stale file handle to lower file as "origin unknown". * Treat stale file handle to lower file as "origin unknown".
@ -220,6 +230,7 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
{ {
struct dentry *this; struct dentry *this;
int err; int err;
bool last_element = !post[0];
this = lookup_one_len_unlocked(name, base, namelen); this = lookup_one_len_unlocked(name, base, namelen);
if (IS_ERR(this)) { if (IS_ERR(this)) {
@ -245,11 +256,23 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
d->stop = true; d->stop = true;
if (d->is_dir) if (d->is_dir)
goto put_and_out; goto put_and_out;
/*
* NB: handle failure to lookup non-last element when non-dir
* redirects become possible
*/
WARN_ON(!last_element);
goto out; goto out;
} }
d->is_dir = true; if (last_element)
if (!d->last && ovl_is_opaquedir(this)) { d->is_dir = true;
d->stop = d->opaque = true; if (d->last)
goto out;
if (ovl_is_opaquedir(this)) {
d->stop = true;
if (last_element)
d->opaque = true;
goto out; goto out;
} }
err = ovl_check_redirect(this, d, prelen, post); err = ovl_check_redirect(this, d, prelen, post);
@ -310,14 +333,15 @@ static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
} }
int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected,
struct dentry *upperdentry, struct ovl_path **stackp) struct dentry *upperdentry, struct ovl_path **stackp)
{ {
struct dentry *origin = NULL; struct dentry *origin = NULL;
int i; int i;
for (i = 0; i < ofs->numlower; i++) { for (i = 0; i < ofs->numlower; i++) {
origin = ovl_decode_fh(fh, ofs->lower_layers[i].mnt); origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
connected);
if (origin) if (origin)
break; break;
} }
@ -361,7 +385,7 @@ static int ovl_check_origin(struct ovl_fs *ofs, struct dentry *upperdentry,
if (IS_ERR_OR_NULL(fh)) if (IS_ERR_OR_NULL(fh))
return PTR_ERR(fh); return PTR_ERR(fh);
err = ovl_check_origin_fh(ofs, fh, upperdentry, stackp); err = ovl_check_origin_fh(ofs, fh, false, upperdentry, stackp);
kfree(fh); kfree(fh);
if (err) { if (err) {
@ -415,7 +439,7 @@ int ovl_verify_set_fh(struct dentry *dentry, const char *name,
struct ovl_fh *fh; struct ovl_fh *fh;
int err; int err;
fh = ovl_encode_fh(real, is_upper); fh = ovl_encode_real_fh(real, is_upper);
err = PTR_ERR(fh); err = PTR_ERR(fh);
if (IS_ERR(fh)) if (IS_ERR(fh))
goto fail; goto fail;
@ -451,7 +475,7 @@ struct dentry *ovl_index_upper(struct ovl_fs *ofs, struct dentry *index)
if (IS_ERR_OR_NULL(fh)) if (IS_ERR_OR_NULL(fh))
return ERR_CAST(fh); return ERR_CAST(fh);
upper = ovl_decode_fh(fh, ofs->upper_mnt); upper = ovl_decode_real_fh(fh, ofs->upper_mnt, true);
kfree(fh); kfree(fh);
if (IS_ERR_OR_NULL(upper)) if (IS_ERR_OR_NULL(upper))
@ -558,7 +582,7 @@ int ovl_verify_index(struct ovl_fs *ofs, struct dentry *index)
/* Check if non-dir index is orphan and don't warn before cleaning it */ /* Check if non-dir index is orphan and don't warn before cleaning it */
if (!d_is_dir(index) && d_inode(index)->i_nlink == 1) { if (!d_is_dir(index) && d_inode(index)->i_nlink == 1) {
err = ovl_check_origin_fh(ofs, fh, index, &stack); err = ovl_check_origin_fh(ofs, fh, false, index, &stack);
if (err) if (err)
goto fail; goto fail;
@ -619,7 +643,7 @@ int ovl_get_index_name(struct dentry *origin, struct qstr *name)
struct ovl_fh *fh; struct ovl_fh *fh;
int err; int err;
fh = ovl_encode_fh(origin, false); fh = ovl_encode_real_fh(origin, false);
if (IS_ERR(fh)) if (IS_ERR(fh))
return PTR_ERR(fh); return PTR_ERR(fh);
@ -815,7 +839,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
.is_dir = false, .is_dir = false,
.opaque = false, .opaque = false,
.stop = false, .stop = false,
.last = !poe->numlower, .last = ofs->config.redirect_follow ? false : !poe->numlower,
.redirect = NULL, .redirect = NULL,
}; };
@ -873,7 +897,11 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
for (i = 0; !d.stop && i < poe->numlower; i++) { for (i = 0; !d.stop && i < poe->numlower; i++) {
struct ovl_path lower = poe->lowerstack[i]; struct ovl_path lower = poe->lowerstack[i];
d.last = i == poe->numlower - 1; if (!ofs->config.redirect_follow)
d.last = i == poe->numlower - 1;
else
d.last = lower.layer->idx == roe->numlower;
err = ovl_lookup_layer(lower.dentry, &d, &this); err = ovl_lookup_layer(lower.dentry, &d, &this);
if (err) if (err)
goto out_put; goto out_put;
@ -976,17 +1004,18 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
upperdentry = dget(index); upperdentry = dget(index);
if (upperdentry || ctr) { if (upperdentry || ctr) {
if (ctr) inode = ovl_get_inode(dentry->d_sb, upperdentry, stack, index,
origin = stack[0].dentry;
inode = ovl_get_inode(dentry->d_sb, upperdentry, origin, index,
ctr); ctr);
err = PTR_ERR(inode); err = PTR_ERR(inode);
if (IS_ERR(inode)) if (IS_ERR(inode))
goto out_free_oe; goto out_free_oe;
/*
* NB: handle redirected hard links when non-dir redirects
* become possible
*/
WARN_ON(OVL_I(inode)->redirect);
OVL_I(inode)->redirect = upperredirect; OVL_I(inode)->redirect = upperredirect;
if (index)
ovl_set_flag(OVL_INDEX, inode);
} }
revert_creds(old_cred); revert_creds(old_cred);

View File

@ -202,7 +202,7 @@ void ovl_drop_write(struct dentry *dentry);
struct dentry *ovl_workdir(struct dentry *dentry); struct dentry *ovl_workdir(struct dentry *dentry);
const struct cred *ovl_override_creds(struct super_block *sb); const struct cred *ovl_override_creds(struct super_block *sb);
struct super_block *ovl_same_sb(struct super_block *sb); struct super_block *ovl_same_sb(struct super_block *sb);
bool ovl_can_decode_fh(struct super_block *sb); int ovl_can_decode_fh(struct super_block *sb);
struct dentry *ovl_indexdir(struct super_block *sb); struct dentry *ovl_indexdir(struct super_block *sb);
bool ovl_index_all(struct super_block *sb); bool ovl_index_all(struct super_block *sb);
bool ovl_verify_lower(struct super_block *sb); bool ovl_verify_lower(struct super_block *sb);
@ -215,6 +215,7 @@ void ovl_path_lower(struct dentry *dentry, struct path *path);
enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path); enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
struct dentry *ovl_dentry_upper(struct dentry *dentry); struct dentry *ovl_dentry_upper(struct dentry *dentry);
struct dentry *ovl_dentry_lower(struct dentry *dentry); struct dentry *ovl_dentry_lower(struct dentry *dentry);
struct ovl_layer *ovl_layer_lower(struct dentry *dentry);
struct dentry *ovl_dentry_real(struct dentry *dentry); struct dentry *ovl_dentry_real(struct dentry *dentry);
struct dentry *ovl_i_dentry_upper(struct inode *inode); struct dentry *ovl_i_dentry_upper(struct inode *inode);
struct inode *ovl_inode_upper(struct inode *inode); struct inode *ovl_inode_upper(struct inode *inode);
@ -263,11 +264,19 @@ static inline bool ovl_is_impuredir(struct dentry *dentry)
return ovl_check_dir_xattr(dentry, OVL_XATTR_IMPURE); return ovl_check_dir_xattr(dentry, OVL_XATTR_IMPURE);
} }
static inline unsigned int ovl_xino_bits(struct super_block *sb)
{
struct ovl_fs *ofs = sb->s_fs_info;
return ofs->xino_bits;
}
/* namei.c */ /* namei.c */
int ovl_check_fh_len(struct ovl_fh *fh, int fh_len); int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt); struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected);
int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected,
struct dentry *upperdentry, struct ovl_path **stackp); struct dentry *upperdentry, struct ovl_path **stackp);
int ovl_verify_set_fh(struct dentry *dentry, const char *name, int ovl_verify_set_fh(struct dentry *dentry, const char *name,
struct dentry *real, bool is_upper, bool set); struct dentry *real, bool is_upper, bool set);
@ -329,7 +338,7 @@ struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev);
struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real, struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
bool is_upper); bool is_upper);
struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry, struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
struct dentry *lowerdentry, struct dentry *index, struct ovl_path *lowerpath, struct dentry *index,
unsigned int numlower); unsigned int numlower);
static inline void ovl_copyattr(struct inode *from, struct inode *to) static inline void ovl_copyattr(struct inode *from, struct inode *to)
{ {
@ -361,7 +370,7 @@ int ovl_copy_up(struct dentry *dentry);
int ovl_copy_up_flags(struct dentry *dentry, int flags); int ovl_copy_up_flags(struct dentry *dentry, int flags);
int ovl_copy_xattr(struct dentry *old, struct dentry *new); int ovl_copy_xattr(struct dentry *old, struct dentry *new);
int ovl_set_attr(struct dentry *upper, struct kstat *stat); int ovl_set_attr(struct dentry *upper, struct kstat *stat);
struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper); struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper);
int ovl_set_origin(struct dentry *dentry, struct dentry *lower, int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
struct dentry *upper); struct dentry *upper);

View File

@ -18,13 +18,21 @@ struct ovl_config {
const char *redirect_mode; const char *redirect_mode;
bool index; bool index;
bool nfs_export; bool nfs_export;
int xino;
};
struct ovl_sb {
struct super_block *sb;
dev_t pseudo_dev;
}; };
struct ovl_layer { struct ovl_layer {
struct vfsmount *mnt; struct vfsmount *mnt;
dev_t pseudo_dev; struct ovl_sb *fs;
/* Index of this layer in fs root (upper == 0) */ /* Index of this layer in fs root (upper idx == 0) */
int idx; int idx;
/* One fsid per unique underlying sb (upper fsid == 0) */
int fsid;
}; };
struct ovl_path { struct ovl_path {
@ -35,8 +43,11 @@ struct ovl_path {
/* private information held for overlayfs's superblock */ /* private information held for overlayfs's superblock */
struct ovl_fs { struct ovl_fs {
struct vfsmount *upper_mnt; struct vfsmount *upper_mnt;
unsigned numlower; unsigned int numlower;
/* Number of unique lower sb that differ from upper sb */
unsigned int numlowerfs;
struct ovl_layer *lower_layers; struct ovl_layer *lower_layers;
struct ovl_sb *lower_fs;
/* workbasedir is the path at workdir= mount option */ /* workbasedir is the path at workdir= mount option */
struct dentry *workbasedir; struct dentry *workbasedir;
/* workdir is the 'work' directory under workbasedir */ /* workdir is the 'work' directory under workbasedir */
@ -50,11 +61,11 @@ struct ovl_fs {
const struct cred *creator_cred; const struct cred *creator_cred;
bool tmpfile; bool tmpfile;
bool noxattr; bool noxattr;
/* sb common to all layers */
struct super_block *same_sb;
/* Did we take the inuse lock? */ /* Did we take the inuse lock? */
bool upperdir_locked; bool upperdir_locked;
bool workdir_locked; bool workdir_locked;
/* Inode numbers in all layers do not use the high xino_bits */
unsigned int xino_bits;
}; };
/* private information held for every overlayfs dentry */ /* private information held for every overlayfs dentry */

View File

@ -120,6 +120,10 @@ static bool ovl_calc_d_ino(struct ovl_readdir_data *rdd,
if (!rdd->dentry) if (!rdd->dentry)
return false; return false;
/* Always recalc d_ino when remapping lower inode numbers */
if (ovl_xino_bits(rdd->dentry->d_sb))
return true;
/* Always recalc d_ino for parent */ /* Always recalc d_ino for parent */
if (strcmp(p->name, "..") == 0) if (strcmp(p->name, "..") == 0)
return true; return true;
@ -435,6 +439,19 @@ static struct ovl_dir_cache *ovl_cache_get(struct dentry *dentry)
return cache; return cache;
} }
/* Map inode number to lower fs unique range */
static u64 ovl_remap_lower_ino(u64 ino, int xinobits, int fsid,
const char *name, int namelen)
{
if (ino >> (64 - xinobits)) {
pr_warn_ratelimited("overlayfs: d_ino too big (%.*s, ino=%llu, xinobits=%d)\n",
namelen, name, ino, xinobits);
return ino;
}
return ino | ((u64)fsid) << (64 - xinobits);
}
/* /*
* Set d_ino for upper entries. Non-upper entries should always report * Set d_ino for upper entries. Non-upper entries should always report
* the uppermost real inode ino and should not call this function. * the uppermost real inode ino and should not call this function.
@ -452,9 +469,10 @@ static int ovl_cache_update_ino(struct path *path, struct ovl_cache_entry *p)
struct dentry *this = NULL; struct dentry *this = NULL;
enum ovl_path_type type; enum ovl_path_type type;
u64 ino = p->real_ino; u64 ino = p->real_ino;
int xinobits = ovl_xino_bits(dir->d_sb);
int err = 0; int err = 0;
if (!ovl_same_sb(dir->d_sb)) if (!ovl_same_sb(dir->d_sb) && !xinobits)
goto out; goto out;
if (p->name[0] == '.') { if (p->name[0] == '.') {
@ -491,6 +509,10 @@ get:
WARN_ON_ONCE(dir->d_sb->s_dev != stat.dev); WARN_ON_ONCE(dir->d_sb->s_dev != stat.dev);
ino = stat.ino; ino = stat.ino;
} else if (xinobits && !OVL_TYPE_UPPER(type)) {
ino = ovl_remap_lower_ino(ino, xinobits,
ovl_layer_lower(this)->fsid,
p->name, p->len);
} }
out: out:
@ -618,6 +640,8 @@ struct ovl_readdir_translate {
struct ovl_dir_cache *cache; struct ovl_dir_cache *cache;
struct dir_context ctx; struct dir_context ctx;
u64 parent_ino; u64 parent_ino;
int fsid;
int xinobits;
}; };
static int ovl_fill_real(struct dir_context *ctx, const char *name, static int ovl_fill_real(struct dir_context *ctx, const char *name,
@ -628,14 +652,17 @@ static int ovl_fill_real(struct dir_context *ctx, const char *name,
container_of(ctx, struct ovl_readdir_translate, ctx); container_of(ctx, struct ovl_readdir_translate, ctx);
struct dir_context *orig_ctx = rdt->orig_ctx; struct dir_context *orig_ctx = rdt->orig_ctx;
if (rdt->parent_ino && strcmp(name, "..") == 0) if (rdt->parent_ino && strcmp(name, "..") == 0) {
ino = rdt->parent_ino; ino = rdt->parent_ino;
else if (rdt->cache) { } else if (rdt->cache) {
struct ovl_cache_entry *p; struct ovl_cache_entry *p;
p = ovl_cache_entry_find(&rdt->cache->root, name, namelen); p = ovl_cache_entry_find(&rdt->cache->root, name, namelen);
if (p) if (p)
ino = p->ino; ino = p->ino;
} else if (rdt->xinobits) {
ino = ovl_remap_lower_ino(ino, rdt->xinobits, rdt->fsid,
name, namelen);
} }
return orig_ctx->actor(orig_ctx, name, namelen, offset, ino, d_type); return orig_ctx->actor(orig_ctx, name, namelen, offset, ino, d_type);
@ -646,11 +673,16 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
int err; int err;
struct ovl_dir_file *od = file->private_data; struct ovl_dir_file *od = file->private_data;
struct dentry *dir = file->f_path.dentry; struct dentry *dir = file->f_path.dentry;
struct ovl_layer *lower_layer = ovl_layer_lower(dir);
struct ovl_readdir_translate rdt = { struct ovl_readdir_translate rdt = {
.ctx.actor = ovl_fill_real, .ctx.actor = ovl_fill_real,
.orig_ctx = ctx, .orig_ctx = ctx,
.xinobits = ovl_xino_bits(dir->d_sb),
}; };
if (rdt.xinobits && lower_layer)
rdt.fsid = lower_layer->fsid;
if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) { if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
struct kstat stat; struct kstat stat;
struct path statpath = file->f_path; struct path statpath = file->f_path;
@ -693,9 +725,10 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
* dir is impure then need to adjust d_ino for copied up * dir is impure then need to adjust d_ino for copied up
* entries. * entries.
*/ */
if (ovl_same_sb(dentry->d_sb) && if (ovl_xino_bits(dentry->d_sb) ||
(ovl_test_flag(OVL_IMPURE, d_inode(dentry)) || (ovl_same_sb(dentry->d_sb) &&
OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)))) { (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent))))) {
return ovl_iterate_real(file, ctx); return ovl_iterate_real(file, ctx);
} }
return iterate_dir(od->realfile, ctx); return iterate_dir(od->realfile, ctx);

View File

@ -17,6 +17,7 @@
#include <linux/statfs.h> #include <linux/statfs.h>
#include <linux/seq_file.h> #include <linux/seq_file.h>
#include <linux/posix_acl_xattr.h> #include <linux/posix_acl_xattr.h>
#include <linux/exportfs.h>
#include "overlayfs.h" #include "overlayfs.h"
MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>"); MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
@ -50,6 +51,11 @@ module_param_named(nfs_export, ovl_nfs_export_def, bool, 0644);
MODULE_PARM_DESC(ovl_nfs_export_def, MODULE_PARM_DESC(ovl_nfs_export_def,
"Default to on or off for the NFS export feature"); "Default to on or off for the NFS export feature");
static bool ovl_xino_auto_def = IS_ENABLED(CONFIG_OVERLAY_FS_XINO_AUTO);
module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
MODULE_PARM_DESC(ovl_xino_auto_def,
"Auto enable xino feature");
static void ovl_entry_stack_free(struct ovl_entry *oe) static void ovl_entry_stack_free(struct ovl_entry *oe)
{ {
unsigned int i; unsigned int i;
@ -236,11 +242,12 @@ static void ovl_free_fs(struct ovl_fs *ofs)
if (ofs->upperdir_locked) if (ofs->upperdir_locked)
ovl_inuse_unlock(ofs->upper_mnt->mnt_root); ovl_inuse_unlock(ofs->upper_mnt->mnt_root);
mntput(ofs->upper_mnt); mntput(ofs->upper_mnt);
for (i = 0; i < ofs->numlower; i++) { for (i = 0; i < ofs->numlower; i++)
mntput(ofs->lower_layers[i].mnt); mntput(ofs->lower_layers[i].mnt);
free_anon_bdev(ofs->lower_layers[i].pseudo_dev); for (i = 0; i < ofs->numlowerfs; i++)
} free_anon_bdev(ofs->lower_fs[i].pseudo_dev);
kfree(ofs->lower_layers); kfree(ofs->lower_layers);
kfree(ofs->lower_fs);
kfree(ofs->config.lowerdir); kfree(ofs->config.lowerdir);
kfree(ofs->config.upperdir); kfree(ofs->config.upperdir);
@ -325,6 +332,23 @@ static const char *ovl_redirect_mode_def(void)
return ovl_redirect_dir_def ? "on" : "off"; return ovl_redirect_dir_def ? "on" : "off";
} }
enum {
OVL_XINO_OFF,
OVL_XINO_AUTO,
OVL_XINO_ON,
};
static const char * const ovl_xino_str[] = {
"off",
"auto",
"on",
};
static inline int ovl_xino_def(void)
{
return ovl_xino_auto_def ? OVL_XINO_AUTO : OVL_XINO_OFF;
}
/** /**
* ovl_show_options * ovl_show_options
* *
@ -350,6 +374,8 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
if (ofs->config.nfs_export != ovl_nfs_export_def) if (ofs->config.nfs_export != ovl_nfs_export_def)
seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ? seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
"on" : "off"); "on" : "off");
if (ofs->config.xino != ovl_xino_def())
seq_printf(m, ",xino=%s", ovl_xino_str[ofs->config.xino]);
return 0; return 0;
} }
@ -384,6 +410,9 @@ enum {
OPT_INDEX_OFF, OPT_INDEX_OFF,
OPT_NFS_EXPORT_ON, OPT_NFS_EXPORT_ON,
OPT_NFS_EXPORT_OFF, OPT_NFS_EXPORT_OFF,
OPT_XINO_ON,
OPT_XINO_OFF,
OPT_XINO_AUTO,
OPT_ERR, OPT_ERR,
}; };
@ -397,6 +426,9 @@ static const match_table_t ovl_tokens = {
{OPT_INDEX_OFF, "index=off"}, {OPT_INDEX_OFF, "index=off"},
{OPT_NFS_EXPORT_ON, "nfs_export=on"}, {OPT_NFS_EXPORT_ON, "nfs_export=on"},
{OPT_NFS_EXPORT_OFF, "nfs_export=off"}, {OPT_NFS_EXPORT_OFF, "nfs_export=off"},
{OPT_XINO_ON, "xino=on"},
{OPT_XINO_OFF, "xino=off"},
{OPT_XINO_AUTO, "xino=auto"},
{OPT_ERR, NULL} {OPT_ERR, NULL}
}; };
@ -511,6 +543,18 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
config->nfs_export = false; config->nfs_export = false;
break; break;
case OPT_XINO_ON:
config->xino = OVL_XINO_ON;
break;
case OPT_XINO_OFF:
config->xino = OVL_XINO_OFF;
break;
case OPT_XINO_AUTO:
config->xino = OVL_XINO_AUTO;
break;
default: default:
pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p); pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
return -EINVAL; return -EINVAL;
@ -700,6 +744,7 @@ static int ovl_check_namelen(struct path *path, struct ovl_fs *ofs,
static int ovl_lower_dir(const char *name, struct path *path, static int ovl_lower_dir(const char *name, struct path *path,
struct ovl_fs *ofs, int *stack_depth, bool *remote) struct ovl_fs *ofs, int *stack_depth, bool *remote)
{ {
int fh_type;
int err; int err;
err = ovl_mount_dir_noesc(name, path); err = ovl_mount_dir_noesc(name, path);
@ -719,15 +764,19 @@ static int ovl_lower_dir(const char *name, struct path *path,
* The inodes index feature and NFS export need to encode and decode * The inodes index feature and NFS export need to encode and decode
* file handles, so they require that all layers support them. * file handles, so they require that all layers support them.
*/ */
fh_type = ovl_can_decode_fh(path->dentry->d_sb);
if ((ofs->config.nfs_export || if ((ofs->config.nfs_export ||
(ofs->config.index && ofs->config.upperdir)) && (ofs->config.index && ofs->config.upperdir)) && !fh_type) {
!ovl_can_decode_fh(path->dentry->d_sb)) {
ofs->config.index = false; ofs->config.index = false;
ofs->config.nfs_export = false; ofs->config.nfs_export = false;
pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off,nfs_export=off.\n", pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off,nfs_export=off.\n",
name); name);
} }
/* Check if lower fs has 32bit inode numbers */
if (fh_type != FILEID_INO32_GEN)
ofs->xino_bits = 0;
return 0; return 0;
out_put: out_put:
@ -951,6 +1000,7 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
{ {
struct vfsmount *mnt = ofs->upper_mnt; struct vfsmount *mnt = ofs->upper_mnt;
struct dentry *temp; struct dentry *temp;
int fh_type;
int err; int err;
err = mnt_want_write(mnt); err = mnt_want_write(mnt);
@ -1000,12 +1050,16 @@ static int ovl_make_workdir(struct ovl_fs *ofs, struct path *workpath)
} }
/* Check if upper/work fs supports file handles */ /* Check if upper/work fs supports file handles */
if (ofs->config.index && fh_type = ovl_can_decode_fh(ofs->workdir->d_sb);
!ovl_can_decode_fh(ofs->workdir->d_sb)) { if (ofs->config.index && !fh_type) {
ofs->config.index = false; ofs->config.index = false;
pr_warn("overlayfs: upper fs does not support file handles, falling back to index=off.\n"); pr_warn("overlayfs: upper fs does not support file handles, falling back to index=off.\n");
} }
/* Check if upper fs has 32bit inode numbers */
if (fh_type != FILEID_INO32_GEN)
ofs->xino_bits = 0;
/* NFS export of r/w mount depends on index */ /* NFS export of r/w mount depends on index */
if (ofs->config.nfs_export && !ofs->config.index) { if (ofs->config.nfs_export && !ofs->config.index) {
pr_warn("overlayfs: NFS export requires \"index=on\", falling back to nfs_export=off.\n"); pr_warn("overlayfs: NFS export requires \"index=on\", falling back to nfs_export=off.\n");
@ -1108,6 +1162,35 @@ out:
return err; return err;
} }
/* Get a unique fsid for the layer */
static int ovl_get_fsid(struct ovl_fs *ofs, struct super_block *sb)
{
unsigned int i;
dev_t dev;
int err;
/* fsid 0 is reserved for upper fs even with non upper overlay */
if (ofs->upper_mnt && ofs->upper_mnt->mnt_sb == sb)
return 0;
for (i = 0; i < ofs->numlowerfs; i++) {
if (ofs->lower_fs[i].sb == sb)
return i + 1;
}
err = get_anon_bdev(&dev);
if (err) {
pr_err("overlayfs: failed to get anonymous bdev for lowerpath\n");
return err;
}
ofs->lower_fs[ofs->numlowerfs].sb = sb;
ofs->lower_fs[ofs->numlowerfs].pseudo_dev = dev;
ofs->numlowerfs++;
return ofs->numlowerfs;
}
static int ovl_get_lower_layers(struct ovl_fs *ofs, struct path *stack, static int ovl_get_lower_layers(struct ovl_fs *ofs, struct path *stack,
unsigned int numlower) unsigned int numlower)
{ {
@ -1119,23 +1202,27 @@ static int ovl_get_lower_layers(struct ovl_fs *ofs, struct path *stack,
GFP_KERNEL); GFP_KERNEL);
if (ofs->lower_layers == NULL) if (ofs->lower_layers == NULL)
goto out; goto out;
ofs->lower_fs = kcalloc(numlower, sizeof(struct ovl_sb),
GFP_KERNEL);
if (ofs->lower_fs == NULL)
goto out;
for (i = 0; i < numlower; i++) { for (i = 0; i < numlower; i++) {
struct vfsmount *mnt; struct vfsmount *mnt;
dev_t dev; int fsid;
err = get_anon_bdev(&dev); err = fsid = ovl_get_fsid(ofs, stack[i].mnt->mnt_sb);
if (err) { if (err < 0)
pr_err("overlayfs: failed to get anonymous bdev for lowerpath\n");
goto out; goto out;
}
mnt = clone_private_mount(&stack[i]); mnt = clone_private_mount(&stack[i]);
err = PTR_ERR(mnt); err = PTR_ERR(mnt);
if (IS_ERR(mnt)) { if (IS_ERR(mnt)) {
pr_err("overlayfs: failed to clone lowerpath\n"); pr_err("overlayfs: failed to clone lowerpath\n");
free_anon_bdev(dev);
goto out; goto out;
} }
/* /*
* Make lower layers R/O. That way fchmod/fchown on lower file * Make lower layers R/O. That way fchmod/fchown on lower file
* will fail instead of modifying lower fs. * will fail instead of modifying lower fs.
@ -1143,16 +1230,41 @@ static int ovl_get_lower_layers(struct ovl_fs *ofs, struct path *stack,
mnt->mnt_flags |= MNT_READONLY | MNT_NOATIME; mnt->mnt_flags |= MNT_READONLY | MNT_NOATIME;
ofs->lower_layers[ofs->numlower].mnt = mnt; ofs->lower_layers[ofs->numlower].mnt = mnt;
ofs->lower_layers[ofs->numlower].pseudo_dev = dev;
ofs->lower_layers[ofs->numlower].idx = i + 1; ofs->lower_layers[ofs->numlower].idx = i + 1;
ofs->lower_layers[ofs->numlower].fsid = fsid;
if (fsid) {
ofs->lower_layers[ofs->numlower].fs =
&ofs->lower_fs[fsid - 1];
}
ofs->numlower++; ofs->numlower++;
/* Check if all lower layers are on same sb */
if (i == 0)
ofs->same_sb = mnt->mnt_sb;
else if (ofs->same_sb != mnt->mnt_sb)
ofs->same_sb = NULL;
} }
/*
* When all layers on same fs, overlay can use real inode numbers.
* With mount option "xino=on", mounter declares that there are enough
* free high bits in underlying fs to hold the unique fsid.
* If overlayfs does encounter underlying inodes using the high xino
* bits reserved for fsid, it emits a warning and uses the original
* inode number.
*/
if (!ofs->numlowerfs || (ofs->numlowerfs == 1 && !ofs->upper_mnt)) {
ofs->xino_bits = 0;
ofs->config.xino = OVL_XINO_OFF;
} else if (ofs->config.xino == OVL_XINO_ON && !ofs->xino_bits) {
/*
* This is a roundup of number of bits needed for numlowerfs+1
* (i.e. ilog2(numlowerfs+1 - 1) + 1). fsid 0 is reserved for
* upper fs even with non upper overlay.
*/
BUILD_BUG_ON(ilog2(OVL_MAX_STACK) > 31);
ofs->xino_bits = ilog2(ofs->numlowerfs) + 1;
}
if (ofs->xino_bits) {
pr_info("overlayfs: \"xino\" feature enabled using %d upper inode bits.\n",
ofs->xino_bits);
}
err = 0; err = 0;
out: out:
return err; return err;
@ -1263,6 +1375,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
ofs->config.index = ovl_index_def; ofs->config.index = ovl_index_def;
ofs->config.nfs_export = ovl_nfs_export_def; ofs->config.nfs_export = ovl_nfs_export_def;
ofs->config.xino = ovl_xino_def();
err = ovl_parse_opt((char *) data, &ofs->config); err = ovl_parse_opt((char *) data, &ofs->config);
if (err) if (err)
goto out_err; goto out_err;
@ -1276,6 +1389,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
sb->s_stack_depth = 0; sb->s_stack_depth = 0;
sb->s_maxbytes = MAX_LFS_FILESIZE; sb->s_maxbytes = MAX_LFS_FILESIZE;
/* Assume underlaying fs uses 32bit inodes unless proven otherwise */
if (ofs->config.xino != OVL_XINO_OFF)
ofs->xino_bits = BITS_PER_LONG - 32;
if (ofs->config.upperdir) { if (ofs->config.upperdir) {
if (!ofs->config.workdir) { if (!ofs->config.workdir) {
pr_err("overlayfs: missing 'workdir'\n"); pr_err("overlayfs: missing 'workdir'\n");
@ -1305,8 +1422,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
/* If the upper fs is nonexistent, we mark overlayfs r/o too */ /* If the upper fs is nonexistent, we mark overlayfs r/o too */
if (!ofs->upper_mnt) if (!ofs->upper_mnt)
sb->s_flags |= SB_RDONLY; sb->s_flags |= SB_RDONLY;
else if (ofs->upper_mnt->mnt_sb != ofs->same_sb)
ofs->same_sb = NULL;
if (!(ovl_force_readonly(ofs)) && ofs->config.index) { if (!(ovl_force_readonly(ofs)) && ofs->config.index) {
err = ovl_get_indexdir(ofs, oe, &upperpath); err = ovl_get_indexdir(ofs, oe, &upperpath);

View File

@ -47,13 +47,29 @@ struct super_block *ovl_same_sb(struct super_block *sb)
{ {
struct ovl_fs *ofs = sb->s_fs_info; struct ovl_fs *ofs = sb->s_fs_info;
return ofs->same_sb; if (!ofs->numlowerfs)
return ofs->upper_mnt->mnt_sb;
else if (ofs->numlowerfs == 1 && !ofs->upper_mnt)
return ofs->lower_fs[0].sb;
else
return NULL;
} }
bool ovl_can_decode_fh(struct super_block *sb) /*
* Check if underlying fs supports file handles and try to determine encoding
* type, in order to deduce maximum inode number used by fs.
*
* Return 0 if file handles are not supported.
* Return 1 (FILEID_INO32_GEN) if fs uses the default 32bit inode encoding.
* Return -1 if fs uses a non default encoding with unknown inode size.
*/
int ovl_can_decode_fh(struct super_block *sb)
{ {
return (sb->s_export_op && sb->s_export_op->fh_to_dentry && if (!sb->s_export_op || !sb->s_export_op->fh_to_dentry ||
!uuid_is_null(&sb->s_uuid)); uuid_is_null(&sb->s_uuid))
return 0;
return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN;
} }
struct dentry *ovl_indexdir(struct super_block *sb) struct dentry *ovl_indexdir(struct super_block *sb)
@ -172,6 +188,13 @@ struct dentry *ovl_dentry_lower(struct dentry *dentry)
return oe->numlower ? oe->lowerstack[0].dentry : NULL; return oe->numlower ? oe->lowerstack[0].dentry : NULL;
} }
struct ovl_layer *ovl_layer_lower(struct dentry *dentry)
{
struct ovl_entry *oe = dentry->d_fsdata;
return oe->numlower ? oe->lowerstack[0].layer : NULL;
}
struct dentry *ovl_dentry_real(struct dentry *dentry) struct dentry *ovl_dentry_real(struct dentry *dentry)
{ {
return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry); return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
@ -279,12 +302,16 @@ void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect)
void ovl_inode_init(struct inode *inode, struct dentry *upperdentry, void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
struct dentry *lowerdentry) struct dentry *lowerdentry)
{ {
struct inode *realinode = d_inode(upperdentry ?: lowerdentry);
if (upperdentry) if (upperdentry)
OVL_I(inode)->__upperdentry = upperdentry; OVL_I(inode)->__upperdentry = upperdentry;
if (lowerdentry) if (lowerdentry)
OVL_I(inode)->lower = igrab(d_inode(lowerdentry)); OVL_I(inode)->lower = igrab(d_inode(lowerdentry));
ovl_copyattr(d_inode(upperdentry ?: lowerdentry), inode); ovl_copyattr(realinode, inode);
if (!inode->i_ino)
inode->i_ino = realinode->i_ino;
} }
void ovl_inode_update(struct inode *inode, struct dentry *upperdentry) void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
@ -299,6 +326,8 @@ void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
smp_wmb(); smp_wmb();
OVL_I(inode)->__upperdentry = upperdentry; OVL_I(inode)->__upperdentry = upperdentry;
if (inode_unhashed(inode)) { if (inode_unhashed(inode)) {
if (!inode->i_ino)
inode->i_ino = upperinode->i_ino;
inode->i_private = upperinode; inode->i_private = upperinode;
__insert_inode_hash(inode, (unsigned long) upperinode); __insert_inode_hash(inode, (unsigned long) upperinode);
} }