From 47aaabdedf366ac5894c7fddec388832f0d8193e Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 15 Jul 2020 14:06:21 +0200 Subject: [PATCH 01/37] fanotify: Avoid softlockups when reading many events When user provides large buffer for events and there are lots of events available, we can try to copy them all to userspace without scheduling which can softlockup the kernel (furthermore exacerbated by the contention on notification_lock). Add a scheduling point after copying each event. Note that usually the real underlying problem is the cost of fanotify event merging and the resulting contention on notification_lock but this is a cheap way to somewhat reduce the problem until we can properly address that. Reported-by: Francesco Ruggeri Link: https://lore.kernel.org/lkml/20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com Reviewed-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify_user.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 63b5dffdca9e..d7f63aeca992 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -412,6 +412,11 @@ static ssize_t fanotify_read(struct file *file, char __user *buf, add_wait_queue(&group->notification_waitq, &wait); while (1) { + /* + * User can supply arbitrarily large buffer. Avoid softlockups + * in case there are lots of available events. + */ + cond_resched(); event = get_one_event(group, count); if (IS_ERR(event)) { ret = PTR_ERR(event); From 71d734103edfa2b4c6657578a3082ee0e51d767e Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 8 Jul 2020 14:11:36 +0300 Subject: [PATCH 02/37] fsnotify: Rearrange fast path to minimise overhead when there is no watcher The fsnotify paths are trivial to hit even when there are no watchers and they are surprisingly expensive. For example, every successful vfs_write() hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless FMODE_NONOTIFY is set which is an internal flag invisible to userspace. As it stands, fsnotify_parent is a guaranteed functional call even if there are no watchers and fsnotify() does a substantial amount of unnecessary work before it checks if there are any watchers. A perf profile showed that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the total samples taken in that function during a test. This patch rearranges the fast paths to reduce the amount of work done when there are no watchers. The test motivating this was "perf bench sched messaging --pipe". Despite the fact the pipes are anonymous, fsnotify is still called a lot and the overhead is noticeable even though it's completely pointless. It's likely the overhead is negligible for real IO so this is an extreme example. This is a comparison of hackbench using processes and pipes on a 1-socket machine with 8 CPU threads without fanotify watchers. 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Amean 1 0.4837 ( 0.00%) 0.4630 * 4.27%* Amean 3 1.5447 ( 0.00%) 1.4557 ( 5.76%) Amean 5 2.6037 ( 0.00%) 2.4363 ( 6.43%) Amean 7 3.5987 ( 0.00%) 3.4757 ( 3.42%) Amean 12 5.8267 ( 0.00%) 5.6983 ( 2.20%) Amean 18 8.4400 ( 0.00%) 8.1327 ( 3.64%) Amean 24 11.0187 ( 0.00%) 10.0290 * 8.98%* Amean 30 13.1013 ( 0.00%) 12.8510 ( 1.91%) Amean 32 13.9190 ( 0.00%) 13.2410 ( 4.87%) 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Duration User 157.05 152.79 Duration System 1279.98 1219.32 Duration Elapsed 182.81 174.52 This is showing that the latencies are improved by roughly 2-9%. The variability is not shown but some of these results are within the noise as this workload heavily overloads the machine. That said, the system CPU usage is reduced by quite a bit so it makes sense to avoid the overhead even if it is a bit tricky to detect at times. A perf profile of just 1 group of tasks showed that 5.14% of samples taken were in either fsnotify() or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, mostly function entry and the initial check for watchers. The check for watchers is complicated enough that inlining it may be controversial. [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask Link: https://lore.kernel.org/r/20200708111156.24659-1-amir73il@gmail.com Signed-off-by: Mel Gorman Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fsnotify.c | 27 +++++++++++++++------------ include/linux/fsnotify.h | 10 ++++++++++ include/linux/fsnotify_backend.h | 4 ++-- 3 files changed, 27 insertions(+), 14 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 72d332ce8e12..d59a58d10b84 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -143,7 +143,7 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) } /* Notify this dentry's parent about a child's events. */ -int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, +int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { struct dentry *parent; @@ -174,7 +174,7 @@ int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, return ret; } -EXPORT_SYMBOL_GPL(fsnotify_parent); +EXPORT_SYMBOL_GPL(__fsnotify_parent); static int send_to_group(struct inode *to_tell, __u32 mask, const void *data, @@ -315,17 +315,11 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, struct fsnotify_iter_info iter_info = {}; struct super_block *sb = to_tell->i_sb; struct mount *mnt = NULL; - __u32 mnt_or_sb_mask = sb->s_fsnotify_mask; int ret = 0; - __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); + __u32 test_mask, marks_mask; - if (path) { + if (path) mnt = real_mount(path->mnt); - mnt_or_sb_mask |= mnt->mnt_fsnotify_mask; - } - /* An event "on child" is not intended for a mount/sb mark */ - if (mask & FS_EVENT_ON_CHILD) - mnt_or_sb_mask = 0; /* * Optimization: srcu_read_lock() has a memory barrier which can @@ -337,13 +331,22 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks)) return 0; + + /* An event "on child" is not intended for a mount/sb mark */ + marks_mask = to_tell->i_fsnotify_mask; + if (!(mask & FS_EVENT_ON_CHILD)) { + marks_mask |= sb->s_fsnotify_mask; + if (mnt) + marks_mask |= mnt->mnt_fsnotify_mask; + } + /* * if this is a modify event we may need to clear the ignored masks * otherwise return if neither the inode nor the vfsmount/sb care about * this type of event. */ - if (!(mask & FS_MODIFY) && - !(test_mask & (to_tell->i_fsnotify_mask | mnt_or_sb_mask))) + test_mask = (mask & ALL_FSNOTIFY_EVENTS); + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) return 0; iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 5ab28f6c7d26..508f6bb0b06b 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -44,6 +44,16 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); } +/* Notify this dentry's parent about a child's events. */ +static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, + const void *data, int data_type) +{ + if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + return 0; + + return __fsnotify_parent(dentry, mask, data, data_type); +} + /* * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when * an event is on a file/dentry. diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index f0c506405b54..1626fa7d10ff 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -379,7 +379,7 @@ struct fsnotify_mark { /* main fsnotify call to send events */ extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, const struct qstr *name, u32 cookie); -extern int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, +extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type); extern void __fsnotify_inode_delete(struct inode *inode); extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt); @@ -541,7 +541,7 @@ static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data, return 0; } -static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, +static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { return 0; From c738fbabb0ff62d0f9a9572e56e65d05a1b34c6a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:37 +0300 Subject: [PATCH 03/37] fsnotify: fold fsnotify() call into fsnotify_parent() All (two) callers of fsnotify_parent() also call fsnotify() to notify the child inode. Move the second fsnotify() call into fsnotify_parent(). This will allow more flexibility in making decisions about which of the two event falvors should be sent. Using 'goto notify_child' in the inline helper seems a bit strange, but it mimics the code in __fsnotify_parent() for clarity and the goto pattern will become less strage after following patches are applied. Link: https://lore.kernel.org/r/20200708111156.24659-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fsnotify.c | 27 +++++++++++++++++--------- include/linux/fsnotify.h | 41 +++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 33 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index d59a58d10b84..30628a72ca01 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -142,16 +142,20 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) spin_unlock(&inode->i_lock); } -/* Notify this dentry's parent about a child's events. */ +/* + * Notify this dentry's parent about a child's events with child name info + * if parent is watching. + * Notify also the child without name info if child inode is watching. + */ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, - int data_type) + int data_type) { struct dentry *parent; struct inode *p_inode; int ret = 0; if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) - return 0; + goto notify_child; parent = dget_parent(dentry); p_inode = parent->d_inode; @@ -161,18 +165,23 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, } else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) { struct name_snapshot name; - /* we are notifying a parent so come up with the new mask which - * specifies these are events which came from a child. */ - mask |= FS_EVENT_ON_CHILD; - + /* + * We are notifying a parent, so set a flag in mask to inform + * backend that event has information about a child entry. + */ take_dentry_name_snapshot(&name, dentry); - ret = fsnotify(p_inode, mask, data, data_type, &name.name, 0); + ret = fsnotify(p_inode, mask | FS_EVENT_ON_CHILD, data, + data_type, &name.name, 0); release_dentry_name_snapshot(&name); } dput(parent); - return ret; + if (ret) + return ret; + +notify_child: + return fsnotify(d_inode(dentry), mask, data, data_type, NULL, 0); } EXPORT_SYMBOL_GPL(__fsnotify_parent); diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 508f6bb0b06b..316c9b820517 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -47,45 +47,38 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, /* Notify this dentry's parent about a child's events. */ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) -{ - if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) - return 0; - - return __fsnotify_parent(dentry, mask, data, data_type); -} - -/* - * Simple wrappers to consolidate calls fsnotify_parent()/fsnotify() when - * an event is on a file/dentry. - */ -static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) { struct inode *inode = d_inode(dentry); if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; - fsnotify_parent(dentry, mask, inode, FSNOTIFY_EVENT_INODE); - fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0); + if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + goto notify_child; + + return __fsnotify_parent(dentry, mask, data, data_type); + +notify_child: + return fsnotify(inode, mask, data, data_type, NULL, 0); +} + +/* + * Simple wrappers to consolidate calls to fsnotify_parent() when an event + * is on a file/dentry. + */ +static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) +{ + fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE); } static inline int fsnotify_file(struct file *file, __u32 mask) { const struct path *path = &file->f_path; - struct inode *inode = file_inode(file); - int ret; if (file->f_mode & FMODE_NONOTIFY) return 0; - if (S_ISDIR(inode->i_mode)) - mask |= FS_ISDIR; - - ret = fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); - if (ret) - return ret; - - return fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0); + return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); } /* Simple call site for access decisions */ From cbcf47adc8aadbcaa741391ccfd96f764b50be7e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:38 +0300 Subject: [PATCH 04/37] fsnotify: return non const from fsnotify_data_inode() Return non const inode pointer from fsnotify_data_inode(). None of the fsnotify hooks pass const inode pointer as data and callers often need to cast to a non const pointer. Link: https://lore.kernel.org/r/20200708111156.24659-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 2 +- include/linux/fsnotify_backend.h | 5 ++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 85eda539b35f..d9fc83dd994a 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -341,7 +341,7 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask, if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) return to_tell; - return (struct inode *)fsnotify_data_inode(data, data_type); + return fsnotify_data_inode(data, data_type); } struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 1626fa7d10ff..97300f3b8ff0 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -220,12 +220,11 @@ enum fsnotify_data_type { FSNOTIFY_EVENT_INODE, }; -static inline const struct inode *fsnotify_data_inode(const void *data, - int data_type) +static inline struct inode *fsnotify_data_inode(const void *data, int data_type) { switch (data_type) { case FSNOTIFY_EVENT_INODE: - return data; + return (struct inode *)data; case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); default: From 9a02aa40dd5a95a62b184365d5b7847e8e6a3c96 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:39 +0300 Subject: [PATCH 05/37] nfsd: use fsnotify_data_inode() to get the unlinked inode The inode argument to handle_event() is about to become obsolete. Link: https://lore.kernel.org/r/20200708111156.24659-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/nfsd/filecache.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 82198d747c4c..ace8e5c30952 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -599,11 +599,13 @@ static struct notifier_block nfsd_file_lease_notifier = { static int nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, - struct inode *inode, + struct inode *to_tell, u32 mask, const void *data, int data_type, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { + struct inode *inode = fsnotify_data_inode(data, data_type); + trace_nfsd_file_fsnotify_handle_event(inode, mask); /* Should be no marks on non-regular files */ From 9991bb84b27a2594187898f261866cfc50255454 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:40 +0300 Subject: [PATCH 06/37] kernfs: do not call fsnotify() with name without a parent When creating an FS_MODIFY event on inode itself (not on parent) the file_name argument should be NULL. The change to send a non NULL name to inode itself was done on purpuse as part of another commit, as Tejun writes: "...While at it, supply the target file name to fsnotify() from kernfs_node->name.". But this is wrong practice and inconsistent with inotify behavior when watching a single file. When a child is being watched (as opposed to the parent directory) the inotify event should contain the watch descriptor, but not the file name. Fixes: df6a58c5c5aa ("kernfs: don't depend on d_find_any_alias()...") Link: https://lore.kernel.org/r/20200708111156.24659-5-amir73il@gmail.com Acked-by: Tejun Heo Acked-by: Greg Kroah-Hartman Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/kernfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 06b342d8462b..e23b3f62483c 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -912,7 +912,7 @@ repeat: } fsnotify(inode, FS_MODIFY, inode, FSNOTIFY_EVENT_INODE, - &name, 0); + NULL, 0); iput(inode); } From 956235afd145e60094642fca83486c0f3ccaca16 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:41 +0300 Subject: [PATCH 07/37] inotify: do not use objectid when comparing events inotify's event->wd is the object identifier. Compare that instead of the common fsnotidy event objectid, so we can get rid of the objectid field later. Link: https://lore.kernel.org/r/20200708111156.24659-6-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/inotify/inotify_fsnotify.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 2ebc89047153..9b481460a2dc 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -39,7 +39,7 @@ static bool event_compare(struct fsnotify_event *old_fsn, if (old->mask & FS_IN_IGNORED) return false; if ((old->mask == new->mask) && - (old_fsn->objectid == new_fsn->objectid) && + (old->wd == new->wd) && (old->name_len == new->name_len) && (!old->name_len || !strcmp(old->name, new->name))) return true; @@ -116,7 +116,7 @@ int inotify_handle_event(struct fsnotify_group *group, mask &= ~IN_ISDIR; fsn_event = &event->fse; - fsnotify_init_event(fsn_event, (unsigned long)inode); + fsnotify_init_event(fsn_event, 0); event->mask = mask; event->wd = i_mark->wd; event->sync_cookie = cookie; From b8a6c3a2f0ae4d82732810e55ca8aa455e040bbf Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:42 +0300 Subject: [PATCH 08/37] fanotify: create overflow event type The special overflow event is allocated as struct fanotify_path_event, but with a null path. Use a special event type to identify the overflow event, so the helper fanotify_has_event_path() will always indicate a non null path. Allocating the overflow event doesn't need any of the fancy stuff in fanotify_alloc_event(), so create a simplified helper for allocating the overflow event. There is also no need to store and report the pid with an overflow event. Link: https://lore.kernel.org/r/20200708111156.24659-7-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 27 +++++++++++---------------- fs/notify/fanotify/fanotify.h | 15 +++++++++------ fs/notify/fanotify/fanotify_user.c | 21 ++++++++++++++++----- 3 files changed, 36 insertions(+), 27 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index d9fc83dd994a..921ff05e1877 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -344,11 +344,11 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask, return fsnotify_data_inode(data, data_type); } -struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - struct inode *inode, u32 mask, - const void *data, int data_type, - const struct qstr *file_name, - __kernel_fsid_t *fsid) +static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, + struct inode *inode, u32 mask, + const void *data, int data_type, + const struct qstr *file_name, + __kernel_fsid_t *fsid) { struct fanotify_event *event = NULL; struct fanotify_fid_event *ffe = NULL; @@ -426,8 +426,7 @@ init: * event queue, so event reported on parent is merged with event * reported on child when both directory and child watches exist. */ - fsnotify_init_event(&event->fse, (unsigned long)id); - event->mask = mask; + fanotify_init_event(event, (unsigned long)id, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) event->pid = get_pid(task_pid(current)); else @@ -443,15 +442,8 @@ init: fanotify_encode_fh(fanotify_event_dir_fh(event), id, gfp); if (fanotify_event_has_path(event)) { - struct path *p = fanotify_event_path(event); - - if (path) { - *p = *path; - path_get(path); - } else { - p->mnt = NULL; - p->dentry = NULL; - } + *fanotify_event_path(event) = *path; + path_get(path); } out: memalloc_unuse_memcg(); @@ -640,6 +632,9 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event) case FANOTIFY_EVENT_TYPE_FID_NAME: fanotify_free_name_event(event); break; + case FANOTIFY_EVENT_TYPE_OVERFLOW: + kfree(event); + break; default: WARN_ON_ONCE(1); } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 8ce7ccfc4b0d..1b2a3bbe6008 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -63,6 +63,7 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_FID_NAME, /* variable length */ FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, + FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ }; struct fanotify_event { @@ -72,6 +73,14 @@ struct fanotify_event { struct pid *pid; }; +static inline void fanotify_init_event(struct fanotify_event *event, + unsigned long id, u32 mask) +{ + fsnotify_init_event(&event->fse, id); + event->mask = mask; + event->pid = NULL; +} + struct fanotify_fid_event { struct fanotify_event fae; __kernel_fsid_t fsid; @@ -202,9 +211,3 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) else return NULL; } - -struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - struct inode *inode, u32 mask, - const void *data, int data_type, - const struct qstr *file_name, - __kernel_fsid_t *fsid); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index d7f63aeca992..c9a824e5c045 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -836,13 +836,26 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group, FSNOTIFY_OBJ_TYPE_INODE, mask, flags, fsid); } +static struct fsnotify_event *fanotify_alloc_overflow_event(void) +{ + struct fanotify_event *oevent; + + oevent = kmalloc(sizeof(*oevent), GFP_KERNEL_ACCOUNT); + if (!oevent) + return NULL; + + fanotify_init_event(oevent, 0, FS_Q_OVERFLOW); + oevent->type = FANOTIFY_EVENT_TYPE_OVERFLOW; + + return &oevent->fse; +} + /* fanotify syscalls */ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { struct fsnotify_group *group; int f_flags, fd; struct user_struct *user; - struct fanotify_event *oevent; pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); @@ -897,13 +910,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm); - oevent = fanotify_alloc_event(group, NULL, FS_Q_OVERFLOW, NULL, - FSNOTIFY_EVENT_NONE, NULL, NULL); - if (unlikely(!oevent)) { + group->overflow_event = fanotify_alloc_overflow_event(); + if (unlikely(!group->overflow_event)) { fd = -ENOMEM; goto out_destroy_group; } - group->overflow_event = &oevent->fse; if (force_o_largefile()) event_f_flags |= O_LARGEFILE; From 9c61f3b560f5dd82329322cd7c7131e6b27d3053 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 30 Mar 2020 10:55:28 +0300 Subject: [PATCH 09/37] fanotify: break up fanotify_alloc_event() Break up fanotify_alloc_event() into helpers by event struct type. Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 154 ++++++++++++++++++++-------------- 1 file changed, 89 insertions(+), 65 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 921ff05e1877..41f5fc9a8f19 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -344,15 +344,84 @@ static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask, return fsnotify_data_inode(data, data_type); } +static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, + gfp_t gfp) +{ + struct fanotify_path_event *pevent; + + pevent = kmem_cache_alloc(fanotify_path_event_cachep, gfp); + if (!pevent) + return NULL; + + pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH; + pevent->path = *path; + path_get(path); + + return &pevent->fae; +} + +static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path, + gfp_t gfp) +{ + struct fanotify_perm_event *pevent; + + pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp); + if (!pevent) + return NULL; + + pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH_PERM; + pevent->response = 0; + pevent->state = FAN_EVENT_INIT; + pevent->path = *path; + path_get(path); + + return &pevent->fae; +} + +static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, + __kernel_fsid_t *fsid, + gfp_t gfp) +{ + struct fanotify_fid_event *ffe; + + ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp); + if (!ffe) + return NULL; + + ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; + ffe->fsid = *fsid; + fanotify_encode_fh(&ffe->object_fh, id, gfp); + + return &ffe->fae; +} + +static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, + __kernel_fsid_t *fsid, + const struct qstr *file_name, + gfp_t gfp) +{ + struct fanotify_name_event *fne; + + fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp); + if (!fne) + return NULL; + + fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME; + fne->fsid = *fsid; + fanotify_encode_fh(&fne->dir_fh, id, gfp); + fne->name_len = file_name->len; + strcpy(fne->name, file_name->name); + + return &fne->fae; +} + static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - struct inode *inode, u32 mask, - const void *data, int data_type, - const struct qstr *file_name, - __kernel_fsid_t *fsid) + struct inode *inode, u32 mask, + const void *data, int data_type, + const struct qstr *file_name, + __kernel_fsid_t *fsid) { struct fanotify_event *event = NULL; - struct fanotify_fid_event *ffe = NULL; - struct fanotify_name_event *fne = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); const struct path *path = fsnotify_data_path(data, data_type); @@ -372,55 +441,23 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, memalloc_use_memcg(group->memcg); if (fanotify_is_perm_event(mask)) { - struct fanotify_perm_event *pevent; - - pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp); - if (!pevent) - goto out; - - event = &pevent->fae; - event->type = FANOTIFY_EVENT_TYPE_PATH_PERM; - pevent->response = 0; - pevent->state = FAN_EVENT_INIT; - goto init; - } - - /* - * For FAN_DIR_MODIFY event, we report the fid of the directory and - * the name of the modified entry. - * Allocate an fanotify_name_event struct and copy the name. - */ - if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) { - fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp); - if (!fne) - goto out; - - event = &fne->fae; - event->type = FANOTIFY_EVENT_TYPE_FID_NAME; - fne->name_len = file_name->len; - strcpy(fne->name, file_name->name); - goto init; - } - - if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { - ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp); - if (!ffe) - goto out; - - event = &ffe->fae; - event->type = FANOTIFY_EVENT_TYPE_FID; + event = fanotify_alloc_perm_event(path, gfp); + } else if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) { + /* + * For FAN_DIR_MODIFY event, we report the fid of the directory + * and the name of the modified entry. + * Allocate an fanotify_name_event struct and copy the name. + */ + event = fanotify_alloc_name_event(id, fsid, file_name, gfp); + } else if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + event = fanotify_alloc_fid_event(id, fsid, gfp); } else { - struct fanotify_path_event *pevent; - - pevent = kmem_cache_alloc(fanotify_path_event_cachep, gfp); - if (!pevent) - goto out; - - event = &pevent->fae; - event->type = FANOTIFY_EVENT_TYPE_PATH; + event = fanotify_alloc_path_event(path, gfp); } -init: + if (!event) + goto out; + /* * Use the victim inode instead of the watching inode as the id for * event queue, so event reported on parent is merged with event @@ -432,19 +469,6 @@ init: else event->pid = get_pid(task_tgid(current)); - if (fsid && fanotify_event_fsid(event)) - *fanotify_event_fsid(event) = *fsid; - - if (fanotify_event_object_fh(event)) - fanotify_encode_fh(fanotify_event_object_fh(event), id, gfp); - - if (fanotify_event_dir_fh(event)) - fanotify_encode_fh(fanotify_event_dir_fh(event), id, gfp); - - if (fanotify_event_has_path(event)) { - *fanotify_event_path(event) = *path; - path_get(path); - } out: memalloc_unuse_memcg(); return event; From b54cecf5e2293d15620f7b3f8d1bf486243d5643 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Sun, 7 Jun 2020 12:10:40 +0300 Subject: [PATCH 10/37] fsnotify: pass dir argument to handle_event() callback The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/nfsd/filecache.c | 6 +++--- fs/notify/dnotify/dnotify.c | 8 ++++---- fs/notify/fanotify/fanotify.c | 23 +++++++++++------------ fs/notify/fsnotify.c | 26 ++++++++++++-------------- fs/notify/inotify/inotify.h | 6 +++--- fs/notify/inotify/inotify_fsnotify.c | 7 +++---- fs/notify/inotify/inotify_user.c | 4 ++-- include/linux/fsnotify_backend.h | 16 +++++++++++++--- kernel/audit_fsnotify.c | 10 +++++----- kernel/audit_tree.c | 6 +++--- kernel/audit_watch.c | 6 +++--- 11 files changed, 62 insertions(+), 56 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index ace8e5c30952..bbc7892d2928 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -598,9 +598,9 @@ static struct notifier_block nfsd_file_lease_notifier = { }; static int -nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, - struct inode *to_tell, - u32 mask, const void *data, int data_type, +nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 7a42c2ebe28d..2d2eadfb5186 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -70,9 +70,9 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark) * destroy the dnotify struct if it was not registered to receive multiple * events. */ -static int dnotify_handle_event(struct fsnotify_group *group, - struct inode *inode, - u32 mask, const void *data, int data_type, +static int dnotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { @@ -84,7 +84,7 @@ static int dnotify_handle_event(struct fsnotify_group *group, __u32 test_mask = mask & ~FS_EVENT_ON_CHILD; /* not a dir, dnotify doesn't care */ - if (!S_ISDIR(inode->i_mode)) + if (!dir && !(mask & FS_ISDIR)) return 0; if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 41f5fc9a8f19..e417c64c365b 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -335,11 +335,11 @@ out: * FS_ATTRIB reports the child inode even if reported on a watched parent. * FS_CREATE reports the modified dir inode and not the created inode. */ -static struct inode *fanotify_fid_inode(struct inode *to_tell, u32 event_mask, - const void *data, int data_type) +static struct inode *fanotify_fid_inode(u32 event_mask, const void *data, + int data_type, struct inode *dir) { if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) - return to_tell; + return dir; return fsnotify_data_inode(data, data_type); } @@ -416,14 +416,14 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, } static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - struct inode *inode, u32 mask, - const void *data, int data_type, + u32 mask, const void *data, + int data_type, struct inode *dir, const struct qstr *file_name, __kernel_fsid_t *fsid) { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; - struct inode *id = fanotify_fid_inode(inode, mask, data, data_type); + struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); /* @@ -507,9 +507,9 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) return fsid; } -static int fanotify_handle_event(struct fsnotify_group *group, - struct inode *inode, - u32 mask, const void *data, int data_type, +static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { @@ -546,8 +546,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, if (!mask) return 0; - pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode, - mask); + pr_debug("%s: group=%p mask=%x\n", __func__, group, mask); if (fanotify_is_perm_event(mask)) { /* @@ -565,7 +564,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, return 0; } - event = fanotify_alloc_event(group, inode, mask, data, data_type, + event = fanotify_alloc_event(group, mask, data, data_type, dir, file_name, &fsid); ret = -ENOMEM; if (unlikely(!event)) { diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 30628a72ca01..c4ac4d13e10f 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -185,11 +185,9 @@ notify_child: } EXPORT_SYMBOL_GPL(__fsnotify_parent); -static int send_to_group(struct inode *to_tell, - __u32 mask, const void *data, - int data_is, u32 cookie, - const struct qstr *file_name, - struct fsnotify_iter_info *iter_info) +static int send_to_group(__u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, + u32 cookie, struct fsnotify_iter_info *iter_info) { struct fsnotify_group *group = NULL; __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); @@ -225,15 +223,14 @@ static int send_to_group(struct inode *to_tell, } } - pr_debug("%s: group=%p to_tell=%p mask=%x marks_mask=%x marks_ignored_mask=%x" - " data=%p data_is=%d cookie=%d\n", - __func__, group, to_tell, mask, marks_mask, marks_ignored_mask, - data, data_is, cookie); + pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", + __func__, group, mask, marks_mask, marks_ignored_mask, + data, data_type, dir, cookie); if (!(test_mask & marks_mask & ~marks_ignored_mask)) return 0; - return group->ops->handle_event(group, to_tell, mask, data, data_is, + return group->ops->handle_event(group, mask, data, data_type, dir, file_name, cookie, iter_info); } @@ -317,12 +314,13 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) * out to all of the registered fsnotify_group. Those groups can then use the * notification event in whatever means they feel necessary. */ -int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, +int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, const struct qstr *file_name, u32 cookie) { - const struct path *path = fsnotify_data_path(data, data_is); + const struct path *path = fsnotify_data_path(data, data_type); struct fsnotify_iter_info iter_info = {}; struct super_block *sb = to_tell->i_sb; + struct inode *dir = file_name ? to_tell : NULL; struct mount *mnt = NULL; int ret = 0; __u32 test_mask, marks_mask; @@ -375,8 +373,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is, * That's why this traversal is so complicated... */ while (fsnotify_iter_select_report_types(&iter_info)) { - ret = send_to_group(to_tell, mask, data, data_is, cookie, - file_name, &iter_info); + ret = send_to_group(mask, data, data_type, dir, file_name, + cookie, &iter_info); if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) goto out; diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index 3f246f7b8a92..4327d0e9c364 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -24,9 +24,9 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse) extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group); -extern int inotify_handle_event(struct fsnotify_group *group, - struct inode *inode, - u32 mask, const void *data, int data_type, +extern int inotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 9b481460a2dc..dfd455798a1b 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -55,9 +55,8 @@ static int inotify_merge(struct list_head *list, return event_compare(last_event, event); } -int inotify_handle_event(struct fsnotify_group *group, - struct inode *inode, - u32 mask, const void *data, int data_type, +int inotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { @@ -82,7 +81,7 @@ int inotify_handle_event(struct fsnotify_group *group, alloc_len += len + 1; } - pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode, + pr_debug("%s: group=%p mark=%p mask=%x\n", __func__, group, inode_mark, mask); i_mark = container_of(inode_mark, struct inotify_inode_mark, diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index f88bbcc9efeb..5385d5817dd9 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -490,8 +490,8 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, fsn_mark); /* Queue ignore event for the watch */ - inotify_handle_event(group, NULL, FS_IN_IGNORED, NULL, - FSNOTIFY_EVENT_NONE, NULL, 0, &iter_info); + inotify_handle_event(group, FS_IN_IGNORED, NULL, FSNOTIFY_EVENT_NONE, + NULL, NULL, 0, &iter_info); i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark); /* remove this mark from the idr */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 97300f3b8ff0..0de130cbf72d 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -108,6 +108,17 @@ struct mem_cgroup; * these operations for each relevant group. * * handle_event - main call for a group to handle an fs event + * @group: group to notify + * @mask: event type and flags + * @data: object that event happened on + * @data_type: type of object for fanotify_data_XXX() accessors + * @dir: optional directory associated with event - + * if @file_name is not NULL, this is the directory that + * @file_name is relative to + * @file_name: optional file name associated with event + * @cookie: inotify rename cookie + * @iter_info: array of marks from this group that are interested in the event + * * free_group_priv - called when a group refcnt hits 0 to clean up the private union * freeing_mark - called when a mark is being destroyed for some reason. The group * MUST be holding a reference on each mark and that reference must be @@ -115,9 +126,8 @@ struct mem_cgroup; * userspace messages that marks have been removed. */ struct fsnotify_ops { - int (*handle_event)(struct fsnotify_group *group, - struct inode *inode, - u32 mask, const void *data, int data_type, + int (*handle_event)(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); void (*free_group_priv)(struct fsnotify_group *group); diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 3596448bfdab..30ca239285a3 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -152,11 +152,11 @@ static void audit_autoremove_mark_rule(struct audit_fsnotify_mark *audit_mark) } /* Update mark data in audit rules based on fsnotify events. */ -static int audit_mark_handle_event(struct fsnotify_group *group, - struct inode *to_tell, - u32 mask, const void *data, int data_type, - const struct qstr *dname, u32 cookie, - struct fsnotify_iter_info *iter_info) +static int audit_mark_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, + const struct qstr *dname, u32 cookie, + struct fsnotify_iter_info *iter_info) { struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct audit_fsnotify_mark *audit_mark; diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index e49c912f862d..2ce2ac1ce100 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -1037,9 +1037,9 @@ static void evict_chunk(struct audit_chunk *chunk) audit_schedule_prune(); } -static int audit_tree_handle_event(struct fsnotify_group *group, - struct inode *to_tell, - u32 mask, const void *data, int data_type, +static int audit_tree_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) { diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index e09c551ae52d..61fd601f1edf 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -464,9 +464,9 @@ void audit_remove_watch_rule(struct audit_krule *krule) } /* Update watch data in audit rules based on fsnotify events. */ -static int audit_watch_handle_event(struct fsnotify_group *group, - struct inode *to_tell, - u32 mask, const void *data, int data_type, +static int audit_watch_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *dname, u32 cookie, struct fsnotify_iter_info *iter_info) { From 08b95c338e0c5a96e47f4ca314ea1e7580ecb5d7 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 8 Jul 2020 14:11:52 +0300 Subject: [PATCH 11/37] fanotify: remove event FAN_DIR_MODIFY It was never enabled in uapi and its functionality is about to be superseded by events FAN_CREATE, FAN_DELETE, FAN_MOVE with group flag FAN_REPORT_NAME. Keep a place holder variable name_event instead of removing the name recording code since it will be used by the new events. Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 9 ++------- fs/notify/fsnotify.c | 2 +- include/linux/fsnotify.h | 6 ------ include/linux/fsnotify_backend.h | 4 +--- include/uapi/linux/fanotify.h | 1 - 5 files changed, 4 insertions(+), 18 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index e417c64c365b..e6ba605732d7 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -425,6 +425,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, gfp_t gfp = GFP_KERNEL_ACCOUNT; struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); + bool name_event = false; /* * For queues with unlimited length lost events are not expected and @@ -442,12 +443,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); - } else if (mask & FAN_DIR_MODIFY && !(WARN_ON_ONCE(!file_name))) { - /* - * For FAN_DIR_MODIFY event, we report the fid of the directory - * and the name of the modified entry. - * Allocate an fanotify_name_event struct and copy the name. - */ + } else if (name_event && file_name) { event = fanotify_alloc_name_event(id, fsid, file_name, gfp); } else if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { event = fanotify_alloc_fid_event(id, fsid, gfp); @@ -528,7 +524,6 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_MOVED_FROM != FS_MOVED_FROM); BUILD_BUG_ON(FAN_CREATE != FS_CREATE); BUILD_BUG_ON(FAN_DELETE != FS_DELETE); - BUILD_BUG_ON(FAN_DIR_MODIFY != FS_DIR_MODIFY); BUILD_BUG_ON(FAN_DELETE_SELF != FS_DELETE_SELF); BUILD_BUG_ON(FAN_MOVE_SELF != FS_MOVE_SELF); BUILD_BUG_ON(FAN_EVENT_ON_CHILD != FS_EVENT_ON_CHILD); diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index c4ac4d13e10f..f12a554be3f0 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -393,7 +393,7 @@ static __init int fsnotify_init(void) { int ret; - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 26); + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25); ret = init_srcu_struct(&fsnotify_mark_srcu); if (ret) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 316c9b820517..9b2566d273a9 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -30,12 +30,6 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask, const struct qstr *name, u32 cookie) { fsnotify(dir, mask, child, FSNOTIFY_EVENT_INODE, name, cookie); - /* - * Send another flavor of the event without child inode data and - * without the specific event type (e.g. FS_CREATE|FS_IS_DIR). - * The name is relative to the dir inode the event is reported to. - */ - fsnotify(dir, FS_DIR_MODIFY, dir, FSNOTIFY_EVENT_INODE, name, 0); } static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 0de130cbf72d..94a4ff3d5bbe 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -47,7 +47,6 @@ #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ -#define FS_DIR_MODIFY 0x00080000 /* Directory entry was modified */ #define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ /* This inode cares about things that happen to its children. Always set for @@ -67,8 +66,7 @@ * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event * when a directory entry inside a child subdir changes. */ -#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | \ - FS_DIR_MODIFY) +#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE) #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \ FS_OPEN_EXEC_PERM) diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index a88c7c6d0692..7f2f17eacbf9 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -24,7 +24,6 @@ #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */ #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */ #define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */ -#define FAN_DIR_MODIFY 0x00080000 /* Directory entry was modified */ #define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */ From 0badfa029e5fd6d5462adb767937319335637c83 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:09 +0300 Subject: [PATCH 12/37] fanotify: generalize the handling of extra event flags In fanotify_group_event_mask() there is logic in place to make sure we are not going to handle an event with no type and just FAN_ONDIR flag. Generalize this logic to any FANOTIFY_EVENT_FLAGS. There is only one more flag in this group at the moment - FAN_EVENT_ON_CHILD. We never report it to user, but we do pass it in to fanotify_alloc_event() when group is reporting fid as indication that event happened on child. We will have use for this indication later on. Link: https://lore.kernel.org/r/20200716084230.30611-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index e6ba605732d7..110835a9bf99 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -211,7 +211,8 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, int data_type) { __u32 marks_mask = 0, marks_ignored_mask = 0; - __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS; + __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | + FANOTIFY_EVENT_FLAGS; const struct path *path = fsnotify_data_path(data, data_type); struct fsnotify_mark *mark; int type; @@ -264,14 +265,18 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, * * For backward compatibility and consistency, do not report FAN_ONDIR * to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR - * to user in FAN_REPORT_FID mode for all event types. + * to user in fid mode for all event types. + * + * We never report FAN_EVENT_ON_CHILD to user, but we do pass it in to + * fanotify_alloc_event() when group is reporting fid as indication + * that event happened on child. */ if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { - /* Do not report FAN_ONDIR without any event */ - if (!(test_mask & ~FAN_ONDIR)) + /* Do not report event flags without any event */ + if (!(test_mask & ~FANOTIFY_EVENT_FLAGS)) return 0; } else { - user_mask &= ~FAN_ONDIR; + user_mask &= ~FANOTIFY_EVENT_FLAGS; } return test_mask & user_mask; From 103ff6a554923e0da54e0a28a686c91facf5bffd Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:10 +0300 Subject: [PATCH 13/37] fanotify: generalize merge logic of events on dir An event on directory should never be merged with an event on non-directory regardless of the event struct type. This change has no visible effect, because currently, with struct fanotify_path_event, the relevant events will not be merged because event path of dir will be different than event path of non-dir. Link: https://lore.kernel.org/r/20200716084230.30611-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 110835a9bf99..84c86a45874c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -83,22 +83,22 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, old->type != new->type || old->pid != new->pid) return false; + /* + * We want to merge many dirent events in the same dir (i.e. + * creates/unlinks/renames), but we do not want to merge dirent + * events referring to subdirs with dirent events referring to + * non subdirs, otherwise, user won't be able to tell from a + * mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+ + * unlink pair or rmdir+create pair of events. + */ + if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) + return false; + switch (old->type) { case FANOTIFY_EVENT_TYPE_PATH: return fanotify_path_equal(fanotify_event_path(old), fanotify_event_path(new)); case FANOTIFY_EVENT_TYPE_FID: - /* - * We want to merge many dirent events in the same dir (i.e. - * creates/unlinks/renames), but we do not want to merge dirent - * events referring to subdirs with dirent events referring to - * non subdirs, otherwise, user won't be able to tell from a - * mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+ - * unlink pair or rmdir+create pair of events. - */ - if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) - return false; - return fanotify_fid_event_equal(FANOTIFY_FE(old), FANOTIFY_FE(new)); case FANOTIFY_EVENT_TYPE_FID_NAME: From 6ad1aadd970460d81d5e803ac38c0f63f85c0a00 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:11 +0300 Subject: [PATCH 14/37] fanotify: distinguish between fid encode error and null fid In fanotify_encode_fh(), both cases of NULL inode and failure to encode ended up with fh type FILEID_INVALID. Distiguish the case of NULL inode, by setting fh type to FILEID_ROOT. This is just a semantic difference at this point. Remove stale comment and unneeded check from fid event compare helpers. Link: https://lore.kernel.org/r/20200716084230.30611-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 84c86a45874c..3dc71a8e795a 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -34,10 +34,6 @@ static bool fanotify_fh_equal(struct fanotify_fh *fh1, if (fh1->type != fh2->type || fh1->len != fh2->len) return false; - /* Do not merge events if we failed to encode fh */ - if (fh1->type == FILEID_INVALID) - return false; - return !fh1->len || !memcmp(fanotify_fh_buf(fh1), fanotify_fh_buf(fh2), fh1->len); } @@ -56,10 +52,7 @@ static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1, static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, struct fanotify_name_event *fne2) { - /* - * Do not merge name events without dir fh. - * FAN_DIR_MODIFY does not encode object fh, so it may be empty. - */ + /* Do not merge name events without dir fh */ if (!fne1->dir_fh.len) return false; @@ -290,8 +283,10 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, void *buf = fh->buf; int err; + fh->type = FILEID_ROOT; + fh->len = 0; if (!inode) - goto out; + return; dwords = 0; err = -ENOENT; @@ -326,7 +321,6 @@ out_err: type, bytes, err); kfree(ext_buf); *fanotify_fh_ext_buf_ptr(fh) = NULL; -out: /* Report the event without a file identifier on encode error */ fh->type = FILEID_INVALID; fh->len = 0; From d809daf1b6add51eec001bf60b17885d697a299d Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:12 +0300 Subject: [PATCH 15/37] fanotify: generalize test for FAN_REPORT_FID As preparation for new flags that report fids, define a bit set of flags for a group reporting fids, currently containing the only bit FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 10 ++++++---- fs/notify/fanotify/fanotify_user.c | 12 ++++++------ include/linux/fanotify.h | 6 ++++-- 3 files changed, 16 insertions(+), 12 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 3dc71a8e795a..b8c04a6f04c5 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -207,13 +207,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | FANOTIFY_EVENT_FLAGS; const struct path *path = fsnotify_data_path(data, data_type); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct fsnotify_mark *mark; int type; pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n", __func__, iter_info->report_mask, event_mask, data, data_type); - if (!FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + if (!fid_mode) { /* Do we have path to open a file descriptor? */ if (!path) return 0; @@ -264,7 +265,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, * fanotify_alloc_event() when group is reporting fid as indication * that event happened on child. */ - if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + if (fid_mode) { /* Do not report event flags without any event */ if (!(test_mask & ~FANOTIFY_EVENT_FLAGS)) return 0; @@ -424,6 +425,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, gfp_t gfp = GFP_KERNEL_ACCOUNT; struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); bool name_event = false; /* @@ -444,7 +446,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event = fanotify_alloc_perm_event(path, gfp); } else if (name_event && file_name) { event = fanotify_alloc_name_event(id, fsid, file_name, gfp); - } else if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + } else if (fid_mode) { event = fanotify_alloc_fid_event(id, fsid, gfp); } else { event = fanotify_alloc_path_event(path, gfp); @@ -551,7 +553,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, return 0; } - if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) { fsid = fanotify_get_fsid(iter_info); /* Racing with mark destruction or creation? */ if (!fsid.val[0] && !fsid.val[1]) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index c9a824e5c045..1e04caf8d6ba 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -100,7 +100,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, if (fsnotify_notify_queue_is_empty(group)) goto out; - if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) { event_size += fanotify_event_info_len( FANOTIFY_E(fsnotify_peek_first_event(group))); } @@ -882,7 +882,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return -EINVAL; } - if ((flags & FAN_REPORT_FID) && + if ((flags & FANOTIFY_FID_BITS) && (flags & FANOTIFY_CLASS_BITS) != FAN_CLASS_NOTIF) return -EINVAL; @@ -1040,7 +1040,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; - unsigned int obj_type; + unsigned int obj_type, fid_mode; int ret; pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n", @@ -1113,9 +1113,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, * inode events are not supported on a mount mark, because they do not * carry enough information (i.e. path) to be filtered by mount point. */ + fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); if (mask & FANOTIFY_INODE_EVENTS && - (!FAN_GROUP_FLAG(group, FAN_REPORT_FID) || - mark_type == FAN_MARK_MOUNT)) + (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out; if (flags & FAN_MARK_FLUSH) { @@ -1140,7 +1140,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto path_put_and_out; } - if (FAN_GROUP_FLAG(group, FAN_REPORT_FID)) { + if (fid_mode) { ret = fanotify_test_fid(&path, &__fsid); if (ret) goto path_put_and_out; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index b79fa9bb7359..bbbee11d2521 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -18,8 +18,10 @@ #define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT) -#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | \ - FAN_REPORT_TID | FAN_REPORT_FID | \ +#define FANOTIFY_FID_BITS (FAN_REPORT_FID) + +#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ + FAN_REPORT_TID | \ FAN_CLOEXEC | FAN_NONBLOCK | \ FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS) From 3ef866536645ce1b4f0516e8723d803d0fa6ca26 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:13 +0300 Subject: [PATCH 16/37] fanotify: mask out special event flags from ignored mask The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had any meaning in ignored mask. Mask them out explicitly. Link: https://lore.kernel.org/r/20200716084230.30611-6-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify_user.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 1e04caf8d6ba..6d30beb320f3 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1040,6 +1040,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; + bool ignored = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; int ret; @@ -1087,6 +1088,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL; + /* Event flags (ONDIR, ON_CHILD) are meaningless in ignored mask */ + if (ignored) + mask &= ~FANOTIFY_EVENT_FLAGS; + f = fdget(fanotify_fd); if (unlikely(!f.file)) return -EBADF; From 4ed6814a91cc71202e3de12cd1cb7cc59f3cee57 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:14 +0300 Subject: [PATCH 17/37] fanotify: prepare for implicit event flags in mark mask So far, all flags that can be set in an fanotify mark mask can be set explicitly by a call to fanotify_mark(2). Prepare for defining implicit event flags that cannot be set by user with fanotify_mark(2), similar to how inotify/dnotify implicitly set the FS_EVENT_ON_CHILD flag. Implicit event flags cannot be removed by user and mark gets destroyed when only implicit event flags remain in the mask. Link: https://lore.kernel.org/r/20200716084230.30611-7-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify_user.c | 40 ++++++++++++++++++------------ 1 file changed, 24 insertions(+), 16 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6d30beb320f3..ab974cd234f7 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -656,12 +656,13 @@ out: } static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, - unsigned int flags, - int *destroy) + __u32 mask, unsigned int flags, + __u32 umask, int *destroy) { __u32 oldmask = 0; + /* umask bits cannot be removed by user */ + mask &= ~umask; spin_lock(&fsn_mark->lock); if (!(flags & FAN_MARK_IGNORED_MASK)) { oldmask = fsn_mark->mask; @@ -669,7 +670,13 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, } else { fsn_mark->ignored_mask &= ~mask; } - *destroy = !(fsn_mark->mask | fsn_mark->ignored_mask); + /* + * We need to keep the mark around even if remaining mask cannot + * result in any events (e.g. mask == FAN_ONDIR) to support incremenal + * changes to the mask. + * Destroy mark when only umask bits remain. + */ + *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); spin_unlock(&fsn_mark->lock); return mask & oldmask; @@ -677,7 +684,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, static int fanotify_remove_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, __u32 mask, - unsigned int flags) + unsigned int flags, __u32 umask) { struct fsnotify_mark *fsn_mark = NULL; __u32 removed; @@ -691,7 +698,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group, } removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags, - &destroy_mark); + umask, &destroy_mark); if (removed & fsnotify_conn_mask(fsn_mark->connector)) fsnotify_recalc_mask(fsn_mark->connector); if (destroy_mark) @@ -707,25 +714,26 @@ static int fanotify_remove_mark(struct fsnotify_group *group, static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt, __u32 mask, - unsigned int flags) + unsigned int flags, __u32 umask) { return fanotify_remove_mark(group, &real_mount(mnt)->mnt_fsnotify_marks, - mask, flags); + mask, flags, umask); } static int fanotify_remove_sb_mark(struct fsnotify_group *group, - struct super_block *sb, __u32 mask, - unsigned int flags) + struct super_block *sb, __u32 mask, + unsigned int flags, __u32 umask) { - return fanotify_remove_mark(group, &sb->s_fsnotify_marks, mask, flags); + return fanotify_remove_mark(group, &sb->s_fsnotify_marks, mask, + flags, umask); } static int fanotify_remove_inode_mark(struct fsnotify_group *group, struct inode *inode, __u32 mask, - unsigned int flags) + unsigned int flags, __u32 umask) { return fanotify_remove_mark(group, &inode->i_fsnotify_marks, mask, - flags); + flags, umask); } static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, @@ -1175,13 +1183,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, case FAN_MARK_REMOVE: if (mark_type == FAN_MARK_MOUNT) ret = fanotify_remove_vfsmount_mark(group, mnt, mask, - flags); + flags, 0); else if (mark_type == FAN_MARK_FILESYSTEM) ret = fanotify_remove_sb_mark(group, mnt->mnt_sb, mask, - flags); + flags, 0); else ret = fanotify_remove_inode_mark(group, inode, mask, - flags); + flags, 0); break; default: ret = -EINVAL; From 85af5d9258cc5862167c578c63c65ac700a3fa19 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:15 +0300 Subject: [PATCH 18/37] fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks Up to now, fanotify allowed to set the FAN_EVENT_ON_CHILD flag on sb/mount marks and non-directory inode mask, but the flag was ignored. Mask out the flag if it is provided by user on sb/mount/non-dir marks and define it as an implicit flag that cannot be removed by user. This flag is going to be used internally to request for events with parent and name info. Link: https://lore.kernel.org/r/20200716084230.30611-8-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify_user.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index ab974cd234f7..16d70a8e90f9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1050,6 +1050,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; bool ignored = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; + u32 umask = 0; int ret; pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n", @@ -1167,6 +1168,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt; + /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ + if (mnt || !S_ISDIR(inode->i_mode)) { + mask &= ~FAN_EVENT_ON_CHILD; + umask = FAN_EVENT_ON_CHILD; + } + /* create/update an inode mark */ switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) { case FAN_MARK_ADD: @@ -1183,13 +1190,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, case FAN_MARK_REMOVE: if (mark_type == FAN_MARK_MOUNT) ret = fanotify_remove_vfsmount_mark(group, mnt, mask, - flags, 0); + flags, umask); else if (mark_type == FAN_MARK_FILESYSTEM) ret = fanotify_remove_sb_mark(group, mnt->mnt_sb, mask, - flags, 0); + flags, umask); else ret = fanotify_remove_inode_mark(group, inode, mask, - flags, 0); + flags, umask); break; default: ret = -EINVAL; From 6ba8d7107f27c1bde60a80bc5def027979af3e8e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:16 +0300 Subject: [PATCH 19/37] fsnotify: add object type "child" to object type iterator The object type iterator is used to collect all the marks of a specific group that have interest in an event. It is used by fanotify to get a single handle_event callback when an event has a match to either of inode/sb/mount marks of the group. The nature of fsnotify events is that they are associated with at most one sb at most one mount and at most one inode. When a parent and child are both watching, two events are sent to backend, one associated to parent inode and one associated to the child inode. This results in duplicate events in fanotify, which usually get merged before user reads them, but this is sub-optimal. It would be better if the same event is sent to backend with an object type iterator that has both the child inode and its parent, and let the backend decide if the event should be reported once (fanotify) or twice (inotify). Link: https://lore.kernel.org/r/20200716084230.30611-9-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 94a4ff3d5bbe..d22519001027 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -253,6 +253,7 @@ static inline const struct path *fsnotify_data_path(const void *data, enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_INODE, + FSNOTIFY_OBJ_TYPE_CHILD, FSNOTIFY_OBJ_TYPE_VFSMOUNT, FSNOTIFY_OBJ_TYPE_SB, FSNOTIFY_OBJ_TYPE_COUNT, @@ -260,6 +261,7 @@ enum fsnotify_obj_type { }; #define FSNOTIFY_OBJ_TYPE_INODE_FL (1U << FSNOTIFY_OBJ_TYPE_INODE) +#define FSNOTIFY_OBJ_TYPE_CHILD_FL (1U << FSNOTIFY_OBJ_TYPE_CHILD) #define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL (1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT) #define FSNOTIFY_OBJ_TYPE_SB_FL (1U << FSNOTIFY_OBJ_TYPE_SB) #define FSNOTIFY_OBJ_ALL_TYPES_MASK ((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1) @@ -304,6 +306,7 @@ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ } FSNOTIFY_ITER_FUNCS(inode, INODE) +FSNOTIFY_ITER_FUNCS(child, CHILD) FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) FSNOTIFY_ITER_FUNCS(sb, SB) From f454fa610a69b93e7ed4de739e736f988d95c2d2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:17 +0300 Subject: [PATCH 20/37] fanotify: use struct fanotify_info to parcel the variable size buffer An fanotify event name is always recorded relative to a dir fh. Encapsulate the name_len member of fanotify_name_event in a new struct fanotify_info, which describes the parceling of the variable size buffer of an fanotify_name_event. The dir_fh member of fanotify_name_event is renamed to _dir_fh and is not accessed directly, but via the fanotify_info_dir_fh() accessor. Although the dir_fh len information is already available in struct fanotify_fh, we store it also in dif_fh_totlen member of fanotify_info, including the size of fanotify_fh header, so we know the offset of the name in the buffer without looking inside the dir_fh. We also add a file_fh_totlen member to allow packing another file handle in the variable size buffer after the dir_fh and before the name. We are going to use that space to store the child fid. Link: https://lore.kernel.org/r/20200716084230.30611-10-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 66 ++++++++++++++++------ fs/notify/fanotify/fanotify.h | 91 +++++++++++++++++++++++++----- fs/notify/fanotify/fanotify_user.c | 25 ++++---- 3 files changed, 139 insertions(+), 43 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index b8c04a6f04c5..31fd41e91575 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -49,22 +49,44 @@ static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1, fanotify_fh_equal(&ffe1->object_fh, &ffe2->object_fh); } +static bool fanotify_info_equal(struct fanotify_info *info1, + struct fanotify_info *info2) +{ + if (info1->dir_fh_totlen != info2->dir_fh_totlen || + info1->file_fh_totlen != info2->file_fh_totlen || + info1->name_len != info2->name_len) + return false; + + if (info1->dir_fh_totlen && + !fanotify_fh_equal(fanotify_info_dir_fh(info1), + fanotify_info_dir_fh(info2))) + return false; + + if (info1->file_fh_totlen && + !fanotify_fh_equal(fanotify_info_file_fh(info1), + fanotify_info_file_fh(info2))) + return false; + + return !info1->name_len || + !memcmp(fanotify_info_name(info1), fanotify_info_name(info2), + info1->name_len); +} + static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, struct fanotify_name_event *fne2) { + struct fanotify_info *info1 = &fne1->info; + struct fanotify_info *info2 = &fne2->info; + /* Do not merge name events without dir fh */ - if (!fne1->dir_fh.len) + if (!info1->dir_fh_totlen) return false; - if (fne1->name_len != fne2->name_len || - !fanotify_fh_equal(&fne1->dir_fh, &fne2->dir_fh)) - return false; - - return !memcmp(fne1->name, fne2->name, fne1->name_len); + return fanotify_info_equal(info1, info2); } static bool fanotify_should_merge(struct fsnotify_event *old_fsn, - struct fsnotify_event *new_fsn) + struct fsnotify_event *new_fsn) { struct fanotify_event *old, *new; @@ -276,8 +298,14 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return test_mask & user_mask; } -static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, - gfp_t gfp) +/* + * Encode fanotify_fh. + * + * Return total size of encoded fh including fanotify_fh header. + * Return 0 on failure to encode. + */ +static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, + gfp_t gfp) { int dwords, type, bytes = 0; char *ext_buf = NULL; @@ -287,7 +315,7 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = FILEID_ROOT; fh->len = 0; if (!inode) - return; + return 0; dwords = 0; err = -ENOENT; @@ -315,7 +343,7 @@ static void fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = bytes; - return; + return FANOTIFY_FH_HDR_LEN + bytes; out_err: pr_warn_ratelimited("fanotify: failed to encode fid (type=%d, len=%d, err=%i)\n", @@ -325,6 +353,7 @@ out_err: /* Report the event without a file identifier on encode error */ fh->type = FILEID_INVALID; fh->len = 0; + return 0; } /* @@ -401,6 +430,8 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, gfp_t gfp) { struct fanotify_name_event *fne; + struct fanotify_info *info; + struct fanotify_fh *dfh; fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp); if (!fne) @@ -408,9 +439,11 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME; fne->fsid = *fsid; - fanotify_encode_fh(&fne->dir_fh, id, gfp); - fne->name_len = file_name->len; - strcpy(fne->name, file_name->name); + info = &fne->info; + fanotify_info_init(info); + dfh = fanotify_info_dir_fh(info); + info->dir_fh_totlen = fanotify_encode_fh(dfh, id, gfp); + fanotify_info_copy_name(info, file_name); return &fne->fae; } @@ -626,9 +659,10 @@ static void fanotify_free_fid_event(struct fanotify_event *event) static void fanotify_free_name_event(struct fanotify_event *event) { struct fanotify_name_event *fne = FANOTIFY_NE(event); + struct fanotify_fh *dfh = fanotify_info_dir_fh(&fne->info); - if (fanotify_fh_has_ext_buf(&fne->dir_fh)) - kfree(fanotify_fh_ext_buf(&fne->dir_fh)); + if (fanotify_fh_has_ext_buf(dfh)) + kfree(fanotify_fh_ext_buf(dfh)); kfree(fne); } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 1b2a3bbe6008..5e104fc56abb 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -23,11 +23,29 @@ enum { * stored in either the first or last 2 dwords. */ #define FANOTIFY_INLINE_FH_LEN (3 << 2) +#define FANOTIFY_FH_HDR_LEN offsetof(struct fanotify_fh, buf) +/* Fixed size struct for file handle */ struct fanotify_fh { - unsigned char buf[FANOTIFY_INLINE_FH_LEN]; u8 type; u8 len; + u8 pad[2]; + unsigned char buf[FANOTIFY_INLINE_FH_LEN]; +} __aligned(4); + +/* Variable size struct for dir file handle + child file handle + name */ +struct fanotify_info { + /* size of dir_fh/file_fh including fanotify_fh hdr size */ + u8 dir_fh_totlen; + u8 file_fh_totlen; + u8 name_len; + u8 pad; + unsigned char buf[]; + /* + * (struct fanotify_fh) dir_fh starts at buf[0] + * (optional) file_fh starts at buf[dir_fh_totlen] + * name starts at buf[dir_fh_totlen + file_fh_totlen] + */ } __aligned(4); static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -37,6 +55,7 @@ static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) static inline char **fanotify_fh_ext_buf_ptr(struct fanotify_fh *fh) { + BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN % 4); BUILD_BUG_ON(__alignof__(char *) - 4 + sizeof(char *) > FANOTIFY_INLINE_FH_LEN); return (char **)ALIGN((unsigned long)(fh->buf), __alignof__(char *)); @@ -52,6 +71,56 @@ static inline void *fanotify_fh_buf(struct fanotify_fh *fh) return fanotify_fh_has_ext_buf(fh) ? fanotify_fh_ext_buf(fh) : fh->buf; } +static inline int fanotify_info_dir_fh_len(struct fanotify_info *info) +{ + if (!info->dir_fh_totlen || + WARN_ON_ONCE(info->dir_fh_totlen < FANOTIFY_FH_HDR_LEN)) + return 0; + + return info->dir_fh_totlen - FANOTIFY_FH_HDR_LEN; +} + +static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *info) +{ + BUILD_BUG_ON(offsetof(struct fanotify_info, buf) % 4); + + return (struct fanotify_fh *)info->buf; +} + +static inline int fanotify_info_file_fh_len(struct fanotify_info *info) +{ + if (!info->file_fh_totlen || + WARN_ON_ONCE(info->file_fh_totlen < FANOTIFY_FH_HDR_LEN)) + return 0; + + return info->file_fh_totlen - FANOTIFY_FH_HDR_LEN; +} + +static inline struct fanotify_fh *fanotify_info_file_fh(struct fanotify_info *info) +{ + return (struct fanotify_fh *)(info->buf + info->dir_fh_totlen); +} + +static inline const char *fanotify_info_name(struct fanotify_info *info) +{ + return info->buf + info->dir_fh_totlen + info->file_fh_totlen; +} + +static inline void fanotify_info_init(struct fanotify_info *info) +{ + info->dir_fh_totlen = 0; + info->file_fh_totlen = 0; + info->name_len = 0; +} + +static inline void fanotify_info_copy_name(struct fanotify_info *info, + const struct qstr *name) +{ + info->name_len = name->len; + strcpy(info->buf + info->dir_fh_totlen + info->file_fh_totlen, + name->name); +} + /* * Common structure for fanotify events. Concrete structs are allocated in * fanotify_handle_event() and freed when the information is retrieved by @@ -96,9 +165,9 @@ FANOTIFY_FE(struct fanotify_event *event) struct fanotify_name_event { struct fanotify_event fae; __kernel_fsid_t fsid; - struct fanotify_fh dir_fh; - u8 name_len; - char name[]; + struct fanotify_info info; + /* Reserve space in info.buf[] - access with fanotify_info_dir_fh() */ + struct fanotify_fh _dir_fh; }; static inline struct fanotify_name_event * @@ -126,11 +195,11 @@ static inline struct fanotify_fh *fanotify_event_object_fh( return NULL; } -static inline struct fanotify_fh *fanotify_event_dir_fh( +static inline struct fanotify_info *fanotify_event_info( struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) - return &FANOTIFY_NE(event)->dir_fh; + return &FANOTIFY_NE(event)->info; else return NULL; } @@ -142,15 +211,11 @@ static inline int fanotify_event_object_fh_len(struct fanotify_event *event) return fh ? fh->len : 0; } -static inline bool fanotify_event_has_name(struct fanotify_event *event) +static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) { - return event->type == FANOTIFY_EVENT_TYPE_FID_NAME; -} + struct fanotify_info *info = fanotify_event_info(event); -static inline int fanotify_event_name_len(struct fanotify_event *event) -{ - return fanotify_event_has_name(event) ? - FANOTIFY_NE(event)->name_len : 0; + return info ? fanotify_info_dir_fh_len(info) : 0; } struct fanotify_path_event { diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 16d70a8e90f9..3842ef00b52e 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -66,19 +66,17 @@ static int fanotify_fid_info_len(int fh_len, int name_len) static int fanotify_event_info_len(struct fanotify_event *event) { - int info_len = 0; + struct fanotify_info *info = fanotify_event_info(event); + int dir_fh_len = fanotify_event_dir_fh_len(event); int fh_len = fanotify_event_object_fh_len(event); + int info_len = 0; + + if (dir_fh_len) + info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); if (fh_len) info_len += fanotify_fid_info_len(fh_len, 0); - if (fanotify_event_name_len(event)) { - struct fanotify_name_event *fne = FANOTIFY_NE(event); - - info_len += fanotify_fid_info_len(fne->dir_fh.len, - fne->name_len); - } - return info_len; } @@ -305,6 +303,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, { struct fanotify_event_metadata metadata; struct path *path = fanotify_event_path(event); + struct fanotify_info *info = fanotify_event_info(event); struct file *f = NULL; int ret, fd = FAN_NOFD; @@ -346,13 +345,11 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, fd_install(fd, f); /* Event info records order is: dir fid + name, child fid */ - if (fanotify_event_name_len(event)) { - struct fanotify_name_event *fne = FANOTIFY_NE(event); - + if (fanotify_event_dir_fh_len(event)) { ret = copy_info_to_user(fanotify_event_fsid(event), - fanotify_event_dir_fh(event), - fne->name, fne->name_len, - buf, count); + fanotify_info_dir_fh(info), + fanotify_info_name(info), + info->name_len, buf, count); if (ret < 0) return ret; From f35c41567867e06bb0a1d6b74c5bf323be1026e5 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:18 +0300 Subject: [PATCH 21/37] fanotify: no external fh buffer in fanotify_name_event The fanotify_fh struct has an inline buffer of size 12 which is enough to store the most common local filesystem file handles (e.g. ext4, xfs). For file handles that do not fit in the inline buffer (e.g. btrfs), an external buffer is allocated to store the file handle. When allocating a variable size fanotify_name_event, there is no point in allocating also an external fh buffer when file handle does not fit in the inline buffer. Check required size for encoding fh, preallocate an event buffer sufficient to contain both file handle and name and store the name after the file handle. At this time, when not reporting name in event, we still allocate the fixed size fanotify_fid_event and an external buffer for large file handles, but fanotify_alloc_name_event() has already been prepared to accept a NULL file_name. Link: https://lore.kernel.org/r/20200716084230.30611-11-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 75 ++++++++++++++++++++++++----------- fs/notify/fanotify/fanotify.h | 12 +++--- 2 files changed, 59 insertions(+), 28 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 31fd41e91575..c107974d6830 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -298,6 +298,24 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return test_mask & user_mask; } +/* + * Check size needed to encode fanotify_fh. + * + * Return size of encoded fh without fanotify_fh header. + * Return 0 on failure to encode. + */ +static int fanotify_encode_fh_len(struct inode *inode) +{ + int dwords = 0; + + if (!inode) + return 0; + + exportfs_encode_inode_fh(inode, NULL, &dwords, NULL); + + return dwords << 2; +} + /* * Encode fanotify_fh. * @@ -305,49 +323,54 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, * Return 0 on failure to encode. */ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, - gfp_t gfp) + unsigned int fh_len, gfp_t gfp) { - int dwords, type, bytes = 0; + int dwords, type = 0; char *ext_buf = NULL; void *buf = fh->buf; int err; fh->type = FILEID_ROOT; fh->len = 0; + fh->flags = 0; if (!inode) return 0; - dwords = 0; + /* + * !gpf means preallocated variable size fh, but fh_len could + * be zero in that case if encoding fh len failed. + */ err = -ENOENT; - type = exportfs_encode_inode_fh(inode, NULL, &dwords, NULL); - if (!dwords) + if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4)) goto out_err; - bytes = dwords << 2; - if (bytes > FANOTIFY_INLINE_FH_LEN) { - /* Treat failure to allocate fh as failure to allocate event */ + /* No external buffer in a variable size allocated fh */ + if (gfp && fh_len > FANOTIFY_INLINE_FH_LEN) { + /* Treat failure to allocate fh as failure to encode fh */ err = -ENOMEM; - ext_buf = kmalloc(bytes, gfp); + ext_buf = kmalloc(fh_len, gfp); if (!ext_buf) goto out_err; *fanotify_fh_ext_buf_ptr(fh) = ext_buf; buf = ext_buf; + fh->flags |= FANOTIFY_FH_FLAG_EXT_BUF; } + dwords = fh_len >> 2; type = exportfs_encode_inode_fh(inode, buf, &dwords, NULL); err = -EINVAL; - if (!type || type == FILEID_INVALID || bytes != dwords << 2) + if (!type || type == FILEID_INVALID || fh_len != dwords << 2) goto out_err; fh->type = type; - fh->len = bytes; + fh->len = fh_len; - return FANOTIFY_FH_HDR_LEN + bytes; + return FANOTIFY_FH_HDR_LEN + fh_len; out_err: pr_warn_ratelimited("fanotify: failed to encode fid (type=%d, len=%d, err=%i)\n", - type, bytes, err); + type, fh_len, err); kfree(ext_buf); *fanotify_fh_ext_buf_ptr(fh) = NULL; /* Report the event without a file identifier on encode error */ @@ -419,7 +442,8 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; ffe->fsid = *fsid; - fanotify_encode_fh(&ffe->object_fh, id, gfp); + fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), + gfp); return &ffe->fae; } @@ -432,8 +456,13 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh; + unsigned int dir_fh_len = fanotify_encode_fh_len(id); + unsigned int size; - fne = kmalloc(sizeof(*fne) + file_name->len + 1, gfp); + size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; + if (file_name) + size += file_name->len + 1; + fne = kmalloc(size, gfp); if (!fne) return NULL; @@ -442,8 +471,13 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, info = &fne->info; fanotify_info_init(info); dfh = fanotify_info_dir_fh(info); - info->dir_fh_totlen = fanotify_encode_fh(dfh, id, gfp); - fanotify_info_copy_name(info, file_name); + info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, 0); + if (file_name) + fanotify_info_copy_name(info, file_name); + + pr_debug("%s: ino=%lu size=%u dir_fh_len=%u name_len=%u name='%.*s'\n", + __func__, id->i_ino, size, dir_fh_len, + info->name_len, info->name_len, fanotify_info_name(info)); return &fne->fae; } @@ -658,12 +692,7 @@ static void fanotify_free_fid_event(struct fanotify_event *event) static void fanotify_free_name_event(struct fanotify_event *event) { - struct fanotify_name_event *fne = FANOTIFY_NE(event); - struct fanotify_fh *dfh = fanotify_info_dir_fh(&fne->info); - - if (fanotify_fh_has_ext_buf(dfh)) - kfree(fanotify_fh_ext_buf(dfh)); - kfree(fne); + kfree(FANOTIFY_NE(event)); } static void fanotify_free_event(struct fsnotify_event *fsn_event) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 5e104fc56abb..12c204b1489f 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -29,8 +29,10 @@ enum { struct fanotify_fh { u8 type; u8 len; - u8 pad[2]; - unsigned char buf[FANOTIFY_INLINE_FH_LEN]; +#define FANOTIFY_FH_FLAG_EXT_BUF 1 + u8 flags; + u8 pad; + unsigned char buf[]; } __aligned(4); /* Variable size struct for dir file handle + child file handle + name */ @@ -50,7 +52,7 @@ struct fanotify_info { static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) { - return fh->len > FANOTIFY_INLINE_FH_LEN; + return (fh->flags & FANOTIFY_FH_FLAG_EXT_BUF); } static inline char **fanotify_fh_ext_buf_ptr(struct fanotify_fh *fh) @@ -154,6 +156,8 @@ struct fanotify_fid_event { struct fanotify_event fae; __kernel_fsid_t fsid; struct fanotify_fh object_fh; + /* Reserve space in object_fh.buf[] - access with fanotify_fh_buf() */ + unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN]; }; static inline struct fanotify_fid_event * @@ -166,8 +170,6 @@ struct fanotify_name_event { struct fanotify_event fae; __kernel_fsid_t fsid; struct fanotify_info info; - /* Reserve space in info.buf[] - access with fanotify_info_dir_fh() */ - struct fanotify_fh _dir_fh; }; static inline struct fanotify_name_event * From 62cb0af4cea871e80015dd45b200033002f23a95 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:19 +0300 Subject: [PATCH 22/37] dnotify: report both events on parent and child with single callback For some events (e.g. DN_ATTRIB on sub-directory) fsnotify may call dnotify_handle_event() once for watching parent and once again for the watching sub-directory. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-12-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/dnotify/dnotify.c | 42 +++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 2d2eadfb5186..ca78d3f78da8 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -70,26 +70,15 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark) * destroy the dnotify struct if it was not registered to receive multiple * events. */ -static int dnotify_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *file_name, u32 cookie, - struct fsnotify_iter_info *iter_info) +static void dnotify_one_event(struct fsnotify_group *group, u32 mask, + struct fsnotify_mark *inode_mark) { - struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct dnotify_mark *dn_mark; struct dnotify_struct *dn; struct dnotify_struct **prev; struct fown_struct *fown; __u32 test_mask = mask & ~FS_EVENT_ON_CHILD; - /* not a dir, dnotify doesn't care */ - if (!dir && !(mask & FS_ISDIR)) - return 0; - - if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) - return 0; - dn_mark = container_of(inode_mark, struct dnotify_mark, fsn_mark); spin_lock(&inode_mark->lock); @@ -111,6 +100,33 @@ static int dnotify_handle_event(struct fsnotify_group *group, u32 mask, } spin_unlock(&inode_mark->lock); +} + +static int dnotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, + struct inode *dir, + const struct qstr *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) +{ + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); + struct fsnotify_mark *child_mark = fsnotify_iter_child_mark(iter_info); + + /* not a dir, dnotify doesn't care */ + if (!dir && !(mask & FS_ISDIR)) + return 0; + + if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) + return 0; + + /* + * Some events can be sent on both parent dir and subdir marks + * (e.g. DN_ATTRIB). If both parent dir and subdir are watching, + * report the event once to parent dir and once to subdir. + */ + if (inode_mark) + dnotify_one_event(group, mask, inode_mark); + if (child_mark) + dnotify_one_event(group, mask, child_mark); return 0; } From c8f3446c66d810e21af61cf889e7b0e3bc921b9d Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:20 +0300 Subject: [PATCH 23/37] inotify: report both events on parent and child with single callback fsnotify usually calls inotify_handle_event() once for watching parent to report event with child's name and once for watching child to report event without child's name. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-13-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/inotify/inotify_fsnotify.c | 44 ++++++++++++++++++++++------ 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index dfd455798a1b..a65cf8c9f600 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -55,13 +55,11 @@ static int inotify_merge(struct list_head *list, return event_compare(last_event, event); } -int inotify_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, struct inode *dir, - const struct qstr *file_name, u32 cookie, - struct fsnotify_iter_info *iter_info) +static int inotify_one_event(struct fsnotify_group *group, u32 mask, + struct fsnotify_mark *inode_mark, + const struct path *path, + const struct qstr *file_name, u32 cookie) { - const struct path *path = fsnotify_data_path(data, data_type); - struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct inotify_inode_mark *i_mark; struct inotify_event_info *event; struct fsnotify_event *fsn_event; @@ -69,9 +67,6 @@ int inotify_handle_event(struct fsnotify_group *group, u32 mask, int len = 0; int alloc_len = sizeof(struct inotify_event_info); - if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) - return 0; - if ((inode_mark->mask & FS_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0; @@ -135,6 +130,37 @@ int inotify_handle_event(struct fsnotify_group *group, u32 mask, return 0; } +int inotify_handle_event(struct fsnotify_group *group, u32 mask, + const void *data, int data_type, struct inode *dir, + const struct qstr *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) +{ + const struct path *path = fsnotify_data_path(data, data_type); + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); + struct fsnotify_mark *child_mark = fsnotify_iter_child_mark(iter_info); + int ret = 0; + + if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) + return 0; + + /* + * Some events cannot be sent on both parent and child marks + * (e.g. IN_CREATE). Those events are always sent on inode_mark. + * For events that are possible on both parent and child (e.g. IN_OPEN), + * event is sent on inode_mark with name if the parent is watching and + * is sent on child_mark without name if child is watching. + * If both parent and child are watching, report the event with child's + * name here and report another event without child's name below. + */ + if (inode_mark) + ret = inotify_one_event(group, mask, inode_mark, path, + file_name, cookie); + if (ret || !child_mark) + return ret; + + return inotify_one_event(group, mask, child_mark, path, NULL, 0); +} + static void inotify_freeing_mark(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group) { inotify_ignored_and_remove_idr(fsn_mark, group); From 497b0c5a7c0688c1b100a9c2e267337f677c198e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:22 +0300 Subject: [PATCH 24/37] fsnotify: send event to parent and child with single callback Instead of calling fsnotify() twice, once with parent inode and once with child inode, if event should be sent to parent inode, send it with both parent and child inodes marks in object type iterator and call the backend handle_event() callback only once. The parent inode is assigned to the standard "inode" iterator type and the child inode is assigned to the special "child" iterator type. In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask, the dir argument to handle_event will be the parent inode, the file_name argument to handle_event is non NULL and refers to the name of the child and the child inode can be accessed with fsnotify_data_inode(). This will allow fanotify to make decisions based on child or parent's ignored mask. For example, when a parent is interested in a specific event on its children, but a specific child wishes to ignore this event, the event will not be reported. This is not what happens with current code, but according to man page, it is the expected behavior. Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/kernfs/file.c | 10 +++--- fs/notify/fanotify/fanotify.c | 8 ++--- fs/notify/fsnotify.c | 60 ++++++++++++++++++++--------------- 3 files changed, 45 insertions(+), 33 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index e23b3f62483c..5b1468bc509e 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -883,6 +883,7 @@ repeat: list_for_each_entry(info, &kernfs_root(kn)->supers, node) { struct kernfs_node *parent; + struct inode *p_inode = NULL; struct inode *inode; struct qstr name; @@ -899,8 +900,6 @@ repeat: name = (struct qstr)QSTR_INIT(kn->name, strlen(kn->name)); parent = kernfs_get_parent(kn); if (parent) { - struct inode *p_inode; - p_inode = ilookup(info->sb, kernfs_ino(parent)); if (p_inode) { fsnotify(p_inode, FS_MODIFY | FS_EVENT_ON_CHILD, @@ -911,8 +910,11 @@ repeat: kernfs_put(parent); } - fsnotify(inode, FS_MODIFY, inode, FSNOTIFY_EVENT_INODE, - NULL, 0); + if (!p_inode) { + fsnotify(inode, FS_MODIFY, inode, FSNOTIFY_EVENT_INODE, + NULL, 0); + } + iput(inode); } diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c107974d6830..3baf93e998c1 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -261,12 +261,12 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, continue; /* - * If the event is for a child and this mark doesn't care about - * events on a child, don't send it! + * If the event is for a child and this mark is on a parent not + * watching children, don't send it! */ if (event_mask & FS_EVENT_ON_CHILD && - (type != FSNOTIFY_OBJ_TYPE_INODE || - !(mark->mask & FS_EVENT_ON_CHILD))) + type == FSNOTIFY_OBJ_TYPE_INODE && + !(mark->mask & FS_EVENT_ON_CHILD)) continue; marks_mask |= mark->mask; diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index f12a554be3f0..0b0e01e04349 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -145,17 +145,21 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) /* * Notify this dentry's parent about a child's events with child name info * if parent is watching. - * Notify also the child without name info if child inode is watching. + * Notify only the child without name info if parent is not watching. */ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { + struct inode *inode = d_inode(dentry); struct dentry *parent; struct inode *p_inode; + struct name_snapshot name; + struct qstr *file_name = NULL; int ret = 0; + parent = NULL; if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) - goto notify_child; + goto notify; parent = dget_parent(dentry); p_inode = parent->d_inode; @@ -163,25 +167,24 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, if (unlikely(!fsnotify_inode_watches_children(p_inode))) { __fsnotify_update_child_dentry_flags(p_inode); } else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) { - struct name_snapshot name; + /* When notifying parent, child should be passed as data */ + WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type)); - /* - * We are notifying a parent, so set a flag in mask to inform - * backend that event has information about a child entry. - */ + /* Notify both parent and child with child name info */ + inode = p_inode; take_dentry_name_snapshot(&name, dentry); - ret = fsnotify(p_inode, mask | FS_EVENT_ON_CHILD, data, - data_type, &name.name, 0); - release_dentry_name_snapshot(&name); + file_name = &name.name; + mask |= FS_EVENT_ON_CHILD; } +notify: + ret = fsnotify(inode, mask, data, data_type, file_name, 0); + + if (file_name) + release_dentry_name_snapshot(&name); dput(parent); - if (ret) - return ret; - -notify_child: - return fsnotify(d_inode(dentry), mask, data, data_type, NULL, 0); + return ret; } EXPORT_SYMBOL_GPL(__fsnotify_parent); @@ -322,12 +325,16 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, struct super_block *sb = to_tell->i_sb; struct inode *dir = file_name ? to_tell : NULL; struct mount *mnt = NULL; + struct inode *child = NULL; int ret = 0; __u32 test_mask, marks_mask; if (path) mnt = real_mount(path->mnt); + if (mask & FS_EVENT_ON_CHILD) + child = fsnotify_data_inode(data, data_type); + /* * Optimization: srcu_read_lock() has a memory barrier which can * be expensive. It protects walking the *_fsnotify_marks lists. @@ -336,21 +343,20 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, * need SRCU to keep them "alive". */ if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && - (!mnt || !mnt->mnt_fsnotify_marks)) + (!mnt || !mnt->mnt_fsnotify_marks) && + (!child || !child->i_fsnotify_marks)) return 0; - /* An event "on child" is not intended for a mount/sb mark */ - marks_mask = to_tell->i_fsnotify_mask; - if (!(mask & FS_EVENT_ON_CHILD)) { - marks_mask |= sb->s_fsnotify_mask; - if (mnt) - marks_mask |= mnt->mnt_fsnotify_mask; - } + marks_mask = to_tell->i_fsnotify_mask | sb->s_fsnotify_mask; + if (mnt) + marks_mask |= mnt->mnt_fsnotify_mask; + if (child) + marks_mask |= child->i_fsnotify_mask; + /* * if this is a modify event we may need to clear the ignored masks - * otherwise return if neither the inode nor the vfsmount/sb care about - * this type of event. + * otherwise return if none of the marks care about this type of event. */ test_mask = (mask & ALL_FSNOTIFY_EVENTS); if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) @@ -366,6 +372,10 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } + if (child) { + iter_info.marks[FSNOTIFY_OBJ_TYPE_CHILD] = + fsnotify_first_mark(&child->i_fsnotify_marks); + } /* * We need to merge inode/vfsmount/sb mark lists so that e.g. inode mark From 82ace1efb3cb1d49a1681cc6e31156047d5ae1f2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 22 Jul 2020 15:58:44 +0300 Subject: [PATCH 25/37] fsnotify: create helper fsnotify_inode() Simple helper to consolidate biolerplate code. Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/kernfs/file.c | 6 ++---- fs/notify/fsnotify.c | 2 +- include/linux/fsnotify.h | 26 +++++++++++--------------- kernel/trace/trace.c | 3 +-- 4 files changed, 15 insertions(+), 22 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 5b1468bc509e..1d185bffc52f 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -910,10 +910,8 @@ repeat: kernfs_put(parent); } - if (!p_inode) { - fsnotify(inode, FS_MODIFY, inode, FSNOTIFY_EVENT_INODE, - NULL, 0); - } + if (!p_inode) + fsnotify_inode(inode, FS_MODIFY); iput(inode); } diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 0b0e01e04349..ba172b742de5 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -74,7 +74,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb) iput(iput_inode); /* for each watch, send FS_UNMOUNT and then remove it */ - fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify_inode(inode, FS_UNMOUNT); fsnotify_inode_delete(inode); diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 9b2566d273a9..f9db7c9b3ef1 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -38,6 +38,14 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); } +static inline void fsnotify_inode(struct inode *inode, __u32 mask) +{ + if (S_ISDIR(inode->i_mode)) + mask |= FS_ISDIR; + + fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0); +} + /* Notify this dentry's parent about a child's events. */ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) @@ -105,12 +113,7 @@ static inline int fsnotify_perm(struct file *file, int mask) */ static inline void fsnotify_link_count(struct inode *inode) { - __u32 mask = FS_ATTRIB; - - if (S_ISDIR(inode->i_mode)) - mask |= FS_ISDIR; - - fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify_inode(inode, FS_ATTRIB); } /* @@ -125,7 +128,6 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, u32 fs_cookie = fsnotify_get_cookie(); __u32 old_dir_mask = FS_MOVED_FROM; __u32 new_dir_mask = FS_MOVED_TO; - __u32 mask = FS_MOVE_SELF; const struct qstr *new_name = &moved->d_name; if (old_dir == new_dir) @@ -134,7 +136,6 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, if (isdir) { old_dir_mask |= FS_ISDIR; new_dir_mask |= FS_ISDIR; - mask |= FS_ISDIR; } fsnotify_name(old_dir, old_dir_mask, source, old_name, fs_cookie); @@ -144,7 +145,7 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, fsnotify_link_count(target); if (source) - fsnotify(source, mask, source, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify_inode(source, FS_MOVE_SELF); audit_inode_child(new_dir, moved, AUDIT_TYPE_CHILD_CREATE); } @@ -169,12 +170,7 @@ static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt) */ static inline void fsnotify_inoderemove(struct inode *inode) { - __u32 mask = FS_DELETE_SELF; - - if (S_ISDIR(inode->i_mode)) - mask |= FS_ISDIR; - - fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify_inode(inode, FS_DELETE_SELF); __fsnotify_inode_delete(inode); } diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index bb62269724d5..0c655c039506 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1543,8 +1543,7 @@ static void latency_fsnotify_workfn(struct work_struct *work) { struct trace_array *tr = container_of(work, struct trace_array, fsnotify_work); - fsnotify(tr->d_max_latency->d_inode, FS_MODIFY, - tr->d_max_latency->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify_inode(tr->d_max_latency->d_inode, FS_MODIFY); } static void latency_fsnotify_workfn_irq(struct irq_work *iwork) From 40a100d3adc1ad7f0a34875468c499fcecd20ba4 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 22 Jul 2020 15:58:46 +0300 Subject: [PATCH 26/37] fsnotify: pass dir and inode arguments to fsnotify() The arguments of fsnotify() are overloaded and mean different things for different event types. Replace the to_tell argument with separate arguments @dir and @inode, because we may be sending to both dir and child. Using the @data argument to pass the child is not enough, because dirent events pass this argument (for audit), but we do not report to child. Document the new fsnotify() function argumenets. Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/kernfs/file.c | 5 +-- fs/notify/fsnotify.c | 54 ++++++++++++++++++++++---------- include/linux/fsnotify.h | 9 +++--- include/linux/fsnotify_backend.h | 10 +++--- 4 files changed, 52 insertions(+), 26 deletions(-) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 1d185bffc52f..f277d023ebcd 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -902,8 +902,9 @@ repeat: if (parent) { p_inode = ilookup(info->sb, kernfs_ino(parent)); if (p_inode) { - fsnotify(p_inode, FS_MODIFY | FS_EVENT_ON_CHILD, - inode, FSNOTIFY_EVENT_INODE, &name, 0); + fsnotify(FS_MODIFY | FS_EVENT_ON_CHILD, + inode, FSNOTIFY_EVENT_INODE, + p_inode, &name, inode, 0); iput(p_inode); } diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index ba172b742de5..4a762c8c4a29 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -152,7 +152,7 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, { struct inode *inode = d_inode(dentry); struct dentry *parent; - struct inode *p_inode; + struct inode *p_inode = NULL; struct name_snapshot name; struct qstr *file_name = NULL; int ret = 0; @@ -171,14 +171,13 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type)); /* Notify both parent and child with child name info */ - inode = p_inode; take_dentry_name_snapshot(&name, dentry); file_name = &name.name; mask |= FS_EVENT_ON_CHILD; } notify: - ret = fsnotify(inode, mask, data, data_type, file_name, 0); + ret = fsnotify(mask, data, data_type, p_inode, file_name, inode, 0); if (file_name) release_dentry_name_snapshot(&name); @@ -312,18 +311,31 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) } /* - * This is the main call to fsnotify. The VFS calls into hook specific functions - * in linux/fsnotify.h. Those functions then in turn call here. Here will call - * out to all of the registered fsnotify_group. Those groups can then use the - * notification event in whatever means they feel necessary. + * fsnotify - This is the main call to fsnotify. + * + * The VFS calls into hook specific functions in linux/fsnotify.h. + * Those functions then in turn call here. Here will call out to all of the + * registered fsnotify_group. Those groups can then use the notification event + * in whatever means they feel necessary. + * + * @mask: event type and flags + * @data: object that event happened on + * @data_type: type of object for fanotify_data_XXX() accessors + * @dir: optional directory associated with event - + * if @file_name is not NULL, this is the directory that + * @file_name is relative to + * @file_name: optional file name associated with event + * @inode: optional inode associated with event - + * either @dir or @inode must be non-NULL. + * if both are non-NULL event may be reported to both. + * @cookie: inotify rename cookie */ -int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, - const struct qstr *file_name, u32 cookie) +int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, + const struct qstr *file_name, struct inode *inode, u32 cookie) { const struct path *path = fsnotify_data_path(data, data_type); struct fsnotify_iter_info iter_info = {}; - struct super_block *sb = to_tell->i_sb; - struct inode *dir = file_name ? to_tell : NULL; + struct super_block *sb; struct mount *mnt = NULL; struct inode *child = NULL; int ret = 0; @@ -332,8 +344,18 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, if (path) mnt = real_mount(path->mnt); - if (mask & FS_EVENT_ON_CHILD) - child = fsnotify_data_inode(data, data_type); + if (!inode) { + /* Dirent event - report on TYPE_INODE to dir */ + inode = dir; + } else if (mask & FS_EVENT_ON_CHILD) { + /* + * Event on child - report on TYPE_INODE to dir + * and on TYPE_CHILD to child. + */ + child = inode; + inode = dir; + } + sb = inode->i_sb; /* * Optimization: srcu_read_lock() has a memory barrier which can @@ -342,12 +364,12 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, * SRCU because we have no references to any objects and do not * need SRCU to keep them "alive". */ - if (!to_tell->i_fsnotify_marks && !sb->s_fsnotify_marks && + if (!inode->i_fsnotify_marks && !sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && (!child || !child->i_fsnotify_marks)) return 0; - marks_mask = to_tell->i_fsnotify_mask | sb->s_fsnotify_mask; + marks_mask = inode->i_fsnotify_mask | sb->s_fsnotify_mask; if (mnt) marks_mask |= mnt->mnt_fsnotify_mask; if (child) @@ -365,7 +387,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_type, iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = - fsnotify_first_mark(&to_tell->i_fsnotify_marks); + fsnotify_first_mark(&inode->i_fsnotify_marks); iter_info.marks[FSNOTIFY_OBJ_TYPE_SB] = fsnotify_first_mark(&sb->s_fsnotify_marks); if (mnt) { diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index f9db7c9b3ef1..99922cac4fcd 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -23,13 +23,14 @@ * have changed (i.e. renamed over). * * Unlike fsnotify_parent(), the event will be reported regardless of the - * FS_EVENT_ON_CHILD mask on the parent inode. + * FS_EVENT_ON_CHILD mask on the parent inode and will not be reported if only + * the child is interested and not the parent. */ static inline void fsnotify_name(struct inode *dir, __u32 mask, struct inode *child, const struct qstr *name, u32 cookie) { - fsnotify(dir, mask, child, FSNOTIFY_EVENT_INODE, name, cookie); + fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); } static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, @@ -43,7 +44,7 @@ static inline void fsnotify_inode(struct inode *inode, __u32 mask) if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; - fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0); + fsnotify(mask, inode, FSNOTIFY_EVENT_INODE, NULL, NULL, inode, 0); } /* Notify this dentry's parent about a child's events. */ @@ -61,7 +62,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, return __fsnotify_parent(dentry, mask, data, data_type); notify_child: - return fsnotify(inode, mask, data, data_type, NULL, 0); + return fsnotify(mask, data, data_type, NULL, NULL, inode, 0); } /* diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d22519001027..152520635bd3 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -387,8 +387,9 @@ struct fsnotify_mark { /* called from the vfs helpers */ /* main fsnotify call to send events */ -extern int fsnotify(struct inode *to_tell, __u32 mask, const void *data, - int data_type, const struct qstr *name, u32 cookie); +extern int fsnotify(__u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *name, + struct inode *inode, u32 cookie); extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type); extern void __fsnotify_inode_delete(struct inode *inode); @@ -545,8 +546,9 @@ static inline void fsnotify_init_event(struct fsnotify_event *event, #else -static inline int fsnotify(struct inode *to_tell, __u32 mask, const void *data, - int data_type, const struct qstr *name, u32 cookie) +static inline int fsnotify(__u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *name, + struct inode *inode, u32 cookie) { return 0; } From 957f7b472c6b6de4214bf5144b811ab83a9b9465 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 22 Jul 2020 15:58:42 +0300 Subject: [PATCH 27/37] inotify: do not set FS_EVENT_ON_CHILD in non-dir mark mask FS_EVENT_ON_CHILD has currently no meaning for non-dir inode marks. In the following patches we want to use that bit to mean that mark's notification group cares about parent and name information. So stop setting FS_EVENT_ON_CHILD for non-dir marks. Link: https://lore.kernel.org/r/20200722125849.17418-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/inotify/inotify_user.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 5385d5817dd9..186722ba3894 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -75,15 +75,17 @@ struct ctl_table inotify_table[] = { }; #endif /* CONFIG_SYSCTL */ -static inline __u32 inotify_arg_to_mask(u32 arg) +static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) { __u32 mask; /* - * everything should accept their own ignored, cares about children, - * and should receive events when the inode is unmounted + * Everything should accept their own ignored and should receive events + * when the inode is unmounted. All directories care about children. */ - mask = (FS_IN_IGNORED | FS_EVENT_ON_CHILD | FS_UNMOUNT); + mask = (FS_IN_IGNORED | FS_UNMOUNT); + if (S_ISDIR(inode->i_mode)) + mask |= FS_EVENT_ON_CHILD; /* mask off the flags used to open the fd */ mask |= (arg & (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK)); @@ -512,7 +514,7 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, int create = (arg & IN_MASK_CREATE); int ret; - mask = inotify_arg_to_mask(arg); + mask = inotify_arg_to_mask(inode, arg); fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group); if (!fsn_mark) @@ -565,7 +567,7 @@ static int inotify_new_watch(struct fsnotify_group *group, struct idr *idr = &group->inotify_data.idr; spinlock_t *idr_lock = &group->inotify_data.idr_lock; - mask = inotify_arg_to_mask(arg); + mask = inotify_arg_to_mask(inode, arg); tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL); if (unlikely(!tmp_i_mark)) From 7dbe6080167860df0bf8627e55fd5154f366cc7a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 22 Jul 2020 15:58:43 +0300 Subject: [PATCH 28/37] audit: do not set FS_EVENT_ON_CHILD in audit marks mask The audit group marks mask does not contain any events possible on a child so setting the flag FS_EVENT_ON_CHILD in the mask is counter productive. It may lead to the undesired outcome of setting the dentry flag DCACHE_FSNOTIFY_PARENT_WATCHED on a directory inode even though it is not watching children, because the audit mark contribute the flag FS_EVENT_ON_CHILD to the inode's fsnotify_mask and another mark could be contributing an event that is possible on child to the inode's mask. Furthermore in the following patches we want to use FS_EVENT_ON_CHILD for non-dir inodes for other purposes so stop using the flag. Link: https://lore.kernel.org/r/20200722125849.17418-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- kernel/audit_fsnotify.c | 2 +- kernel/audit_watch.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 30ca239285a3..bd3a6b79316a 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -36,7 +36,7 @@ static struct fsnotify_group *audit_fsnotify_group; /* fsnotify events we care about. */ #define AUDIT_FS_EVENTS (FS_MOVE | FS_CREATE | FS_DELETE | FS_DELETE_SELF |\ - FS_MOVE_SELF | FS_EVENT_ON_CHILD) + FS_MOVE_SELF) static void audit_fsnotify_mark_free(struct audit_fsnotify_mark *audit_mark) { diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 61fd601f1edf..e23d54bcc587 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -53,7 +53,7 @@ static struct fsnotify_group *audit_watch_group; /* fsnotify events we care about. */ #define AUDIT_FS_WATCH (FS_MOVE | FS_CREATE | FS_DELETE | FS_DELETE_SELF |\ - FS_MOVE_SELF | FS_EVENT_ON_CHILD | FS_UNMOUNT) + FS_MOVE_SELF | FS_UNMOUNT) static void audit_free_parent(struct audit_parent *parent) { From 9b93f33105f5f9bd3d016ff870eb6000c9d89eff Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:23 +0300 Subject: [PATCH 29/37] fsnotify: send event with parent/name info to sb/mount/non-dir marks Similar to events "on child" to watching directory, send event with parent/name info if sb/mount/non-dir marks are interested in parent/name info. The FS_EVENT_ON_CHILD flag can be set on sb/mount/non-dir marks to specify interest in parent/name info for events on non-directory inodes. Events on "orphan" children (disconnected dentries) are sent without parent/name info. Events on directories are sent with parent/name info only if the parent directory is watching. After this change, even groups that do not subscribe to events on children could get an event with mark iterator type TYPE_CHILD and without mark iterator type TYPE_INODE if fanotify has marks on the same objects. dnotify and inotify event handlers can already cope with that situation. audit does not subscribe to events that are possible on child, so won't get to this situation. nfsd does not access the marks iterator from its event handler at the moment, so it is not affected. This is a bit too fragile, so we should prepare all groups to cope with mark type TYPE_CHILD preferably using a generic helper. Link: https://lore.kernel.org/r/20200716084230.30611-16-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fsnotify.c | 74 ++++++++++++++++++++++++++------ include/linux/fsnotify.h | 10 ++++- include/linux/fsnotify_backend.h | 32 ++++++++++++-- 3 files changed, 97 insertions(+), 19 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 4a762c8c4a29..494d5d70323f 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -142,38 +142,81 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode) spin_unlock(&inode->i_lock); } +/* Are inode/sb/mount interested in parent and name info with this event? */ +static bool fsnotify_event_needs_parent(struct inode *inode, struct mount *mnt, + __u32 mask) +{ + __u32 marks_mask = 0; + + /* We only send parent/name to inode/sb/mount for events on non-dir */ + if (mask & FS_ISDIR) + return false; + + /* Did either inode/sb/mount subscribe for events with parent/name? */ + marks_mask |= fsnotify_parent_needed_mask(inode->i_fsnotify_mask); + marks_mask |= fsnotify_parent_needed_mask(inode->i_sb->s_fsnotify_mask); + if (mnt) + marks_mask |= fsnotify_parent_needed_mask(mnt->mnt_fsnotify_mask); + + /* Did they subscribe for this event with parent/name info? */ + return mask & marks_mask; +} + /* * Notify this dentry's parent about a child's events with child name info - * if parent is watching. - * Notify only the child without name info if parent is not watching. + * if parent is watching or if inode/sb/mount are interested in events with + * parent and name info. + * + * Notify only the child without name info if parent is not watching and + * inode/sb/mount are not interested in events with parent and name info. */ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data, int data_type) { + const struct path *path = fsnotify_data_path(data, data_type); + struct mount *mnt = path ? real_mount(path->mnt) : NULL; struct inode *inode = d_inode(dentry); struct dentry *parent; + bool parent_watched = dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED; + __u32 p_mask; struct inode *p_inode = NULL; struct name_snapshot name; struct qstr *file_name = NULL; int ret = 0; + /* + * Do inode/sb/mount care about parent and name info on non-dir? + * Do they care about any event at all? + */ + if (!inode->i_fsnotify_marks && !inode->i_sb->s_fsnotify_marks && + (!mnt || !mnt->mnt_fsnotify_marks) && !parent_watched) + return 0; + parent = NULL; - if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + if (!parent_watched && !fsnotify_event_needs_parent(inode, mnt, mask)) goto notify; + /* Does parent inode care about events on children? */ parent = dget_parent(dentry); p_inode = parent->d_inode; - - if (unlikely(!fsnotify_inode_watches_children(p_inode))) { + p_mask = fsnotify_inode_watches_children(p_inode); + if (unlikely(parent_watched && !p_mask)) __fsnotify_update_child_dentry_flags(p_inode); - } else if (p_inode->i_fsnotify_mask & mask & ALL_FSNOTIFY_EVENTS) { + + /* + * Include parent/name in notification either if some notification + * groups require parent info (!parent_watched case) or the parent is + * interested in this event. + */ + if (!parent_watched || (mask & p_mask & ALL_FSNOTIFY_EVENTS)) { /* When notifying parent, child should be passed as data */ WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type)); /* Notify both parent and child with child name info */ take_dentry_name_snapshot(&name, dentry); file_name = &name.name; - mask |= FS_EVENT_ON_CHILD; + if (parent_watched) + mask |= FS_EVENT_ON_CHILD; } notify: @@ -349,8 +392,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, inode = dir; } else if (mask & FS_EVENT_ON_CHILD) { /* - * Event on child - report on TYPE_INODE to dir - * and on TYPE_CHILD to child. + * Event on child - report on TYPE_INODE to dir if it is + * watching children and on TYPE_CHILD to child. */ child = inode; inode = dir; @@ -364,14 +407,17 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, * SRCU because we have no references to any objects and do not * need SRCU to keep them "alive". */ - if (!inode->i_fsnotify_marks && !sb->s_fsnotify_marks && + if (!sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && + (!inode || !inode->i_fsnotify_marks) && (!child || !child->i_fsnotify_marks)) return 0; - marks_mask = inode->i_fsnotify_mask | sb->s_fsnotify_mask; + marks_mask = sb->s_fsnotify_mask; if (mnt) marks_mask |= mnt->mnt_fsnotify_mask; + if (inode) + marks_mask |= inode->i_fsnotify_mask; if (child) marks_mask |= child->i_fsnotify_mask; @@ -386,14 +432,16 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = - fsnotify_first_mark(&inode->i_fsnotify_marks); iter_info.marks[FSNOTIFY_OBJ_TYPE_SB] = fsnotify_first_mark(&sb->s_fsnotify_marks); if (mnt) { iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } + if (inode) { + iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = + fsnotify_first_mark(&inode->i_fsnotify_marks); + } if (child) { iter_info.marks[FSNOTIFY_OBJ_TYPE_CHILD] = fsnotify_first_mark(&child->i_fsnotify_marks); diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 99922cac4fcd..6e63f7e10da0 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -53,10 +53,16 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, { struct inode *inode = d_inode(dentry); - if (S_ISDIR(inode->i_mode)) + if (S_ISDIR(inode->i_mode)) { mask |= FS_ISDIR; - if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + /* sb/mount marks are not interested in name of directory */ + if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED)) + goto notify_child; + } + + /* disconnected dentry cannot notify parent */ + if (IS_ROOT(dentry)) goto notify_child; return __fsnotify_parent(dentry, mask, data, data_type); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 152520635bd3..32104cfc27a5 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -49,8 +49,11 @@ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ #define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ -/* This inode cares about things that happen to its children. Always set for - * dnotify and inotify. */ +/* + * Set on inode mark that cares about things that happen to its children. + * Always set for dnotify and inotify. + * Set on inode/sb/mount marks that care about parent/name info. + */ #define FS_EVENT_ON_CHILD 0x08000000 #define FS_DN_RENAME 0x10000000 /* file renamed */ @@ -72,14 +75,22 @@ FS_OPEN_EXEC_PERM) /* - * This is a list of all events that may get sent to a parent based on fs event - * happening to inodes inside that directory. + * This is a list of all events that may get sent to a parent that is watching + * with flag FS_EVENT_ON_CHILD based on fs event on a child of that directory. */ #define FS_EVENTS_POSS_ON_CHILD (ALL_FSNOTIFY_PERM_EVENTS | \ FS_ACCESS | FS_MODIFY | FS_ATTRIB | \ FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | \ FS_OPEN | FS_OPEN_EXEC) +/* + * This is a list of all events that may get sent with the parent inode as the + * @to_tell argument of fsnotify(). + * It may include events that can be sent to an inode/sb/mount mark, but cannot + * be sent to a parent watching children. + */ +#define FS_EVENTS_POSS_TO_PARENT (FS_EVENTS_POSS_ON_CHILD) + /* Events that can be reported to backends */ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ @@ -397,6 +408,19 @@ extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt); extern void fsnotify_sb_delete(struct super_block *sb); extern u32 fsnotify_get_cookie(void); +static inline __u32 fsnotify_parent_needed_mask(__u32 mask) +{ + /* FS_EVENT_ON_CHILD is set on marks that want parent/name info */ + if (!(mask & FS_EVENT_ON_CHILD)) + return 0; + /* + * This object might be watched by a mark that cares about parent/name + * info, does it care about the specific set of events that can be + * reported with parent/name info? + */ + return mask & FS_EVENTS_POSS_TO_PARENT; +} + static inline int fsnotify_inode_watches_children(struct inode *inode) { /* FS_EVENT_ON_CHILD is set if the inode may care */ From 79cb299c7e181fad683bf6191edb8224b2412512 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:24 +0300 Subject: [PATCH 30/37] fsnotify: remove check that source dentry is positive Remove the unneeded check for positive source dentry in fsnotify_move(). fsnotify_move() hook is mostly called from vfs_rename() under lock_rename() and vfs_rename() starts with may_delete() test that verifies positive source dentry. The only other caller of fsnotify_move() - debugfs_rename() also verifies positive source. Link: https://lore.kernel.org/r/20200716084230.30611-17-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 6e63f7e10da0..f8acddcf54fb 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -150,9 +150,7 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, if (target) fsnotify_link_count(target); - - if (source) - fsnotify_inode(source, FS_MOVE_SELF); + fsnotify_inode(source, FS_MOVE_SELF); audit_inode_child(new_dir, moved, AUDIT_TYPE_CHILD_CREATE); } From 83b7a59896dd24015a34b7f00027f0ff3747972f Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:26 +0300 Subject: [PATCH 31/37] fanotify: add basic support for FAN_REPORT_DIR_FID For now, the flag is mutually exclusive with FAN_REPORT_FID. Events include a single info record of type FAN_EVENT_INFO_TYPE_DFID with a directory file handle. For now, events are only reported for: - Directory modification events - Events on children of a watching directory - Events on directory objects Soon, we will add support for reporting the parent directory fid for events on non-directories with filesystem/mount mark and support for reporting both parent directory fid and child fid. Link: https://lore.kernel.org/r/20200716084230.30611-19-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 34 +++++++++++++- fs/notify/fanotify/fanotify_user.c | 71 +++++++++++++++++++++++++----- include/linux/fanotify.h | 2 +- include/uapi/linux/fanotify.h | 11 +++-- 4 files changed, 101 insertions(+), 17 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 3baf93e998c1..fc2e1fab34af 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -223,7 +223,7 @@ out: static u32 fanotify_group_event_mask(struct fsnotify_group *group, struct fsnotify_iter_info *iter_info, u32 event_mask, const void *data, - int data_type) + int data_type, struct inode *dir) { __u32 marks_mask = 0, marks_ignored_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | @@ -243,6 +243,10 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, /* Path type events are only relevant for files and dirs */ if (!d_is_reg(path->dentry) && !d_can_lookup(path->dentry)) return 0; + } else if (!(fid_mode & FAN_REPORT_FID)) { + /* Do we have a directory inode to report? */ + if (!dir && !(event_mask & FS_ISDIR)) + return 0; } fsnotify_foreach_obj_type(type) { @@ -396,6 +400,28 @@ static struct inode *fanotify_fid_inode(u32 event_mask, const void *data, return fsnotify_data_inode(data, data_type); } +/* + * The inode to use as identifier when reporting dir fid depends on the event. + * Report the modified directory inode on dirent modification events. + * Report the "victim" inode if "victim" is a directory. + * Report the parent inode if "victim" is not a directory and event is + * reported to parent. + * Otherwise, do not report dir fid. + */ +static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, + int data_type, struct inode *dir) +{ + struct inode *inode = fsnotify_data_inode(data, data_type); + + if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) + return dir; + + if (S_ISDIR(inode->i_mode)) + return inode; + + return dir; +} + static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, gfp_t gfp) { @@ -491,10 +517,14 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); + struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); bool name_event = false; + if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) + id = dirid; + /* * For queues with unlimited length lost events are not expected and * can possibly have security implications. Avoid losing events when @@ -605,7 +635,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19); mask = fanotify_group_event_mask(group, iter_info, mask, data, - data_type); + data_type, dir); if (!mask) return 0; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3842ef00b52e..e494400711c9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -216,7 +216,7 @@ static int process_access_response(struct fsnotify_group *group, } static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, - const char *name, size_t name_len, + int info_type, const char *name, size_t name_len, char __user *buf, size_t count) { struct fanotify_event_info_fid info = { }; @@ -229,7 +229,7 @@ static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n", __func__, fh_len, name_len, info_len, count); - if (!fh_len || (name && !name_len)) + if (!fh_len) return 0; if (WARN_ON_ONCE(len < sizeof(info) || len > count)) @@ -239,8 +239,21 @@ static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, * Copy event info fid header followed by variable sized file handle * and optionally followed by variable sized filename. */ - info.hdr.info_type = name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : - FAN_EVENT_INFO_TYPE_FID; + switch (info_type) { + case FAN_EVENT_INFO_TYPE_FID: + case FAN_EVENT_INFO_TYPE_DFID: + if (WARN_ON_ONCE(name_len)) + return -EFAULT; + break; + case FAN_EVENT_INFO_TYPE_DFID_NAME: + if (WARN_ON_ONCE(!name || !name_len)) + return -EFAULT; + break; + default: + return -EFAULT; + } + + info.hdr.info_type = info_type; info.hdr.len = len; info.fsid = *fsid; if (copy_to_user(buf, &info, sizeof(info))) @@ -304,8 +317,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, struct fanotify_event_metadata metadata; struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct file *f = NULL; int ret, fd = FAN_NOFD; + int info_type = 0; pr_debug("%s: group=%p event=%p\n", __func__, group, event); @@ -346,9 +361,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, /* Event info records order is: dir fid + name, child fid */ if (fanotify_event_dir_fh_len(event)) { + info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; ret = copy_info_to_user(fanotify_event_fsid(event), fanotify_info_dir_fh(info), - fanotify_info_name(info), + info_type, fanotify_info_name(info), info->name_len, buf, count); if (ret < 0) return ret; @@ -358,9 +374,33 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, } if (fanotify_event_object_fh_len(event)) { + if (fid_mode == FAN_REPORT_FID || info_type) { + /* + * With only group flag FAN_REPORT_FID only type FID is + * reported. Second info record type is always FID. + */ + info_type = FAN_EVENT_INFO_TYPE_FID; + } else if ((event->mask & ALL_FSNOTIFY_DIRENT_EVENTS) || + (event->mask & FAN_ONDIR)) { + /* + * With group flag FAN_REPORT_DIR_FID, a single info + * record has type DFID for directory entry modification + * event and for event on a directory. + */ + info_type = FAN_EVENT_INFO_TYPE_DFID; + } else { + /* + * With group flags FAN_REPORT_DIR_FID|FAN_REPORT_FID, + * a single info record has type FID for event on a + * non-directory, when there is no directory to report. + * For example, on FAN_DELETE_SELF event. + */ + info_type = FAN_EVENT_INFO_TYPE_FID; + } + ret = copy_info_to_user(fanotify_event_fsid(event), fanotify_event_object_fh(event), - NULL, 0, buf, count); + info_type, NULL, 0, buf, count); if (ret < 0) return ret; @@ -861,6 +901,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) struct fsnotify_group *group; int f_flags, fd; struct user_struct *user; + unsigned int fid_mode = flags & FANOTIFY_FID_BITS; + unsigned int class = flags & FANOTIFY_CLASS_BITS; pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); @@ -887,10 +929,19 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return -EINVAL; } - if ((flags & FANOTIFY_FID_BITS) && - (flags & FANOTIFY_CLASS_BITS) != FAN_CLASS_NOTIF) + if (fid_mode && class != FAN_CLASS_NOTIF) return -EINVAL; + /* Reporting either object fid or dir fid */ + switch (fid_mode) { + case 0: + case FAN_REPORT_FID: + case FAN_REPORT_DIR_FID: + break; + default: + return -EINVAL; + } + user = get_current_user(); if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { free_uid(user); @@ -926,7 +977,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) group->fanotify_data.f_flags = event_f_flags; init_waitqueue_head(&group->fanotify_data.access_waitq); INIT_LIST_HEAD(&group->fanotify_data.access_list); - switch (flags & FANOTIFY_CLASS_BITS) { + switch (class) { case FAN_CLASS_NOTIF: group->priority = FS_PRIO_0; break; @@ -1236,7 +1287,7 @@ COMPAT_SYSCALL_DEFINE6(fanotify_mark, */ static int __init fanotify_user_setup(void) { - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 8); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 9); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index bbbee11d2521..4ddac97b2bf7 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -18,7 +18,7 @@ #define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT) -#define FANOTIFY_FID_BITS (FAN_REPORT_FID) +#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DIR_FID) #define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ FAN_REPORT_TID | \ diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 7f2f17eacbf9..21afebf77fd7 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -53,6 +53,7 @@ /* Flags to determine fanotify event format */ #define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ +#define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ @@ -117,6 +118,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_FID 1 #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 +#define FAN_EVENT_INFO_TYPE_DFID 3 /* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -126,10 +128,11 @@ struct fanotify_event_info_header { }; /* - * Unique file identifier info record. This is used both for - * FAN_EVENT_INFO_TYPE_FID records and for FAN_EVENT_INFO_TYPE_DFID_NAME - * records. For FAN_EVENT_INFO_TYPE_DFID_NAME there is additionally a null - * terminated name immediately after the file handle. + * Unique file identifier info record. + * This structure is used for records of types FAN_EVENT_INFO_TYPE_FID, + * FAN_EVENT_INFO_TYPE_DFID and FAN_EVENT_INFO_TYPE_DFID_NAME. + * For FAN_EVENT_INFO_TYPE_DFID_NAME there is additionally a null terminated + * name immediately after the file handle. */ struct fanotify_event_info_fid { struct fanotify_event_info_header hdr; From 5128063739d293b9faf8ffaa9a5dfd622bc954f5 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:27 +0300 Subject: [PATCH 32/37] fanotify: report events with parent dir fid to sb/mount/non-dir marks In a group with flag FAN_REPORT_DIR_FID, when adding an inode mark with FAN_EVENT_ON_CHILD, events on non-directory children are reported with the fid of the parent. When adding a filesystem or mount mark or mark on a non-dir inode, we want to report events that are "possible on child" (e.g. open/close) also with fid of the parent, as if the victim inode's parent is interested in events "on child". Some events, currently only FAN_MOVE_SELF, should be reported to a sb/mount/non-dir mark with parent fid even though they are not reported to a watching parent. To get the desired behavior we set the flag FAN_EVENT_ON_CHILD on all the sb/mount/non-dir mark masks in a group with FAN_REPORT_DIR_FID. Link: https://lore.kernel.org/r/20200716084230.30611-20-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify_user.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index e494400711c9..7caa64d028ba 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1220,6 +1220,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; umask = FAN_EVENT_ON_CHILD; + /* + * If group needs to report parent fid, register for getting + * events with parent/name info for non-directory. + */ + if ((fid_mode & FAN_REPORT_DIR_FID) && + (flags & FAN_MARK_ADD) && !ignored) + mask |= FAN_EVENT_ON_CHILD; } /* create/update an inode mark */ From 929943b38daf817f2e6d303ea04401651fc3bc05 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:28 +0300 Subject: [PATCH 33/37] fanotify: add support for FAN_REPORT_NAME Introduce a new fanotify_init() flag FAN_REPORT_NAME. It requires the flag FAN_REPORT_DIR_FID and there is a constant for setting both flags named FAN_REPORT_DFID_NAME. For a group with flag FAN_REPORT_NAME, the parent fid and name are reported for directory entry modification events (create/detete/move) and for events on non-directory objects. Events on directories themselves are reported with their own fid and "." as the name. The parent fid and name are reported with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that parent fid is reported with into type FAN_EVENT_INFO_TYPE_DFID, but with an appended null terminated name string. Link: https://lore.kernel.org/r/20200716084230.30611-21-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 18 +++++++++++- fs/notify/fanotify/fanotify_user.c | 45 ++++++++++++++++++++++++------ include/linux/fanotify.h | 2 +- include/uapi/linux/fanotify.h | 4 +++ 4 files changed, 58 insertions(+), 11 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index fc2e1fab34af..d793f3e56b26 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -522,9 +522,25 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); bool name_event = false; - if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) + if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { id = dirid; + /* + * We record file name only in a group with FAN_REPORT_NAME + * and when we have a directory inode to report. + * + * For directory entry modification event, we record the fid of + * the directory and the name of the modified entry. + * + * For event on non-directory that is reported to parent, we + * record the fid of the parent and the name of the child. + */ + if ((fid_mode & FAN_REPORT_NAME) && + ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || + !(mask & FAN_ONDIR))) + name_event = true; + } + /* * For queues with unlimited length lost events are not expected and * can possibly have security implications. Avoid losing events when diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 7caa64d028ba..6b839790cb42 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -64,18 +64,27 @@ static int fanotify_fid_info_len(int fh_len, int name_len) return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN); } -static int fanotify_event_info_len(struct fanotify_event *event) +static int fanotify_event_info_len(unsigned int fid_mode, + struct fanotify_event *event) { struct fanotify_info *info = fanotify_event_info(event); int dir_fh_len = fanotify_event_dir_fh_len(event); int fh_len = fanotify_event_object_fh_len(event); int info_len = 0; + int dot_len = 0; - if (dir_fh_len) + if (dir_fh_len) { info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { + /* + * With group flag FAN_REPORT_NAME, if name was not recorded in + * event on a directory, we will report the name ".". + */ + dot_len = 1; + } if (fh_len) - info_len += fanotify_fid_info_len(fh_len, 0); + info_len += fanotify_fid_info_len(fh_len, dot_len); return info_len; } @@ -91,6 +100,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, { size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); pr_debug("%s: group=%p count=%zd\n", __func__, group, count); @@ -98,8 +108,8 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, if (fsnotify_notify_queue_is_empty(group)) goto out; - if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS)) { - event_size += fanotify_event_info_len( + if (fid_mode) { + event_size += fanotify_event_info_len(fid_mode, FANOTIFY_E(fsnotify_peek_first_event(group))); } @@ -325,7 +335,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); metadata.event_len = FAN_EVENT_METADATA_LEN + - fanotify_event_info_len(event); + fanotify_event_info_len(fid_mode, event); metadata.metadata_len = FAN_EVENT_METADATA_LEN; metadata.vers = FANOTIFY_METADATA_VERSION; metadata.reserved = 0; @@ -374,12 +384,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, } if (fanotify_event_object_fh_len(event)) { + const char *dot = NULL; + int dot_len = 0; + if (fid_mode == FAN_REPORT_FID || info_type) { /* * With only group flag FAN_REPORT_FID only type FID is * reported. Second info record type is always FID. */ info_type = FAN_EVENT_INFO_TYPE_FID; + } else if ((fid_mode & FAN_REPORT_NAME) && + (event->mask & FAN_ONDIR)) { + /* + * With group flag FAN_REPORT_NAME, if name was not + * recorded in an event on a directory, report the + * name "." with info type DFID_NAME. + */ + info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; + dot = "."; + dot_len = 1; } else if ((event->mask & ALL_FSNOTIFY_DIRENT_EVENTS) || (event->mask & FAN_ONDIR)) { /* @@ -400,7 +423,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, ret = copy_info_to_user(fanotify_event_fsid(event), fanotify_event_object_fh(event), - info_type, NULL, 0, buf, count); + info_type, dot, dot_len, buf, count); if (ret < 0) return ret; @@ -932,11 +955,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if (fid_mode && class != FAN_CLASS_NOTIF) return -EINVAL; - /* Reporting either object fid or dir fid */ + /* + * Reporting either object fid or dir fid. + * Child name is reported with parent fid so requires dir fid. + */ switch (fid_mode) { case 0: case FAN_REPORT_FID: case FAN_REPORT_DIR_FID: + case FAN_REPORT_DFID_NAME: break; default: return -EINVAL; @@ -1294,7 +1321,7 @@ COMPAT_SYSCALL_DEFINE6(fanotify_mark, */ static int __init fanotify_user_setup(void) { - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 9); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 4ddac97b2bf7..3e9c56ee651f 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -18,7 +18,7 @@ #define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT) -#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DIR_FID) +#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) #define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ FAN_REPORT_TID | \ diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 21afebf77fd7..fbf9c5c7dd59 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -54,6 +54,10 @@ #define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ +#define FAN_REPORT_NAME 0x00000800 /* Report events with name */ + +/* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */ +#define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME) /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ From 7e8283af6edeb7dd9f8f18a429e6738c07f00ce4 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:29 +0300 Subject: [PATCH 34/37] fanotify: report parent fid + name + child fid For a group with fanotify_init() flag FAN_REPORT_DFID_NAME, the parent fid and name are reported for events on non-directory objects with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME. If the group also has the init flag FAN_REPORT_FID, the child fid is also reported with another info record that follows the first info record. The second info record is the same info record that would have been reported to a group with only FAN_REPORT_FID flag. When the child fid needs to be recorded, the variable size struct fanotify_name_event is preallocated with enough space to store the child fh between the dir fh and the name. Link: https://lore.kernel.org/r/20200716084230.30611-22-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 27 +++++++++++++++++++++++---- fs/notify/fanotify/fanotify.h | 8 +++++++- fs/notify/fanotify/fanotify_user.c | 3 ++- 3 files changed, 32 insertions(+), 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index d793f3e56b26..4d1edddfbbb4 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -477,15 +477,19 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, __kernel_fsid_t *fsid, const struct qstr *file_name, + struct inode *child, gfp_t gfp) { struct fanotify_name_event *fne; struct fanotify_info *info; - struct fanotify_fh *dfh; + struct fanotify_fh *dfh, *ffh; unsigned int dir_fh_len = fanotify_encode_fh_len(id); + unsigned int child_fh_len = fanotify_encode_fh_len(child); unsigned int size; size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; + if (child_fh_len) + size += FANOTIFY_FH_HDR_LEN + child_fh_len; if (file_name) size += file_name->len + 1; fne = kmalloc(size, gfp); @@ -498,11 +502,15 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, fanotify_info_init(info); dfh = fanotify_info_dir_fh(info); info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, 0); + if (child_fh_len) { + ffh = fanotify_info_file_fh(info); + info->file_fh_totlen = fanotify_encode_fh(ffh, child, child_fh_len, 0); + } if (file_name) fanotify_info_copy_name(info, file_name); - pr_debug("%s: ino=%lu size=%u dir_fh_len=%u name_len=%u name='%.*s'\n", - __func__, id->i_ino, size, dir_fh_len, + pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", + __func__, id->i_ino, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info)); return &fne->fae; @@ -520,9 +528,19 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + struct inode *child = NULL; bool name_event = false; if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { + /* + * With both flags FAN_REPORT_DIR_FID and FAN_REPORT_FID, we + * report the child fid for events reported on a non-dir child + * in addition to reporting the parent fid and child name. + */ + if ((fid_mode & FAN_REPORT_FID) && + id != dirid && !(mask & FAN_ONDIR)) + child = id; + id = dirid; /* @@ -558,7 +576,8 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); } else if (name_event && file_name) { - event = fanotify_alloc_name_event(id, fsid, file_name, gfp); + event = fanotify_alloc_name_event(id, fsid, file_name, child, + gfp); } else if (fid_mode) { event = fanotify_alloc_fid_event(id, fsid, gfp); } else { diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 12c204b1489f..896c819a1786 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -193,6 +193,8 @@ static inline struct fanotify_fh *fanotify_event_object_fh( { if (event->type == FANOTIFY_EVENT_TYPE_FID) return &FANOTIFY_FE(event)->object_fh; + else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) + return fanotify_info_file_fh(&FANOTIFY_NE(event)->info); else return NULL; } @@ -208,9 +210,13 @@ static inline struct fanotify_info *fanotify_event_info( static inline int fanotify_event_object_fh_len(struct fanotify_event *event) { + struct fanotify_info *info = fanotify_event_info(event); struct fanotify_fh *fh = fanotify_event_object_fh(event); - return fh ? fh->len : 0; + if (info) + return info->file_fh_totlen ? fh->len : 0; + else + return fh ? fh->len : 0; } static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6b839790cb42..be328d7ee283 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -956,14 +956,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return -EINVAL; /* - * Reporting either object fid or dir fid. * Child name is reported with parent fid so requires dir fid. + * If reporting child name, we can report both child fid and dir fid. */ switch (fid_mode) { case 0: case FAN_REPORT_FID: case FAN_REPORT_DIR_FID: case FAN_REPORT_DFID_NAME: + case FAN_REPORT_DFID_NAME | FAN_REPORT_FID: break; default: return -EINVAL; From 691d976352c73f9b55486092625adbcedd5ca5c5 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 16 Jul 2020 11:42:30 +0300 Subject: [PATCH 35/37] fanotify: report parent fid + child fid Add support for FAN_REPORT_FID | FAN_REPORT_DIR_FID. Internally, it is implemented as a private case of reporting both parent and child fids and name, the parent and child fids are recorded in a variable length fanotify_name_event, but there is no name. It should be noted that directory modification events are recorded in fixed size fanotify_fid_event when not reporting name, just like with group flags FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-23-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 16 +++++++++++----- fs/notify/fanotify/fanotify_user.c | 15 ++++----------- 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 4d1edddfbbb4..bd9e88e889ea 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -535,7 +535,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, /* * With both flags FAN_REPORT_DIR_FID and FAN_REPORT_FID, we * report the child fid for events reported on a non-dir child - * in addition to reporting the parent fid and child name. + * in addition to reporting the parent fid and maybe child name. */ if ((fid_mode & FAN_REPORT_FID) && id != dirid && !(mask & FAN_ONDIR)) @@ -552,11 +552,17 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, * * For event on non-directory that is reported to parent, we * record the fid of the parent and the name of the child. + * + * Even if not reporting name, we need a variable length + * fanotify_name_event if reporting both parent and child fids. */ - if ((fid_mode & FAN_REPORT_NAME) && - ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || - !(mask & FAN_ONDIR))) + if (!(fid_mode & FAN_REPORT_NAME)) { + name_event = !!child; + file_name = NULL; + } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || + !(mask & FAN_ONDIR)) { name_event = true; + } } /* @@ -575,7 +581,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); - } else if (name_event && file_name) { + } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, gfp); } else if (fid_mode) { diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index be328d7ee283..559de311deca 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -371,7 +371,8 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, /* Event info records order is: dir fid + name, child fid */ if (fanotify_event_dir_fh_len(event)) { - info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; + info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : + FAN_EVENT_INFO_TYPE_DFID; ret = copy_info_to_user(fanotify_event_fsid(event), fanotify_info_dir_fh(info), info_type, fanotify_info_name(info), @@ -957,18 +958,10 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) /* * Child name is reported with parent fid so requires dir fid. - * If reporting child name, we can report both child fid and dir fid. + * We can report both child fid and dir fid with or without name. */ - switch (fid_mode) { - case 0: - case FAN_REPORT_FID: - case FAN_REPORT_DIR_FID: - case FAN_REPORT_DFID_NAME: - case FAN_REPORT_DFID_NAME | FAN_REPORT_FID: - break; - default: + if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL; - } user = get_current_user(); if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { From b9a1b9772509cbc6f6aa8bcd0b019f6347a2b631 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 22 Jul 2020 15:58:48 +0300 Subject: [PATCH 36/37] fsnotify: create method handle_inode_event() in fsnotify_operations The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/nfsd/filecache.c | 12 +++----- fs/notify/dnotify/dnotify.c | 38 +++++------------------ fs/notify/fsnotify.c | 52 ++++++++++++++++++++++++++++++-- include/linux/fsnotify_backend.h | 19 ++++++++++-- kernel/audit_fsnotify.c | 20 +++++------- kernel/audit_tree.c | 10 +++--- kernel/audit_watch.c | 17 +++++------ 7 files changed, 97 insertions(+), 71 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index bbc7892d2928..c8b9d2667ee6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -598,14 +598,10 @@ static struct notifier_block nfsd_file_lease_notifier = { }; static int -nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *file_name, u32 cookie, - struct fsnotify_iter_info *iter_info) +nfsd_file_fsnotify_handle_event(struct fsnotify_mark *mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *name) { - struct inode *inode = fsnotify_data_inode(data, data_type); - trace_nfsd_file_fsnotify_handle_event(inode, mask); /* Should be no marks on non-regular files */ @@ -626,7 +622,7 @@ nfsd_file_fsnotify_handle_event(struct fsnotify_group *group, u32 mask, static const struct fsnotify_ops nfsd_file_fsnotify_ops = { - .handle_event = nfsd_file_fsnotify_handle_event, + .handle_inode_event = nfsd_file_fsnotify_handle_event, .free_mark = nfsd_file_mark_free, }; diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index ca78d3f78da8..5dcda8f20c04 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -70,8 +70,9 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark) * destroy the dnotify struct if it was not registered to receive multiple * events. */ -static void dnotify_one_event(struct fsnotify_group *group, u32 mask, - struct fsnotify_mark *inode_mark) +static int dnotify_handle_event(struct fsnotify_mark *inode_mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *name) { struct dnotify_mark *dn_mark; struct dnotify_struct *dn; @@ -79,6 +80,10 @@ static void dnotify_one_event(struct fsnotify_group *group, u32 mask, struct fown_struct *fown; __u32 test_mask = mask & ~FS_EVENT_ON_CHILD; + /* not a dir, dnotify doesn't care */ + if (!dir && !(mask & FS_ISDIR)) + return 0; + dn_mark = container_of(inode_mark, struct dnotify_mark, fsn_mark); spin_lock(&inode_mark->lock); @@ -100,33 +105,6 @@ static void dnotify_one_event(struct fsnotify_group *group, u32 mask, } spin_unlock(&inode_mark->lock); -} - -static int dnotify_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *file_name, u32 cookie, - struct fsnotify_iter_info *iter_info) -{ - struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); - struct fsnotify_mark *child_mark = fsnotify_iter_child_mark(iter_info); - - /* not a dir, dnotify doesn't care */ - if (!dir && !(mask & FS_ISDIR)) - return 0; - - if (WARN_ON(fsnotify_iter_vfsmount_mark(iter_info))) - return 0; - - /* - * Some events can be sent on both parent dir and subdir marks - * (e.g. DN_ATTRIB). If both parent dir and subdir are watching, - * report the event once to parent dir and once to subdir. - */ - if (inode_mark) - dnotify_one_event(group, mask, inode_mark); - if (child_mark) - dnotify_one_event(group, mask, child_mark); return 0; } @@ -143,7 +121,7 @@ static void dnotify_free_mark(struct fsnotify_mark *fsn_mark) } static const struct fsnotify_ops dnotify_fsnotify_ops = { - .handle_event = dnotify_handle_event, + .handle_inode_event = dnotify_handle_event, .free_mark = dnotify_free_mark, }; diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 494d5d70323f..a960ec3a569a 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -230,6 +230,49 @@ notify: } EXPORT_SYMBOL_GPL(__fsnotify_parent); +static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, + const void *data, int data_type, + struct inode *dir, const struct qstr *name, + u32 cookie, struct fsnotify_iter_info *iter_info) +{ + struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); + struct fsnotify_mark *child_mark = fsnotify_iter_child_mark(iter_info); + struct inode *inode = fsnotify_data_inode(data, data_type); + const struct fsnotify_ops *ops = group->ops; + int ret; + + if (WARN_ON_ONCE(!ops->handle_inode_event)) + return 0; + + if (WARN_ON_ONCE(fsnotify_iter_sb_mark(iter_info)) || + WARN_ON_ONCE(fsnotify_iter_vfsmount_mark(iter_info))) + return 0; + + /* + * An event can be sent on child mark iterator instead of inode mark + * iterator because of other groups that have interest of this inode + * and have marks on both parent and child. We can simplify this case. + */ + if (!inode_mark) { + inode_mark = child_mark; + child_mark = NULL; + dir = NULL; + name = NULL; + } + + ret = ops->handle_inode_event(inode_mark, mask, inode, dir, name); + if (ret || !child_mark) + return ret; + + /* + * Some events can be sent on both parent dir and child marks + * (e.g. FS_ATTRIB). If both parent dir and child are watching, + * report the event once to parent dir with name and once to child + * without name. + */ + return ops->handle_inode_event(child_mark, mask, inode, NULL, NULL); +} + static int send_to_group(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info) @@ -275,8 +318,13 @@ static int send_to_group(__u32 mask, const void *data, int data_type, if (!(test_mask & marks_mask & ~marks_ignored_mask)) return 0; - return group->ops->handle_event(group, mask, data, data_type, dir, - file_name, cookie, iter_info); + if (group->ops->handle_event) { + return group->ops->handle_event(group, mask, data, data_type, dir, + file_name, cookie, iter_info); + } + + return fsnotify_handle_event(group, mask, data, data_type, dir, + file_name, cookie, iter_info); } static struct fsnotify_mark *fsnotify_first_mark(struct fsnotify_mark_connector **connp) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 32104cfc27a5..f8529a3a2923 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -128,17 +128,30 @@ struct mem_cgroup; * @cookie: inotify rename cookie * @iter_info: array of marks from this group that are interested in the event * + * handle_inode_event - simple variant of handle_event() for groups that only + * have inode marks and don't have ignore mask + * @mark: mark to notify + * @mask: event type and flags + * @inode: inode that event happened on + * @dir: optional directory associated with event - + * if @file_name is not NULL, this is the directory that + * @file_name is relative to. + * @file_name: optional file name associated with event + * * free_group_priv - called when a group refcnt hits 0 to clean up the private union * freeing_mark - called when a mark is being destroyed for some reason. The group - * MUST be holding a reference on each mark and that reference must be - * dropped in this function. inotify uses this function to send - * userspace messages that marks have been removed. + * MUST be holding a reference on each mark and that reference must be + * dropped in this function. inotify uses this function to send + * userspace messages that marks have been removed. */ struct fsnotify_ops { int (*handle_event)(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, u32 cookie, struct fsnotify_iter_info *iter_info); + int (*handle_inode_event)(struct fsnotify_mark *mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *file_name); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); void (*free_event)(struct fsnotify_event *event); diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index bd3a6b79316a..bfcfcd61adb6 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -152,35 +152,31 @@ static void audit_autoremove_mark_rule(struct audit_fsnotify_mark *audit_mark) } /* Update mark data in audit rules based on fsnotify events. */ -static int audit_mark_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *dname, u32 cookie, - struct fsnotify_iter_info *iter_info) +static int audit_mark_handle_event(struct fsnotify_mark *inode_mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *dname) { - struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); struct audit_fsnotify_mark *audit_mark; - const struct inode *inode = fsnotify_data_inode(data, data_type); audit_mark = container_of(inode_mark, struct audit_fsnotify_mark, mark); - BUG_ON(group != audit_fsnotify_group); - - if (WARN_ON(!inode)) + if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group) || + WARN_ON_ONCE(!inode)) return 0; if (mask & (FS_CREATE|FS_MOVED_TO|FS_DELETE|FS_MOVED_FROM)) { if (audit_compare_dname_path(dname, audit_mark->path, AUDIT_NAME_FULL)) return 0; audit_update_mark(audit_mark, inode); - } else if (mask & (FS_DELETE_SELF|FS_UNMOUNT|FS_MOVE_SELF)) + } else if (mask & (FS_DELETE_SELF|FS_UNMOUNT|FS_MOVE_SELF)) { audit_autoremove_mark_rule(audit_mark); + } return 0; } static const struct fsnotify_ops audit_mark_fsnotify_ops = { - .handle_event = audit_mark_handle_event, + .handle_inode_event = audit_mark_handle_event, .free_mark = audit_fsnotify_free_mark, }; diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 2ce2ac1ce100..025d24abf15d 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -1037,11 +1037,9 @@ static void evict_chunk(struct audit_chunk *chunk) audit_schedule_prune(); } -static int audit_tree_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *file_name, u32 cookie, - struct fsnotify_iter_info *iter_info) +static int audit_tree_handle_event(struct fsnotify_mark *mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *file_name) { return 0; } @@ -1070,7 +1068,7 @@ static void audit_tree_freeing_mark(struct fsnotify_mark *mark, } static const struct fsnotify_ops audit_tree_ops = { - .handle_event = audit_tree_handle_event, + .handle_inode_event = audit_tree_handle_event, .freeing_mark = audit_tree_freeing_mark, .free_mark = audit_tree_destroy_watch, }; diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index e23d54bcc587..246e5ba704c0 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -464,20 +464,17 @@ void audit_remove_watch_rule(struct audit_krule *krule) } /* Update watch data in audit rules based on fsnotify events. */ -static int audit_watch_handle_event(struct fsnotify_group *group, u32 mask, - const void *data, int data_type, - struct inode *dir, - const struct qstr *dname, u32 cookie, - struct fsnotify_iter_info *iter_info) +static int audit_watch_handle_event(struct fsnotify_mark *inode_mark, u32 mask, + struct inode *inode, struct inode *dir, + const struct qstr *dname) { - struct fsnotify_mark *inode_mark = fsnotify_iter_inode_mark(iter_info); - const struct inode *inode = fsnotify_data_inode(data, data_type); struct audit_parent *parent; parent = container_of(inode_mark, struct audit_parent, mark); - BUG_ON(group != audit_watch_group); - WARN_ON(!inode); + if (WARN_ON_ONCE(inode_mark->group != audit_watch_group) || + WARN_ON_ONCE(!inode)) + return 0; if (mask & (FS_CREATE|FS_MOVED_TO) && inode) audit_update_watch(parent, dname, inode->i_sb->s_dev, inode->i_ino, 0); @@ -490,7 +487,7 @@ static int audit_watch_handle_event(struct fsnotify_group *group, u32 mask, } static const struct fsnotify_ops audit_watch_fsnotify_ops = { - .handle_event = audit_watch_handle_event, + .handle_inode_event = audit_watch_handle_event, .free_mark = audit_watch_free_mark, }; From 8aed8cebdd973e95d20743e00e35467c7b467d0d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 28 Jul 2020 10:58:07 +0200 Subject: [PATCH 37/37] fanotify: compare fsid when merging name event When merging name events, fsids of the two involved events have to match. Otherwise we could merge events from two different filesystems and thus effectively loose the second event. Backporting note: Although the commit cacfb956d46e introducing this bug was merged for 5.7, the relevant code didn't get used in the end until 7e8283af6ede ("fanotify: report parent fid + name + child fid") which will be merged with this patch. So there's no need for backporting this. Fixes: cacfb956d46e ("fanotify: record name info for FAN_DIR_MODIFY event") Reported-by: Amir Goldstein Signed-off-by: Jan Kara --- fs/notify/fanotify/fanotify.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index bd9e88e889ea..c942910a8649 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -82,6 +82,9 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, if (!info1->dir_fh_totlen) return false; + if (!fanotify_fsid_equal(&fne1->fsid, &fne2->fsid)) + return false; + return fanotify_info_equal(info1, info2); }