OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Amir Goldstein	950cc0d2be	fsnotify: generalize handle_inode_event() The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: `b9a1b97725` ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-12-03 14:58:35 +01:00
Waiman Long	9289012374	inotify: Increase default inotify.max_user_watches limit to 1048576 The default value of inotify.max_user_watches sysctl parameter was set to 8192 since the introduction of the inotify feature in 2005 by commit `0eeca28300` ("[PATCH] inotify"). Today this value is just too small for many modern usage. As a result, users have to explicitly set it to a larger value to make it work. After some searching around the web, these are the inotify.max_user_watches values used by some projects: - vscode: 524288 - dropbox support: 100000 - users on stackexchange: 12228 - lsyncd user: 2000000 - code42 support: 1048576 - monodevelop: 16384 - tectonic: 524288 - openshift origin: 65536 Each watch point adds an inotify_inode_mark structure to an inode to be watched. It also pins the watched inode. Modeled after the epoll.max_user_watches behavior to adjust the default value according to the amount of addressable memory available, make inotify.max_user_watches behave in a similar way to make it use no more than 1% of addressable memory within the range [8192, 1048576]. We estimate the amount of memory used by inotify mark to size of inotify_inode_mark plus two times the size of struct inode (we double the inode size to cover the additional filesystem private inode part). That means that a 64-bit system with 128GB or more memory will likely have the maximum value of 1048576 for inotify.max_user_watches. This default should be big enough for most use cases. Link: https://lore.kernel.org/r/20201109035931.4740-1-longman@redhat.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-11-09 15:03:21 +01:00
Amir Goldstein	7372e79c9e	fanotify: fix logic of reporting name info with watched parent The victim inode's parent and name info is required when an event needs to be delivered to a group interested in filename info OR when the inode's parent is interested in an event on its children. Let us call the first condition 'parent_needed' and the second condition 'parent_interested'. In fsnotify_parent(), the condition where the inode's parent is interested in some events on its children, but not necessarily interested the specific event is called 'parent_watched'. fsnotify_parent() tests the condition (!parent_watched && !parent_needed) for sending the event without parent and name info, which is correct. It then wrongly assumes that parent_watched implies !parent_needed and tests the condition (parent_watched && !parent_interested) for sending the event without parent and name info, which is wrong, because parent may still be needed by some group. For example, after initializing a group with FAN_REPORT_DFID_NAME and adding a FAN_MARK_MOUNT with FAN_OPEN mask, open events on non-directory children of "testdir" are delivered with file name info. After adding another mark to the same group on the parent "testdir" with FAN_CLOSE\|FAN_EVENT_ON_CHILD mask, open events on non-directory children of "testdir" are no longer delivered with file name info. Fix the logic and use auxiliary variables to clarify the conditions. Fixes: `9b93f33105` ("fsnotify: send event with parent/name info to sb/mount/non-dir marks") Cc: stable@vger.kernel.org#v5.9 Link: https://lore.kernel.org/r/20201108105906.8493-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-11-09 15:03:08 +01:00
Roman Gushchin	b87d8cefe4	mm, memcg: rework remote charging API to support nesting Currently the remote memcg charging API consists of two functions: memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the memcg value, which overwrites the memcg of the current task. memalloc_use_memcg(target_memcg); <...> memalloc_unuse_memcg(); It works perfectly for allocations performed from a normal context, however an attempt to call it from an interrupt context or just nest two remote charging blocks will lead to an incorrect accounting. On exit from the inner block the active memcg will be cleared instead of being restored. memalloc_use_memcg(target_memcg); memalloc_use_memcg(target_memcg_2); <...> memalloc_unuse_memcg(); Error: allocation here are charged to the memcg of the current process instead of target_memcg. memalloc_unuse_memcg(); This patch extends the remote charging API by switching to a single function: struct mem_cgroup set_active_memcg(struct mem_cgroup memcg), which sets the new value and returns the old one. So a remote charging block will look like: old_memcg = set_active_memcg(target_memcg); <...> set_active_memcg(old_memcg); This patch is heavily based on the patch by Johannes Weiner, which can be found here: https://lkml.org/lkml/2020/5/28/806 . Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-18 09:27:09 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Jan Kara	8aed8cebdd	fanotify: compare fsid when merging name event When merging name events, fsids of the two involved events have to match. Otherwise we could merge events from two different filesystems and thus effectively loose the second event. Backporting note: Although the commit `cacfb956d4` introducing this bug was merged for 5.7, the relevant code didn't get used in the end until `7e8283af6e` ("fanotify: report parent fid + name + child fid") which will be merged with this patch. So there's no need for backporting this. Fixes: `cacfb956d4` ("fanotify: record name info for FAN_DIR_MODIFY event") Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-28 10:58:07 +02:00
Amir Goldstein	b9a1b97725	fsnotify: create method handle_inode_event() in fsnotify_operations The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:25:50 +02:00
Amir Goldstein	691d976352	fanotify: report parent fid + child fid Add support for FAN_REPORT_FID \| FAN_REPORT_DIR_FID. Internally, it is implemented as a private case of reporting both parent and child fids and name, the parent and child fids are recorded in a variable length fanotify_name_event, but there is no name. It should be noted that directory modification events are recorded in fixed size fanotify_fid_event when not reporting name, just like with group flags FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-23-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:24:01 +02:00
Amir Goldstein	7e8283af6e	fanotify: report parent fid + name + child fid For a group with fanotify_init() flag FAN_REPORT_DFID_NAME, the parent fid and name are reported for events on non-directory objects with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME. If the group also has the init flag FAN_REPORT_FID, the child fid is also reported with another info record that follows the first info record. The second info record is the same info record that would have been reported to a group with only FAN_REPORT_FID flag. When the child fid needs to be recorded, the variable size struct fanotify_name_event is preallocated with enough space to store the child fh between the dir fh and the name. Link: https://lore.kernel.org/r/20200716084230.30611-22-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:24:00 +02:00
Amir Goldstein	929943b38d	fanotify: add support for FAN_REPORT_NAME Introduce a new fanotify_init() flag FAN_REPORT_NAME. It requires the flag FAN_REPORT_DIR_FID and there is a constant for setting both flags named FAN_REPORT_DFID_NAME. For a group with flag FAN_REPORT_NAME, the parent fid and name are reported for directory entry modification events (create/detete/move) and for events on non-directory objects. Events on directories themselves are reported with their own fid and "." as the name. The parent fid and name are reported with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that parent fid is reported with into type FAN_EVENT_INFO_TYPE_DFID, but with an appended null terminated name string. Link: https://lore.kernel.org/r/20200716084230.30611-21-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:24:00 +02:00
Amir Goldstein	5128063739	fanotify: report events with parent dir fid to sb/mount/non-dir marks In a group with flag FAN_REPORT_DIR_FID, when adding an inode mark with FAN_EVENT_ON_CHILD, events on non-directory children are reported with the fid of the parent. When adding a filesystem or mount mark or mark on a non-dir inode, we want to report events that are "possible on child" (e.g. open/close) also with fid of the parent, as if the victim inode's parent is interested in events "on child". Some events, currently only FAN_MOVE_SELF, should be reported to a sb/mount/non-dir mark with parent fid even though they are not reported to a watching parent. To get the desired behavior we set the flag FAN_EVENT_ON_CHILD on all the sb/mount/non-dir mark masks in a group with FAN_REPORT_DIR_FID. Link: https://lore.kernel.org/r/20200716084230.30611-20-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:24:00 +02:00
Amir Goldstein	83b7a59896	fanotify: add basic support for FAN_REPORT_DIR_FID For now, the flag is mutually exclusive with FAN_REPORT_FID. Events include a single info record of type FAN_EVENT_INFO_TYPE_DFID with a directory file handle. For now, events are only reported for: - Directory modification events - Events on children of a watching directory - Events on directory objects Soon, we will add support for reporting the parent directory fid for events on non-directories with filesystem/mount mark and support for reporting both parent directory fid and child fid. Link: https://lore.kernel.org/r/20200716084230.30611-19-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:24:00 +02:00
Amir Goldstein	9b93f33105	fsnotify: send event with parent/name info to sb/mount/non-dir marks Similar to events "on child" to watching directory, send event with parent/name info if sb/mount/non-dir marks are interested in parent/name info. The FS_EVENT_ON_CHILD flag can be set on sb/mount/non-dir marks to specify interest in parent/name info for events on non-directory inodes. Events on "orphan" children (disconnected dentries) are sent without parent/name info. Events on directories are sent with parent/name info only if the parent directory is watching. After this change, even groups that do not subscribe to events on children could get an event with mark iterator type TYPE_CHILD and without mark iterator type TYPE_INODE if fanotify has marks on the same objects. dnotify and inotify event handlers can already cope with that situation. audit does not subscribe to events that are possible on child, so won't get to this situation. nfsd does not access the marks iterator from its event handler at the moment, so it is not affected. This is a bit too fragile, so we should prepare all groups to cope with mark type TYPE_CHILD preferably using a generic helper. Link: https://lore.kernel.org/r/20200716084230.30611-16-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:21:02 +02:00
Amir Goldstein	957f7b472c	inotify: do not set FS_EVENT_ON_CHILD in non-dir mark mask FS_EVENT_ON_CHILD has currently no meaning for non-dir inode marks. In the following patches we want to use that bit to mean that mark's notification group cares about parent and name information. So stop setting FS_EVENT_ON_CHILD for non-dir marks. Link: https://lore.kernel.org/r/20200722125849.17418-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:16:16 +02:00
Amir Goldstein	40a100d3ad	fsnotify: pass dir and inode arguments to fsnotify() The arguments of fsnotify() are overloaded and mean different things for different event types. Replace the to_tell argument with separate arguments @dir and @inode, because we may be sending to both dir and child. Using the @data argument to pass the child is not enough, because dirent events pass this argument (for audit), but we do not report to child. Document the new fsnotify() function argumenets. Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:15:48 +02:00
Amir Goldstein	82ace1efb3	fsnotify: create helper fsnotify_inode() Simple helper to consolidate biolerplate code. Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 23:13:51 +02:00
Amir Goldstein	497b0c5a7c	fsnotify: send event to parent and child with single callback Instead of calling fsnotify() twice, once with parent inode and once with child inode, if event should be sent to parent inode, send it with both parent and child inodes marks in object type iterator and call the backend handle_event() callback only once. The parent inode is assigned to the standard "inode" iterator type and the child inode is assigned to the special "child" iterator type. In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask, the dir argument to handle_event will be the parent inode, the file_name argument to handle_event is non NULL and refers to the name of the child and the child inode can be accessed with fsnotify_data_inode(). This will allow fanotify to make decisions based on child or parent's ignored mask. For example, when a parent is interested in a specific event on its children, but a specific child wishes to ignore this event, the event will not be reported. This is not what happens with current code, but according to man page, it is the expected behavior. Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:24:52 +02:00
Amir Goldstein	c8f3446c66	inotify: report both events on parent and child with single callback fsnotify usually calls inotify_handle_event() once for watching parent to report event with child's name and once for watching child to report event without child's name. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-13-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:24:51 +02:00
Amir Goldstein	62cb0af4ce	dnotify: report both events on parent and child with single callback For some events (e.g. DN_ATTRIB on sub-directory) fsnotify may call dnotify_handle_event() once for watching parent and once again for the watching sub-directory. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:24:51 +02:00
Amir Goldstein	f35c415678	fanotify: no external fh buffer in fanotify_name_event The fanotify_fh struct has an inline buffer of size 12 which is enough to store the most common local filesystem file handles (e.g. ext4, xfs). For file handles that do not fit in the inline buffer (e.g. btrfs), an external buffer is allocated to store the file handle. When allocating a variable size fanotify_name_event, there is no point in allocating also an external fh buffer when file handle does not fit in the inline buffer. Check required size for encoding fh, preallocate an event buffer sufficient to contain both file handle and name and store the name after the file handle. At this time, when not reporting name in event, we still allocate the fixed size fanotify_fid_event and an external buffer for large file handles, but fanotify_alloc_name_event() has already been prepared to accept a NULL file_name. Link: https://lore.kernel.org/r/20200716084230.30611-11-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:37 +02:00
Amir Goldstein	f454fa610a	fanotify: use struct fanotify_info to parcel the variable size buffer An fanotify event name is always recorded relative to a dir fh. Encapsulate the name_len member of fanotify_name_event in a new struct fanotify_info, which describes the parceling of the variable size buffer of an fanotify_name_event. The dir_fh member of fanotify_name_event is renamed to _dir_fh and is not accessed directly, but via the fanotify_info_dir_fh() accessor. Although the dir_fh len information is already available in struct fanotify_fh, we store it also in dif_fh_totlen member of fanotify_info, including the size of fanotify_fh header, so we know the offset of the name in the buffer without looking inside the dir_fh. We also add a file_fh_totlen member to allow packing another file handle in the variable size buffer after the dir_fh and before the name. We are going to use that space to store the child fid. Link: https://lore.kernel.org/r/20200716084230.30611-10-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:37 +02:00
Amir Goldstein	85af5d9258	fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks Up to now, fanotify allowed to set the FAN_EVENT_ON_CHILD flag on sb/mount marks and non-directory inode mask, but the flag was ignored. Mask out the flag if it is provided by user on sb/mount/non-dir marks and define it as an implicit flag that cannot be removed by user. This flag is going to be used internally to request for events with parent and name info. Link: https://lore.kernel.org/r/20200716084230.30611-8-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:37 +02:00
Amir Goldstein	4ed6814a91	fanotify: prepare for implicit event flags in mark mask So far, all flags that can be set in an fanotify mark mask can be set explicitly by a call to fanotify_mark(2). Prepare for defining implicit event flags that cannot be set by user with fanotify_mark(2), similar to how inotify/dnotify implicitly set the FS_EVENT_ON_CHILD flag. Implicit event flags cannot be removed by user and mark gets destroyed when only implicit event flags remain in the mask. Link: https://lore.kernel.org/r/20200716084230.30611-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	3ef8665366	fanotify: mask out special event flags from ignored mask The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had any meaning in ignored mask. Mask them out explicitly. Link: https://lore.kernel.org/r/20200716084230.30611-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	d809daf1b6	fanotify: generalize test for FAN_REPORT_FID As preparation for new flags that report fids, define a bit set of flags for a group reporting fids, currently containing the only bit FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	6ad1aadd97	fanotify: distinguish between fid encode error and null fid In fanotify_encode_fh(), both cases of NULL inode and failure to encode ended up with fh type FILEID_INVALID. Distiguish the case of NULL inode, by setting fh type to FILEID_ROOT. This is just a semantic difference at this point. Remove stale comment and unneeded check from fid event compare helpers. Link: https://lore.kernel.org/r/20200716084230.30611-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	103ff6a554	fanotify: generalize merge logic of events on dir An event on directory should never be merged with an event on non-directory regardless of the event struct type. This change has no visible effect, because currently, with struct fanotify_path_event, the relevant events will not be merged because event path of dir will be different than event path of non-dir. Link: https://lore.kernel.org/r/20200716084230.30611-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	0badfa029e	fanotify: generalize the handling of extra event flags In fanotify_group_event_mask() there is logic in place to make sure we are not going to handle an event with no type and just FAN_ONDIR flag. Generalize this logic to any FANOTIFY_EVENT_FLAGS. There is only one more flag in this group at the moment - FAN_EVENT_ON_CHILD. We never report it to user, but we do pass it in to fanotify_alloc_event() when group is reporting fid as indication that event happened on child. We will have use for this indication later on. Link: https://lore.kernel.org/r/20200716084230.30611-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	08b95c338e	fanotify: remove event FAN_DIR_MODIFY It was never enabled in uapi and its functionality is about to be superseded by events FAN_CREATE, FAN_DELETE, FAN_MOVE with group flag FAN_REPORT_NAME. Keep a place holder variable name_event instead of removing the name recording code since it will be used by the new events. Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 21:23:36 +02:00
Amir Goldstein	b54cecf5e2	fsnotify: pass dir argument to handle_event() callback The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-27 18:32:47 +02:00
Amir Goldstein	9c61f3b560	fanotify: break up fanotify_alloc_event() Break up fanotify_alloc_event() into helpers by event struct type. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 17:41:33 +02:00
Amir Goldstein	b8a6c3a2f0	fanotify: create overflow event type The special overflow event is allocated as struct fanotify_path_event, but with a null path. Use a special event type to identify the overflow event, so the helper fanotify_has_event_path() will always indicate a non null path. Allocating the overflow event doesn't need any of the fancy stuff in fanotify_alloc_event(), so create a simplified helper for allocating the overflow event. There is also no need to store and report the pid with an overflow event. Link: https://lore.kernel.org/r/20200708111156.24659-7-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 17:37:03 +02:00
Amir Goldstein	956235afd1	inotify: do not use objectid when comparing events inotify's event->wd is the object identifier. Compare that instead of the common fsnotidy event objectid, so we can get rid of the objectid field later. Link: https://lore.kernel.org/r/20200708111156.24659-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 17:36:58 +02:00
Amir Goldstein	cbcf47adc8	fsnotify: return non const from fsnotify_data_inode() Return non const inode pointer from fsnotify_data_inode(). None of the fsnotify hooks pass const inode pointer as data and callers often need to cast to a non const pointer. Link: https://lore.kernel.org/r/20200708111156.24659-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 17:36:45 +02:00
Amir Goldstein	c738fbabb0	fsnotify: fold fsnotify() call into fsnotify_parent() All (two) callers of fsnotify_parent() also call fsnotify() to notify the child inode. Move the second fsnotify() call into fsnotify_parent(). This will allow more flexibility in making decisions about which of the two event falvors should be sent. Using 'goto notify_child' in the inline helper seems a bit strange, but it mimics the code in __fsnotify_parent() for clarity and the goto pattern will become less strage after following patches are applied. Link: https://lore.kernel.org/r/20200708111156.24659-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 17:36:41 +02:00
Mel Gorman	71d734103e	fsnotify: Rearrange fast path to minimise overhead when there is no watcher The fsnotify paths are trivial to hit even when there are no watchers and they are surprisingly expensive. For example, every successful vfs_write() hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless FMODE_NONOTIFY is set which is an internal flag invisible to userspace. As it stands, fsnotify_parent is a guaranteed functional call even if there are no watchers and fsnotify() does a substantial amount of unnecessary work before it checks if there are any watchers. A perf profile showed that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the total samples taken in that function during a test. This patch rearranges the fast paths to reduce the amount of work done when there are no watchers. The test motivating this was "perf bench sched messaging --pipe". Despite the fact the pipes are anonymous, fsnotify is still called a lot and the overhead is noticeable even though it's completely pointless. It's likely the overhead is negligible for real IO so this is an extreme example. This is a comparison of hackbench using processes and pipes on a 1-socket machine with 8 CPU threads without fanotify watchers. 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Amean 1 0.4837 ( 0.00%) 0.4630 * 4.27%* Amean 3 1.5447 ( 0.00%) 1.4557 ( 5.76%) Amean 5 2.6037 ( 0.00%) 2.4363 ( 6.43%) Amean 7 3.5987 ( 0.00%) 3.4757 ( 3.42%) Amean 12 5.8267 ( 0.00%) 5.6983 ( 2.20%) Amean 18 8.4400 ( 0.00%) 8.1327 ( 3.64%) Amean 24 11.0187 ( 0.00%) 10.0290 * 8.98%* Amean 30 13.1013 ( 0.00%) 12.8510 ( 1.91%) Amean 32 13.9190 ( 0.00%) 13.2410 ( 4.87%) 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Duration User 157.05 152.79 Duration System 1279.98 1219.32 Duration Elapsed 182.81 174.52 This is showing that the latencies are improved by roughly 2-9%. The variability is not shown but some of these results are within the noise as this workload heavily overloads the machine. That said, the system CPU usage is reduced by quite a bit so it makes sense to avoid the overhead even if it is a bit tricky to detect at times. A perf profile of just 1 group of tasks showed that 5.14% of samples taken were in either fsnotify() or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, mostly function entry and the initial check for watchers. The check for watchers is complicated enough that inlining it may be controversial. [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask Link: https://lore.kernel.org/r/20200708111156.24659-1-amir73il@gmail.com Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 15:29:10 +02:00
Jan Kara	47aaabdedf	fanotify: Avoid softlockups when reading many events When user provides large buffer for events and there are lots of events available, we can try to copy them all to userspace without scheduling which can softlockup the kernel (furthermore exacerbated by the contention on notification_lock). Add a scheduling point after copying each event. Note that usually the real underlying problem is the cost of fanotify event merging and the resulting contention on notification_lock but this is a cheap way to somewhat reduce the problem until we can properly address that. Reported-by: Francesco Ruggeri <fruggeri@arista.com> Link: https://lore.kernel.org/lkml/20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-07-15 15:23:28 +02:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Linus Torvalds	07c8f3bfef	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl7Y2McACgkQnJ2qBz9k QNlHzwf/e4oz9oRCXPqBwh6C318nl6ksQO5ooW+Dhb535cr/Cn99nuZa3GrvW+aq eSbypsvZQMguk0/okEc4jcTgLmEw+KubpBXOi/DJZ9dzGQrvjT2nBkQmaTqwp9dO WMZcJLmszkrtokjKD4lVjyQArcwqQF/v/moEKIImw5A6CY4R4odTaUOCPnTwF7P6 OXsDPwRfAccJ25ZUZ8hjc+fRl/Ncex6szciaJ08T4btlaAtc5UIn5Sy/u8BqNNiw 0VRheD4sJ2c25hLOIQJ5RETIeuYaRcR/BA3vm+k1d2iIiw4ubj9+ppwiaWOryA9U 5fXnBmXKuUUrwFihzmiLSckIpm3IPg== =kghV -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Several smaller fixes and cleanups for fsnotify subsystem" * tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix ignore mask logic for events on child and on dir fanotify: don't write with size under sizeof(response) fsnotify: Remove proc_fs.h include fanotify: remove reference to fill_event_metadata() fsnotify: add mutex destroy fanotify: prefix should_merge() fanotify: Replace zero-length array with flexible-array inotify: Fix error return code assignment flow. fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()	2020-06-04 13:51:54 -07:00
Linus Torvalds	b23c4771ff	A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I really hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl7VId8PHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5Yq/gH/iaDgirQZV6UZ2v9sfwQNYolNpf2sKAuOZjd bPFB7WJoMQbKwQEvYrAUL2+5zPOcLYuIfzyOfo1BV1py+EyKbACcKjI4AedxfJF7 +NchmOBhlEqmEhzx2U08HRc4/8J223WG17fJRVsV3p+opJySexSFeQucfOciX5NR RUCxweWWyg/FgyqjkyMMTtsePqZPmcT5dWTlVXISlbWzcv5NFhuJXnSrw8Sfzcmm SJMzqItv3O+CabnKQ8kMLV2PozXTMfjeWH47ZUK0Y8/8PP9+cvqwFzZ0UDQJ1Xaz oyW/TqmunaXhfMsMFeFGSwtfgwRHvXdxkQdtwNHvo1dV4dzTvDw= =fDC/ -----END PGP SIGNATURE----- Merge tag 'docs-5.8' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I really hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...	2020-06-01 15:45:27 -07:00
Amir Goldstein	f17936993a	fanotify: turn off support for FAN_DIR_MODIFY FAN_DIR_MODIFY has been enabled by commit `44d705b037` ("fanotify: report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are planning further extensions to the fanotify API and during that we realized that FAN_DIR_MODIFY may behave slightly differently to be more consistent with extensions we plan. So until we finalize these extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to userland. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-27 18:55:54 +02:00
Amir Goldstein	2f02fd3fa1	fanotify: fix ignore mask logic for events on child and on dir The comments in fanotify_group_event_mask() say: "If the event is on dir/child and this mark doesn't care about events on dir/child, don't send it!" Specifically, mount and filesystem marks do not care about events on child, but they can still specify an ignore mask for those events. For example, a group that has: - A mount mark with mask 0 and ignore_mask FAN_OPEN - An inode mark on a directory with mask FAN_OPEN \| FAN_OPEN_EXEC with flag FAN_EVENT_ON_CHILD A child file open for exec would be reported to group with the FAN_OPEN event despite the fact that FAN_OPEN is in ignore mask of mount mark, because the mark iteration loop skips over non-inode marks for events on child when calculating the ignore mask. Move ignore mask calculation to the top of the iteration loop block before excluding marks for events on dir/child. Link: https://lore.kernel.org/r/20200524072441.18258-1-amir73il@gmail.com Reported-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/linux-fsdevel/20200521162443.GA26052@quack2.suse.cz/ Fixes: `55bf882c7f` "fanotify: fix merging marks masks with FAN_ONDIR" Fixes: `b469e7e47c` "fanotify: fix handling of events on child..." Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-25 10:43:56 +02:00
Fabian Frederick	5e23663b49	fanotify: don't write with size under sizeof(response) fanotify_write() only aligned copy_from_user size to sizeof(response) for higher values. This patch avoids all values below as suggested by Amir Goldstein and set to response size unconditionally. Link: https://lore.kernel.org/r/20200512181921.405973-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-13 17:16:57 +02:00
Fabian Frederick	5a449099b9	fsnotify: Remove proc_fs.h include proc_fs.h was already included in fdinfo.h Link: https://lore.kernel.org/r/20200512181906.405927-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-13 17:16:14 +02:00
Fabian Frederick	c5e443cb7b	fanotify: remove reference to fill_event_metadata() fill_event_metadata() was removed in commit `bb2f7b4542` ("fanotify: open code fill_event_metadata()") Link: https://lore.kernel.org/r/20200512181836.405879-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-13 17:15:43 +02:00
Fabian Frederick	191e1656d1	fsnotify: add mutex destroy Call mutex_destroy() before freeing notification group. This only adds some additional debug checks when mutex debugging is enabled but still it may be useful. Link: https://lore.kernel.org/r/20200512181803.405832-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-13 17:14:20 +02:00
Fabian Frederick	ab3c4da0ad	fanotify: prefix should_merge() Prefix function with fanotify_ like others. Link: https://lore.kernel.org/r/20200512181715.405728-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-13 17:13:12 +02:00
Gustavo A. R. Silva	374ad001f7	fanotify: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Link: https://lore.kernel.org/r/20200507185230.GA14229@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2020-05-08 10:38:12 +02:00
youngjun	7b26aa243d	inotify: Fix error return code assignment flow. If error code is initialized -EINVAL, there is no need to assign -EINVAL. Link: https://lore.kernel.org/r/20200426143316.29877-1-her0gyugyu@gmail.com Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-04-27 12:14:56 +02:00
Mauro Carvalho Chehab	0c1bc6b845	docs: filesystems: fix renamed references Some filesystem references got broken by a previous patch series I submitted. Address those. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: David Sterba <dsterba@suse.com> # fs/affs/Kconfig Link: https://lore.kernel.org/r/57318c53008dbda7f6f4a5a9e5787f4d37e8565a.1586881715.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-04-20 15:45:22 -06:00
Jules Irenge	00e0afb658	fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait() Sparse reports warnings at fsnotify_prepare_user_wait() and at fsnotify_finish_user_wait() warning: context imbalance in fsnotify_finish_user_wait() - wrong count at exit warning: context imbalance in fsnotify_prepare_user_wait() - unexpected unlock The root cause is the missing annotation at fsnotify_finish_user_wait() and at fsnotify_prepare_user_wait() fsnotify_prepare_user_wait() has an extra annotation __release() that only tell Sparse and not GCC to shutdown the warning Add the missing __acquires(&fsnotify_mark_srcu) annotation Add the missing __releases(&fsnotify_mark_srcu) annotation Add the __release(&fsnotify_mark_srcu) annotation. Link: https://lore.kernel.org/r/20200413214240.15245-1-jbi.octave@gmail.com Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-04-15 11:44:43 +02:00
Nathan Chancellor	6def1a1d2d	fanotify: Fix the checks in fanotify_fsid_equal Clang warns: fs/notify/fanotify/fanotify.c:28:23: warning: self-comparison always evaluates to true [-Wtautological-compare] return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1]; ^ fs/notify/fanotify/fanotify.c:28:57: warning: self-comparison always evaluates to true [-Wtautological-compare] return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1]; ^ 2 warnings generated. The intention was clearly to compare val[0] and val[1] in the two different fsid structs. Fix it otherwise this function always returns true. Fixes: `afc894c784` ("fanotify: Store fanotify handles differently") Link: https://github.com/ClangBuiltLinux/linux/issues/952 Link: https://lore.kernel.org/r/20200327171030.30625-1-natechancellor@gmail.com Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-30 12:40:53 +02:00
Amir Goldstein	44d705b037	fanotify: report name info for FAN_DIR_MODIFY event Report event FAN_DIR_MODIFY with name in a variable length record similar to how fid's are reported. With name info reporting implemented, setting FAN_DIR_MODIFY in mark mask is now allowed. When events are reported with name, the reported fid identifies the directory and the name follows the fid. The info record type for this event info is FAN_EVENT_INFO_TYPE_DFID_NAME. For now, all reported events have at most one info record which is either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for FAN_DIR_MODIFY). Later on, events "on child" will report both records. There are several ways that an application can use this information: 1. When watching a single directory, the name is always relative to the watched directory, so application need to fstatat(2) the name relative to the watched directory. 2. When watching a set of directories, the application could keep a map of dirfd for all watched directories and hash the map by fid obtained with name_to_handle_at(2). When getting a name event, the fid in the event info could be used to lookup the base dirfd in the map and then call fstatat(2) with that dirfd. 3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of directories, the application could use open_by_handle_at(2) with the fid in event info to obtain dirfd for the directory where event happened and call fstatat(2) with this dirfd. The last option scales better for a large number of watched directories. The first two options may be available in the future also for non privileged fanotify watchers, because open_by_handle_at(2) requires the CAP_DAC_READ_SEARCH capability. Link: https://lore.kernel.org/r/20200319151022.31456-15-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 23:17:16 +01:00
Amir Goldstein	cacfb956d4	fanotify: record name info for FAN_DIR_MODIFY event For FAN_DIR_MODIFY event, allocate a variable size event struct to store the dir entry name along side the directory file handle. At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in mark mask will return -EINVAL. Link: https://lore.kernel.org/r/20200319151022.31456-14-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 23:17:10 +01:00
Jan Kara	01affd5471	fanotify: Drop fanotify_event_has_fid() When some events have directory id and some object id, fanotify_event_has_fid() becomes mostly useless and confusing because we usually need to know which type of file handle the event has. So just drop the function and use fanotify_event_object_fh() instead. Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:27:16 +01:00
Amir Goldstein	d766b55361	fanotify: prepare to report both parent and child fid's For some events, we are going to report both child and parent fid's, so pass fsid and file handle as arguments to copy_fid_to_user(), which is going to be called with parent and child file handles. Link: https://lore.kernel.org/r/20200319151022.31456-13-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:27:16 +01:00
Amir Goldstein	9e2ba2c34f	fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Dirent events are going to be supported in two flavors: 1. Directory fid info + mask that includes the specific event types (e.g. FAN_CREATE) and an optional FAN_ONDIR flag. 2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY. To request the second event flavor, user needs to set the event type FAN_DIR_MODIFY in the mark mask. The first flavor is supported since kernel v5.1 for groups initialized with flag FAN_REPORT_FID. It is intended to be used for watching directories in "batch mode" - the watcher is notified when directory is changed and re-scans the directory content in response. This event flavor is stored more compactly in the event queue, so it is optimal for workloads with frequent directory changes. The second event flavor is intended to be used for watching large directories, where the cost of re-scan of the directory on every change is considered too high. The watcher getting the event with the directory fid and entry name is expected to call fstatat(2) to query the content of the entry after the change. Legacy inotify events are reported with name and event mask (e.g. "foo", FAN_CREATE \| FAN_ONDIR). That can lead users to the conclusion that there is currently an entry "foo" that is a sub-directory, when in fact "foo" may be negative or non-dir by the time user gets the event. To make it clear that the current state of the named entry is unknown, when reporting an event with name info, fanotify obfuscates the specific event types (e.g. create,delete,rename) and uses a common event type - FAN_DIR_MODIFY to describe the change. This should make it harder for users to make wrong assumptions and write buggy filesystem monitors. At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in mark mask will return -EINVAL. Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:27:16 +01:00
Jan Kara	7088f35720	fanotify: divorce fanotify_path_event and fanotify_fid_event Breakup the union and make them both inherit from abstract fanotify_event. fanotify_path_event, fanotify_fid_event and fanotify_perm_event inherit from fanotify_event. type field in abstract fanotify_event determines the concrete event type. fanotify_path_event, fanotify_fid_event and fanotify_perm_event are allocated from separate memcache pools. Rename fanotify_perm_event casting macro to FANOTIFY_PERM(), so that FANOTIFY_PE() and FANOTIFY_FE() can be used as casting macros to fanotify_path_event and fanotify_fid_event. [JK: Cleanup FANOTIFY_PE() and FANOTIFY_FE() to be proper inline functions and remove requirement that fanotify_event is the first in event structures] Link: https://lore.kernel.org/r/20200319151022.31456-11-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:27:16 +01:00
Jan Kara	afc894c784	fanotify: Store fanotify handles differently Currently, struct fanotify_fid groups fsid and file handle and is unioned together with struct path to save space. Also there is fh_type and fh_len directly in struct fanotify_event to avoid padding overhead. In the follwing patches, we will be adding more event types and this packing makes code difficult to follow. So unpack everything and create struct fanotify_fh which groups members logically related to file handle to make code easier to follow. In the following patch we will pack things again differently to make events smaller. Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:27:16 +01:00
Jan Kara	a741c2febe	fanotify: Simplify create_fd() create_fd() is never used with invalid path. Also the only thing it needs to know from fanotify_event is the path. Simplify the function to take path directly and assume it is correct. Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-25 10:22:54 +01:00
Amir Goldstein	55bf882c7f	fanotify: fix merging marks masks with FAN_ONDIR Change the logic of FAN_ONDIR in two ways that are similar to the logic of FAN_EVENT_ON_CHILD, that was fixed in commit `54a307ba8d` ("fanotify: fix logic of events on child"): 1. The flag is meaningless in ignore mask 2. The flag refers only to events in the mask of the mark where it is set This is what the fanotify_mark.2 man page says about FAN_ONDIR: "Without this flag, only events for files are created." It doesn't say anything about setting this flag in ignore mask to stop getting events on directories nor can I think of any setup where this capability would be useful. Currently, when marks masks are merged, the FAN_ONDIR flag set in one mark affects the events that are set in another mark's mask and this behavior causes unexpected results. For example, a user adds a mark on a directory with mask FAN_ATTRIB \| FAN_ONDIR and a mount mark with mask FAN_OPEN (without FAN_ONDIR). An opendir() of that directory (which is inside that mount) generates a FAN_OPEN event even though neither of the marks requested to get open events on directories. Link: https://lore.kernel.org/r/20200319151022.31456-10-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-24 12:06:32 +01:00
Amir Goldstein	f367a62a7c	fanotify: merge duplicate events on parent and child With inotify, when a watch is set on a directory and on its child, an event on the child is reported twice, once with wd of the parent watch and once with wd of the child watch without the filename. With fanotify, when a watch is set on a directory and on its child, an event on the child is reported twice, but it has the exact same information - either an open file descriptor of the child or an encoded fid of the child. The reason that the two identical events are not merged is because the object id used for merging events in the queue is the child inode in one event and parent inode in the other. For events with path or dentry data, use the victim inode instead of the watched inode as the object id for event merging, so that the event reported on parent will be merged with the event reported on the child. Link: https://lore.kernel.org/r/20200319151022.31456-9-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-24 11:29:12 +01:00
Amir Goldstein	dfc2d2594e	fsnotify: replace inode pointer with an object id The event inode field is used only for comparison in queue merges and cannot be dereferenced after handle_event(), because it does not hold a refcount on the inode. Replace it with an abstract id to do the same thing. Link: https://lore.kernel.org/r/20200319151022.31456-8-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-24 11:28:00 +01:00
Amir Goldstein	017de65fe5	fsnotify: simplify arguments passing to fsnotify_parent() Instead of passing both dentry and path and having to figure out which one to use, pass data/data_type to simplify the code. Link: https://lore.kernel.org/r/20200319151022.31456-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-23 18:22:48 +01:00
Amir Goldstein	aa93bdc550	fsnotify: use helpers to access data by data_type Create helpers to access path and inode from different data types. Link: https://lore.kernel.org/r/20200319151022.31456-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2020-03-23 18:19:06 +01:00
Eric Sandeen	1edc8eb2e9	fs: call fsnotify_sb_delete after evict_inodes When a filesystem is unmounted, we currently call fsnotify_sb_delete() before evict_inodes(), which means that fsnotify_unmount_inodes() must iterate over all inodes on the superblock looking for any inodes with watches. This is inefficient and can lead to livelocks as it iterates over many unwatched inodes. At this point, SB_ACTIVE is gone and dropping refcount to zero kicks the inode out out immediately, so anything processed by fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop. After that, the call to evict_inodes will evict everything else with a zero refcount. This should speed things up overall, and avoid livelocks in fsnotify_unmount_inodes(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-12-18 00:03:01 -05:00
Eric Sandeen	04646aebd3	fs: avoid softlockups in s_inodes iterators Anything that walks all inodes on sb->s_inodes list without rescheduling risks softlockups. Previous efforts were made in 2 functions, see: `c27d82f` fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() `ac05fbb` inode: don't softlockup when evicting inodes but there hasn't been an audit of all walkers, so do that now. This also consistently moves the cond_resched() calls to the bottom of each loop in cases where it already exists. One loop remains: remove_dquot_ref(), because I'm not quite sure how to deal with that one w/o taking the i_lock. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-12-18 00:03:01 -05:00
Linus Torvalds	0da522107e	compat_ioctl: remove most of fs/compat_ioctl.c As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite. Signed-off-by: Arnd Bergmann <arnd@arndb.de> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl bN65armTm7bFFR32Avnu =lgCl -----END PGP SIGNATURE----- Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...	2019-12-01 13:46:15 -08:00
Arnd Bergmann	1832f2d8ff	compat_ioctl: move more drivers to compat_ptr_ioctl The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2019-10-23 17:23:44 +02:00
Ben Dooks (Codethink)	ddd06c36bd	fsnotify/fdinfo: exportfs_encode_inode_fh() takes pointer as 4th argument The call to exportfs_encode_inode_fh() takes an pointer as the 4th argument, so replace the integer 0 with the NULL pointer. This fixes the following sparse warning: fs/notify/fdinfo.c:53:87: warning: Using plain integer as NULL pointer Link: https://lore.kernel.org/r/20191016095955.3347-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2019-10-17 10:32:59 +02:00
Ben Dooks	4a0b20be60	fsnotify: move declaration of fsnotify_mark_connector_cachep to fsnotify.h Move fsnotify_mark_connector_cachep to fsnotify.h to properly share it with the user in mark.c and avoid the following warning from sparse: fs/notify/mark.c:82:19: warning: symbol 'fsnotify_mark_connector_cachep' was not declared. Should it be static? Link: https://lore.kernel.org/r/20191015132518.21819-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2019-10-17 10:31:12 +02:00
Linus Torvalds	298fb76a55	Highlights: - add a new knfsd file cache, so that we don't have to open and close on each (NFSv2/v3) READ or WRITE. This can speed up read and write in some cases. It also replaces our readahead cache. - Prevent silent data loss on write errors, by treating write errors like server reboots for the purposes of write caching, thus forcing clients to resend their writes. - Tweak the code that allocates sessions to be more forgiving, so that NFSv4.1 mounts are less likely to hang when a server already has a lot of clients. - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be limited only by the backend filesystem and the maximum RPC size. - Allow the server to enforce use of the correct kerberos credentials when a client reclaims state after a reboot. And some miscellaneous smaller bugfixes and cleanup. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAl2OoFcVHGJmaWVsZHNA ZmllbGRzZXMub3JnAAoJECebzXlCjuG+dRoP/3OW1NxPjpjbCQWZL0M+O3AYJJla W8E+uoZKMosFEe/ymokMD0Vn5s47jPaMCifMjHZa2GygW8zHN9X2v0HURx/lob+o /rJXwMn78N/8kdbfDz2FvaCPeT0IuNzRIFBV8/sSXofqwCBwvPo+cl0QGrd4/xLp X35qlupx62TRk+kbdRjvv8kpS5SJ7BvR+FSA1WubNYWw2hpdEsr2OCFdGq2Wvthy DK6AfGBXfJGsOE+HGCSj6ejRV6i0UOJ17P8gRSsx+YT0DOe5E7ROjt+qvvRwk489 wmR8Vjuqr1e40eGAUq3xuLfk5F5NgycY4ekVxk/cTVFNwWcz2DfdjXQUlyPAbrSD SqIyxN1qdKT24gtr7AHOXUWJzBYPWDgObCVBXUGzyL81RiDdhf38HRNjL2TcSDld tzCjQ0wbPw+iT74v6qQRY05oS+h3JOtDjU4pxsBnxVtNn4WhGJtaLfWW8o1C1QwU bc4aX3TlYhDmzU7n7Zjt4rFXGJfyokM+o6tPao1Z60Pmsv1gOk4KQlzLtW/jPHx4 ZwYTwVQUKRDBfC62nmgqDyGI3/Qu11FuIxL2KXUCgkwDxNWN4YkwYjOGw9Lb5qKM wFpxq6CDNZB/IWLEu8Yg85kDPPUJMoI657lZb7Osr/MfBpU0YljcMOIzLBy8uV1u 9COUbPaQipiWGu/0 =diBo -----END PGP SIGNATURE----- Merge tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "Highlights: - Add a new knfsd file cache, so that we don't have to open and close on each (NFSv2/v3) READ or WRITE. This can speed up read and write in some cases. It also replaces our readahead cache. - Prevent silent data loss on write errors, by treating write errors like server reboots for the purposes of write caching, thus forcing clients to resend their writes. - Tweak the code that allocates sessions to be more forgiving, so that NFSv4.1 mounts are less likely to hang when a server already has a lot of clients. - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be limited only by the backend filesystem and the maximum RPC size. - Allow the server to enforce use of the correct kerberos credentials when a client reclaims state after a reboot. And some miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits) sunrpc: clean up indentation issue nfsd: fix nfs read eof detection nfsd: Make nfsd_reset_boot_verifier_locked static nfsd: degraded slot-count more gracefully as allocation nears exhaustion. nfsd: handle drc over-allocation gracefully. nfsd: add support for upcall version 2 nfsd: add a "GetVersion" upcall for nfsdcld nfsd: Reset the boot verifier on all write I/O errors nfsd: Don't garbage collect files that might contain write errors nfsd: Support the server resetting the boot verifier nfsd: nfsd_file cache entries should be per net namespace nfsd: eliminate an unnecessary acl size limit Deprecate nfsd fault injection nfsd: remove duplicated include from filecache.c nfsd: Fix the documentation for svcxdr_tmpalloc() nfsd: Fix up some unused variable warnings nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: rip out the raparms cache nfsd: have nfsd_test_lock use the nfsd_file cache nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache ...	2019-09-27 17:00:27 -07:00
Linus Torvalds	5825a95fe9	selinux/stable-5.4 PR 20190917 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl2BLvcUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXP9pA/+Ls9sRGZoEipycbgRnwkL9/6yFtn4 UCFGMP0eobrjL82i8uMOa/72Budsp3ZaZRxf36NpbMDPyB9ohp5jf7o1WFTELESv EwxVvOMNwrxO2UbzRv3iywnhdPVJ4gHPa4GWfBHu2EEfhz3/Bv0tPIBdeXAbq4aC R0p+M9X0FFEp9eP4ftwOvFGpbZ8zKo1kwgdvCnqLhHDkyqtapqO/ByCTe1VATERP fyxjYDZNnITmI0plaIxCeeudklOTtVSAL4JPh1rk8rZIkUznZ4EBDHxdKiaz3j9C ZtAthiAA9PfAwf4DZSPHnGsfINxeNBKLD65jZn/PUne/gNJEx4DK041X9HXBNwjv OoArw58LCzxtTNZ//WB4CovRpeSdKvmKv0oh61k8cdQahLeHhzXE1wLQbnnBJLI3 CTsumIp4ZPEOX5r4ogdS3UIQpo3KrZump7VO85yUTRni150JpZR3egYpmcJ0So1A QTPemBhC2CHJVTpycYZ9fVTlPeC4oNwosPmvpB8XeGu3w5JpuNSId+BDR/ZlQAmq xWiIocGL3UMuPuJUrTGChifqBAgzK+gLa7S7RYPEnTCkj6LVQwsuP4gBXf75QTG4 FPwVcoMSDFxUDF0oFqwz4GfJlCxBSzX+BkWUn6jIiXKXBnQjU+1gu6KTwE25mf/j snJznFk25hFYFaM= =n4ht -----END PGP SIGNATURE----- Merge tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM hooks, and SELinux access control hooks, for dnotify, fanotify, and inotify watches. This has been discussed with both the LSM and fs/notify folks and everybody is good with these new hooks. - The LSM stacking changes missed a few calls to current_security() in the SELinux code; we fix those and remove current_security() for good. - Improve our network object labeling cache so that we always return the object's label, even when under memory pressure. Previously we would return an error if we couldn't allocate a new cache entry, now we always return the label even if we can't create a new cache entry for it. - Convert the sidtab atomic_t counter to a normal u32 with READ/WRITE_ONCE() and memory barrier protection. - A few patches to policydb.c to clean things up (remove forward declarations, long lines, bad variable names, etc) * tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: lsm: remove current_security() selinux: fix residual uses of current_security() for the SELinux blob selinux: avoid atomic_t usage in sidtab fanotify, inotify, dnotify, security: add security hook for fs notifications selinux: always return a secid from the network caches if we find one selinux: policydb - rename type_val_to_struct_array selinux: policydb - fix some checkpatch.pl warnings selinux: shuffle around policydb.c to get rid of forward declarations	2019-09-23 11:21:04 -07:00
Trond Myklebust	b72679ee89	notify: export symbols for use by the knfsd file cache The knfsd file cache will need to detect when files are unlinked, so that it can close the associated cached files. Export a minimal set of notifier functions to allow it to do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-08-19 11:00:39 -04:00
Aaron Goidel	ac5656d8a4	fanotify, inotify, dnotify, security: add security hook for fs notifications As of now, setting watches on filesystem objects has, at most, applied a check for read access to the inode, and in the case of fanotify, requires CAP_SYS_ADMIN. No specific security hook or permission check has been provided to control the setting of watches. Using any of inotify, dnotify, or fanotify, it is possible to observe, not only write-like operations, but even read access to a file. Modeling the watch as being merely a read from the file is insufficient for the needs of SELinux. This is due to the fact that read access should not necessarily imply access to information about when another process reads from a file. Furthermore, fanotify watches grant more power to an application in the form of permission events. While notification events are solely, unidirectional (i.e. they only pass information to the receiving application), permission events are blocking. Permission events make a request to the receiving application which will then reply with a decision as to whether or not that action may be completed. This causes the issue of the watching application having the ability to exercise control over the triggering process. Without drawing a distinction within the permission check, the ability to read would imply the greater ability to control an application. Additionally, mount and superblock watches apply to all files within the same mount or superblock. Read access to one file should not necessarily imply the ability to watch all files accessed within a given mount or superblock. In order to solve these issues, a new LSM hook is implemented and has been placed within the system calls for marking filesystem objects with inotify, fanotify, and dnotify watches. These calls to the hook are placed at the point at which the target path has been resolved and are provided with the path struct, the mask of requested notification events, and the type of object on which the mark is being set (inode, superblock, or mount). The mask and obj_type have already been translated into common FS_* values shared by the entirety of the fs notification infrastructure. The path struct is passed rather than just the inode so that the mount is available, particularly for mount watches. This also allows for use of the hook by pathname-based security modules. However, since the hook is intended for use even by inode based security modules, it is not placed under the CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security modules would need to enable all of the path hooks, even though they do not use any of them. This only provides a hook at the point of setting a watch, and presumes that permission to set a particular watch implies the ability to receive all notification about that object which match the mask. This is all that is required for SELinux. If other security modules require additional hooks or infrastructure to control delivery of notification, these can be added by them. It does not make sense for us to propose hooks for which we have no implementation. The understanding that all notifications received by the requesting application are all strictly of a type for which the application has been granted permission shows that this implementation is sufficient in its coverage. Security modules wishing to provide complete control over fanotify must also implement a security_file_open hook that validates that the access requested by the watching application is authorized. Fanotify has the issue that it returns a file descriptor with the file mode specified during fanotify_init() to the watching process on event. This is already covered by the LSM security_file_open hook if the security module implements checking of the requested file mode there. Otherwise, a watching process can obtain escalated access to a file for which it has not been authorized. The selinux_path_notify hook implementation works by adding five new file permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm (descriptions about which will follow), and one new filesystem permission: watch (which is applied to superblock checks). The hook then decides which subset of these permissions must be held by the requesting application based on the contents of the provided mask and the obj_type. The selinux_file_open hook already checks the requested file mode and therefore ensures that a watching process cannot escalate its access through fanotify. The watch, watch_mount, and watch_sb permissions are the baseline permissions for setting a watch on an object and each are a requirement for any watch to be set on a file, mount, or superblock respectively. It should be noted that having either of the other two permissions (watch_reads and watch_with_perm) does not imply the watch, watch_mount, or watch_sb permission. Superblock watches further require the filesystem watch permission to the superblock. As there is no labeled object in view for mounts, there is no specific check for mount watches beyond watch_mount to the inode. Such a check could be added in the future, if a suitable labeled object existed representing the mount. The watch_reads permission is required to receive notifications from read-exclusive events on filesystem objects. These events include accessing a file for the purpose of reading and closing a file which has been opened read-only. This distinction has been drawn in order to provide a direct indication in the policy for this otherwise not obvious capability. Read access to a file should not necessarily imply the ability to observe read events on a file. Finally, watch_with_perm only applies to fanotify masks since it is the only way to set a mask which allows for the blocking, permission event. This permission is needed for any watch which is of this type. Though fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit trust to root, which we do not do, and does not support least privilege. Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>	2019-08-12 17:45:39 -04:00
Matteo Croce	eec4844fae	proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero\|one\|int_max)' \|wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
Shakeel Butt	ec16545096	memcg, fsnotify: no oom-kill for remote memcg charging Commit `d46eb14b73` ("fs: fsnotify: account fsnotify metadata to kmemcg") added remote memcg charging for fanotify and inotify event objects. The aim was to charge the memory to the listener who is interested in the events but without triggering the OOM killer. Otherwise there would be security concerns for the listener. At the time, oom-kill trigger was not in the charging path. A parallel work added the oom-kill back to charging path i.e. commit `29ef680ae7` ("memcg, oom: move out_of_memory back to the charge path"). So to not trigger oom-killer in the remote memcg, explicitly add __GFP_RETRY_MAYFAIL to the fanotigy and inotify event allocations. Link: http://lkml.kernel.org/r/20190514212259.156585-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:43 -07:00
Linus Torvalds	e6983afd92	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl0kWgwACgkQnJ2qBz9k QNkkdggA7bdy6xZRPdumZMxtASGDs1JJ4diNs+apgyc6wUfsT1lCE2ap20EdzfzK drAvlJt1vYEW+6apOzUXJ0qWXMVRzy4XRl+jVMO9GW6BoY4OyJQ86AQZlEv1zZ4n vxeYnlbxA7JyfkWgup0ZSb5EKRSO1eSxZKEZou0wu2jRCRr/E5RyjPQHXaiE5ihc 7ilEtTI3Qg3nnAK30F0Iy0X3lGqgXj+rlJ0TgR8BBEDllct2wV16vvMl/Sy+BXip 5sSWjSy8zntMnkSN8yH/oJN0D+fqmCsnYafwqTpPek8izvEz4xpjshbWTDnPm0HM eiMC1U3ZJoD3Z4/wxRZ91m60VYgJBA== =SVKR -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This contains cleanups of the fsnotify name removal hook and also a patch to disable fanotify permission events for 'proc' filesystem" * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: get rid of fsnotify_nameremove() fsnotify: move fsnotify_nameremove() hook out of d_delete() configfs: call fsnotify_rmdir() hook debugfs: call fsnotify_{unlink,rmdir}() hooks debugfs: simplify __debugfs_remove_file() devpts: call fsnotify_unlink() hook tracefs: call fsnotify_{unlink,rmdir}() hooks rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks btrfs: call fsnotify_rmdir() hook fsnotify: add empty fsnotify_{unlink,rmdir}() hooks fanotify: Disallow permission events for proc filesystem	2019-07-10 20:09:17 -07:00
Amir Goldstein	7377f5bec1	fsnotify: get rid of fsnotify_nameremove() For all callers of fsnotify_{unlink,rmdir}(), we made sure that d_parent and d_name are stable. Therefore, fsnotify_{unlink,rmdir}() do not need the safety measures in fsnotify_nameremove() to stabilize parent and name. We can now simplify those hooks and get rid of fsnotify_nameremove(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-06-20 14:47:54 +02:00
Amir Goldstein	c285a2f01d	fanotify: update connector fsid cache on add mark When implementing connector fsid cache, we only initialized the cache when the first mark added to object was added by FAN_REPORT_FID group. We forgot to update conn->fsid when the second mark is added by FAN_REPORT_FID group to an already attached connector without fsid cache. Reported-and-tested-by: syzbot+c277e8e2f46414645508@syzkaller.appspotmail.com Fixes: `77115225ac` ("fanotify: cache fsid in fsnotify_mark_connector") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-06-19 15:53:58 +02:00
Jan Kara	0b3b094ac9	fanotify: Disallow permission events for proc filesystem Proc filesystem has special locking rules for various files. Thus fanotify which opens files on event delivery can easily deadlock against another process that waits for fanotify permission event to be handled. Since permission events on /proc have doubtful value anyway, just disallow them. Link: https://lore.kernel.org/linux-fsdevel/20190320131642.GE9485@quack2.suse.cz/ Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-05-28 18:10:07 +02:00
Thomas Gleixner	3e0a4e8580	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091651.032047323@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-24 17:39:02 +02:00
Thomas Gleixner	c82ee6d3be	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program see the file copying if not write to the free software foundation 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 52 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 11:28:46 +02:00
Thomas Gleixner	ec8f24b7fa	treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:46 +02:00
Linus Torvalds	d4c608115c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlzZmHgACgkQnJ2qBz9k QNleKQf/VglDfB2VvuYnMbZp4YBPZgNMe0KlOTYE60RBg4DGZj04ov0bxSRZccPP KvHp1ZyIM4Bp67lKFZoI6bYd5diq7gVjHIdHrFc0LfDmpyaodqxe3bOMWT/TH0YU 0jjr7MU1wmVRY8UYrIqPIlL6Dl0hlxGf5cYPzoRQB4hzNWEWr9ZEdC1DbKupWTk/ 1GsYtCHZzpZ20YfzDU3jcYaHRVZDwZXb65Wx3OfUVof8q9bkpHd5lhl79jT4yVPS yYfqs1CW6ra2zsYDtBdmh46BZ3d36ACfcjmgf2/7GMdQP8Ocv0c5lnSSkrJeC5XO GttTFIKU7JWFAqiWmPAoGIchUXNkRA== =eSpi -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Two fsnotify fixes" * tag 'fsnotify_for_v5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix unlink performance regression fsnotify: Clarify connector assignment in fsnotify_add_mark_list()	2019-05-13 15:08:16 -07:00
Amir Goldstein	4d8e7055a4	fsnotify: fix unlink performance regression __fsnotify_parent() has an optimization in place to avoid unneeded take_dentry_name_snapshot(). When fsnotify_nameremove() was changed not to call __fsnotify_parent(), we left out the optimization. Kernel test robot reported a 5% performance regression in concurrent unlink() workload. Reported-by: kernel test robot <rong.a.chen@intel.com> Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/ Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/ Fixes: `5f02a87763` ("fsnotify: annotate directory entry modification events") CC: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-05-09 12:44:00 +02:00
Linus Torvalds	d27fb65bc2	Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc dcache updates from Al Viro: "Most of this pile is putting name length into struct name_snapshot and making use of it. The beginning of this series ("ovl_lookup_real_one(): don't bother with strlen()") ought to have been split in two (separate switch of name_snapshot to struct qstr from overlayfs reaping the trivial benefits of that), but I wanted to avoid a rebase - by the time I'd spotted that it was (a) in -next and (b) close to 5.1-final ;-/" * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: audit_compare_dname_path(): switch to const struct qstr * audit_update_watch(): switch to const struct qstr * inotify_handle_event(): don't bother with strlen() fsnotify: switch send_to_group() and ->handle_event to const struct qstr * fsnotify(): switch to passing const struct qstr * for file_name switch fsnotify_move() to passing const struct qstr * for old_name ovl_lookup_real_one(): don't bother with strlen() sysv: bury the broken "quietly truncate the long filenames" logics nsfs: unobfuscate unexport d_alloc_pseudo()	2019-05-07 20:03:32 -07:00
Linus Torvalds	eac7078a0f	pidfd patches for v5.2-rc1 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlzReuoACgkQjrBW1T7s sS1uvBAA16pgnhRNxNTrp3LYft6lUWmF4n0baOTVtQNLhPjpwaOxHIrCBugkQCJB QcQ9IQSOvIkaEW0XAQoPBaeLviiKhHOFw1Fv89OtW6xUidSfSV15lcI9f1F2pCm2 4yCL/8XvL6M0NhxiwftJAkWOXeDNLfjFnLwyLxBfgg3EeyqMgUB8raeosEID0ORR gm2/g8DYS2r+KNqM/F4xvMSgabfi2bGk+8BtAaVnftJfstpRNrqKwWnSK3Wspj1l 5gkb8gSsiY6ns3V6RgNHrFlhevFg8V+VjcJt7FR+aUEjOkcoiXas/PhvamMzdsn/ FM1F/A0pM8FSybIUClhnnnxNPc+p8ZN/71YQAPs+Mnh3xvbtKea2lkhC+Xv4OpK3 edutSZWFaiIery82Rk00H3vqiSF1+kRIXSpZSS4mElk4FsVljkyH+nSP7rbmE2MR EQe+kKnZl8QzWrVbnODC+EVvvVpA2bXDvENJmvKqus+t2G0OdV7Iku3F5E3KjF8k S5RRV1zuBF3ugqnjmYrVmJtpEA8mxClmqvg6okru+qW6ngO5oOgVpPLjWn1CXcdj wcuQ6Pe1QwAHS54e9WSWgCHVssLvm9nCdCqypdNaoyGWmbTWntwlrY7Y0JUQnAbB 6/G/DQQiCWY9y8bMZlTEydhIpgcsdROuPYv+oHF5+eQQthsWwHc= =LH11 -----END PGP SIGNATURE----- Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. After a thorough review from Oleg CLONE_PIDFD returns pidfds in the parent_tidptr argument. This means we can give back the associated pid and the pidfd at the same time. Access to process metadata information thus becomes rather trivial. As has been agreed, CLONE_PIDFD creates file descriptors based on anonymous inodes similar to the new mount api. They are made unconditional by this patchset as they are now needed by core kernel code (vfs, pidfd) even more than they already were before (timerfd, signalfd, io_uring, epoll etc.). The core patchset is rather small. The bulky looking changelist is caused by David's very simple changes to Kconfig to make anon inodes unconditional. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". To remove worries about missing metadata access this patchset comes with a sample/test program that illustrates how a combination of CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. Further work based on this patchset has been done by Joel. His work makes pidfds pollable. It finished too late for this merge window. I would prefer to have it sitting in linux-next for a while and send it for inclusion during the 5.3 merge window" * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: samples: show race-free pidfd metadata access signal: support CLONE_PIDFD with pidfd_send_signal clone: add CLONE_PIDFD Make anon_inodes unconditional	2019-05-07 12:30:24 -07:00
Jan Kara	11a6f8e2db	fsnotify: Clarify connector assignment in fsnotify_add_mark_list() Add a comment explaining why WRITE_ONCE() is enough when setting mark->connector which can get dereferenced by RCU protected readers. Signed-off-by: Jan Kara <jack@suse.cz>	2019-05-01 18:05:11 +02:00
Jan Kara	b1da6a5187	fsnotify: Fix NULL ptr deref in fanotify_get_fsid() fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can happen that it sees mark not fully initialized or mark that is already detached from the object list. In these cases mark->connector can be NULL leading to NULL ptr dereference. Fix the problem by being careful when reading mark->connector and check it for being NULL. Also use WRITE_ONCE when writing the mark just to prevent compiler from doing something stupid. Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com Fixes: `77115225ac` ("fanotify: cache fsid in fsnotify_mark_connector") Signed-off-by: Jan Kara <jack@suse.cz>	2019-04-28 22:14:50 +02:00
Al Viro	ce163918cd	inotify_handle_event(): don't bother with strlen() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-04-26 13:55:21 -04:00
Al Viro	e43e9c339a	fsnotify: switch send_to_group() and ->handle_event to const struct qstr * note that conditions surrounding accesses to dname in audit_watch_handle_event() and audit_mark_handle_event() guarantee that dname won't have been NULL. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-04-26 13:51:03 -04:00
Al Viro	25b229dff4	fsnotify(): switch to passing const struct qstr * for file_name Note that in fnsotify_move() and fsnotify_link() we are guaranteed that dentry->d_name won't change during the fsnotify() evaluation (by having the parent directory locked exclusive), so we don't need to fetch dentry->d_name.name in the callers. In fsnotify_dirent() the same stability of dentry->d_name is also true, but it's a bit more convoluted - there is one callchain (devpts_pty_new() -> fsnotify_create() -> fsnotify_dirent()) where the parent is _not_ locked, but on devpts ->d_name of everything is unchanging; it has neither explicit nor implicit renames. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-04-26 13:37:25 -04:00
Al Viro	230c6402b1	ovl_lookup_real_one(): don't bother with strlen() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-04-26 13:13:33 -04:00
David Howells	5dd50aaeb1	Make anon_inodes unconditional Make the anon_inodes facility unconditional so that it can be used by core VFS code and pidfd code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [christian@brauner.io: adapt commit message to mention pidfds] Signed-off-by: Christian Brauner <christian@brauner.io>	2019-04-19 14:03:11 +02:00
Jan Kara	b2d22b6bb3	fanotify: Allow copying of file handle to userspace When file handle is embedded inside fanotify_event and usercopy checks are enabled, we get a warning like: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'fanotify_event' (offset 40, size 8)! WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110 mm/usercopy.c:78 Annotate handling in fanotify_event properly to mark copying it to userspace is fine. Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com Fixes: `a8b13aa20a` ("fanotify: enable FAN_REPORT_FID init flag") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-03-19 09:29:07 +01:00
ZhangXiaoxu	62c9d2674b	inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() Commit `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fsnotify_put_mark() with IN_MASK_CREATE after fsnotify_find_mark() Fixes: `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-03-11 10:13:17 +01:00
Jan Kara	b519057981	fanotify: Make waits for fanotify events only killable Making waits for response to fanotify permission events interruptible can result in EINTR returns from open(2) or other syscalls when there's e.g. AV software that's monitoring the file. Orion reports that e.g. bash is complaining like: bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call So for now convert the wait from interruptible to only killable one. That is mostly invisible to userspace. Sadly this breaks hibernation with fanotify permission events pending again but we have to put more thought into how to fix this without regressing userspace visible behavior. Reported-by: Orion Poplawski <orion@nwra.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-21 11:47:23 +01:00
Jan Kara	fabf7f29b3	fanotify: Use interruptible wait when waiting for permission events When waiting for response to fanotify permission events, we currently use uninterruptible waits. That makes code simple however it can cause lots of processes to end up in uninterruptible sleep with hard reboot being the only alternative in case fanotify listener process stops responding (e.g. due to a bug in its implementation). Uninterruptible sleep also makes system hibernation fail if the listener gets frozen before the process generating fanotify permission event. Fix these problems by using interruptible sleep for waiting for response to fanotify event. This is slightly tricky though - we have to detect when the event got already reported to userspace as in that case we must not free the event. Instead we push the responsibility for freeing the event to the process that will write response to the event. Reported-by: Orion Poplawski <orion@nwra.com> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	40873284d7	fanotify: Track permission event state Track whether permission event got already reported to userspace and whether userspace already answered to the permission event. Protect stores to this field together with updates to ->response field by group->notification_lock. This will allow aborting wait for reply to permission event from userspace. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	ca6f86998d	fanotify: Simplify cleaning of access_list Simplify iteration cleaning access_list in fanotify_release(). That will make following changes more obvious. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	f7db89accc	fsnotify: Create function to remove event from notification list Create function to remove event from the notification list. Later it will be used from more places. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:41:16 +01:00
Jan Kara	8c5544666c	fanotify: Move locking inside get_one_event() get_one_event() has a single caller and that just locks notification_lock around the call. Move locking inside get_one_event() as that will make using ->response field for permission event state easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 12:40:46 +01:00
Jan Kara	af6a511306	fanotify: Fold dequeue_event() into process_access_response() Fold dequeue_event() into process_access_response(). This will make changes to use of ->response field easier. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-18 11:49:36 +01:00
Jan Kara	53136b393c	fanotify: Select EXPORTFS Fanotify now uses exportfs_encode_inode_fh() so it needs to select EXPORTFS. Fixes: `e9e0c89030` "fanotify: encode file identifier for FAN_REPORT_FID" Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-14 17:46:33 +01:00
Amir Goldstein	e7fce6d94c	fanotify: report FAN_ONDIR to listener with FAN_REPORT_FID dirent modification events (create/delete/move) do not carry the child entry name/inode information. Instead, we report FAN_ONDIR for mkdir/rmdir so user can differentiate them from creat/unlink. This is consistent with inotify reporting IN_ISDIR with dirent events and is useful for implementing recursive directory tree watcher. We avoid merging dirent events referring to subdirs with dirent events referring to non subdirs, otherwise, user won't be able to tell from a mask FAN_CREATE\|FAN_DELETE\|FAN_ONDIR if it describes mkdir+unlink pair or rmdir+create pair of events. For backward compatibility and consistency, do not report FAN_ONDIR to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR to user in FAN_REPORT_FID mode for all event types. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:47:32 +01:00
Amir Goldstein	235328d1fa	fanotify: add support for create/attrib/move/delete events Add support for events with data type FSNOTIFY_EVENT_INODE (e.g. create/attrib/move/delete) for inode and filesystem mark types. The "inode" events do not carry enough information (i.e. path) to report event->fd, so we do not allow setting a mask for those events unless group supports reporting fid. The "inode" events are not supported on a mount mark, because they do not carry enough information (i.e. path) to be filtered by mount point. The "dirent" events (create/move/delete) report the fid of the parent directory where events took place without specifying the filename of the child. In the future, fanotify may get support for reporting filename information for those events. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:43:23 +01:00
Amir Goldstein	83b535d289	fanotify: support events with data type FSNOTIFY_EVENT_INODE When event data type is FSNOTIFY_EVENT_INODE, we don't have a refernece to the mount, so we will not be able to open a file descriptor when user reads the event. However, if the listener has enabled reporting file identifier with the FAN_REPORT_FID init flag, we allow reporting those events and we use an identifier inode to encode fid. The inode to use as identifier when reporting fid depends on the event. For dirent modification events, we report the modified directory inode and we report the "victim" inode otherwise. For example: FS_ATTRIB reports the child inode even if reported on a watched parent. FS_CREATE reports the modified dir inode and not the created inode. [JK: Fixup condition in fanotify_group_event_mask()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:36 +01:00
Amir Goldstein	0321e03cb4	fanotify: check FS_ISDIR flag instead of d_is_dir() All fsnotify hooks set the FS_ISDIR flag for events that happen on directory victim inodes except for fsnotify_perm(). Add the missing FS_ISDIR flag in fsnotify_perm() hook and let fanotify_group_event_mask() check the FS_ISDIR flag instead of checking if path argument is a directory. This is needed for fanotify support for event types that do not carry path information. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:36 +01:00
Amir Goldstein	0a20df7ed3	fsnotify: report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events We need to report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events for fanotify, because fanotify API requires the user to explicitly request events on directories by FAN_ONDIR flag. inotify never reported IN_ISDIR with those events. It looks like an oversight, but to avoid the risk of breaking existing inotify programs, mask the FS_ISDIR flag out when reprting those events to inotify backend. We also add the FS_ISDIR flag with FS_ATTRIB event in the case of rename over an empty target directory. inotify did not report IN_ISDIR in this case, but it normally does report IN_ISDIR along with IN_ATTRIB event, so in this case, we do not mask out the FS_ISDIR flag. [JK: Simplify the checks in fsnotify_move()] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	73072283a2	fanotify: use vfs_get_fsid() helper instead of vfs_statfs() This is a cleanup that doesn't change any logic. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	77115225ac	fanotify: cache fsid in fsnotify_mark_connector For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on every event. To avoid having to call vfs_statfs() on every event to get fsid, we store the fsid in fsnotify_mark_connector on the first time we add a mark and on handle event we use the cached fsid. Subsequent calls to add mark on the same object are expected to pass the same fsid, so the call will fail on cached fsid mismatch. If an event is reported on several mark types (inode, mount, filesystem), all connectors should already have the same fsid, so we use the cached fsid from the first connector. [JK: Simplify code flow around fanotify_get_fid() make fsid argument of fsnotify_add_mark_locked() unconditional] Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	a8b13aa20a	fanotify: enable FAN_REPORT_FID init flag When setting up an fanotify listener, user may request to get fid information in event instead of an open file descriptor. The fid obtained with event on a watched object contains the file handle returned by name_to_handle_at(2) and fsid returned by statfs(2). Restrict FAN_REPORT_FID to class FAN_CLASS_NOTIF, because we have have no good reason to support reporting fid on permission events. When setting a mark, we need to make sure that the filesystem supports encoding file handles with name_to_handle_at(2) and that statfs(2) encodes a non-zero fsid. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	5e469c830f	fanotify: copy event fid info to user If group requested FAN_REPORT_FID and event has file identifier, copy that information to user reading the event after event metadata. fid information is formatted as struct fanotify_event_info_fid that includes a generic header struct fanotify_event_info_header, so that other info types could be defined in the future using the same header. metadata->event_len includes the length of the fid information. The fid information includes the filesystem's fsid (see statfs(2)) followed by an NFS file handle of the file that could be passed as an argument to open_by_handle_at(2). Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:35 +01:00
Amir Goldstein	e9e0c89030	fanotify: encode file identifier for FAN_REPORT_FID When user requests the flag FAN_REPORT_FID in fanotify_init(), a unique file identifier of the event target object will be reported with the event. The file identifier includes the filesystem's fsid (i.e. from statfs(2)) and an NFS file handle of the file (i.e. from name_to_handle_at(2)). The file identifier makes holding the path reference and passing a file descriptor to user redundant, so those are disabled in a group with FAN_REPORT_FID. Encode fid and store it in event for a group with FAN_REPORT_FID. Up to 12 bytes of file handle on 32bit arch (16 bytes on 64bit arch) are stored inline in fanotify_event struct. Larger file handles are stored in an external allocated buffer. On failure to encode fid, we print a warning and queue the event without the fid information. [JK: Fold part of later patched into this one to use exportfs_encode_inode_fh() right away] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:38:34 +01:00
Amir Goldstein	bb2f7b4542	fanotify: open code fill_event_metadata() The helper is quite trivial and open coding it will make it easier to implement copying event fid info to user. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-07 16:37:01 +01:00
Amir Goldstein	33913997d5	fanotify: rename struct fanotify_{,perm_}event_info struct fanotify_event_info "inherits" from struct fsnotify_event and therefore a more appropriate (and short) name for it is fanotify_event. Same for struct fanotify_perm_event_info, which now "inherits" from struct fanotify_event. We plan to reuse the name struct fanotify_event_info for user visible event info record format. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:25:45 +01:00
Amir Goldstein	a0a92d261f	fsnotify: move mask out of struct fsnotify_event Common fsnotify_event helpers have no need for the mask field. It is only used by backend code, so move the field out of the abstract fsnotify_event struct and into the concrete backend event structs. This change packs struct inotify_event_info better on 64bit machine and will allow us to cram some more fields into struct fanotify_event_info. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:25:11 +01:00
Amir Goldstein	45a9fb3725	fsnotify: send all event types to super block marks So far, existence of super block marks was checked only on events with data type FSNOTIFY_EVENT_PATH. Use the super block of the "to_tell" inode to report the events of all event types to super block marks. This change has no effect on current backends. Soon, this will allow fanotify backend to receive all event types on a super block mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-02-06 15:20:30 +01:00
Tetsuo Handa	125892edfe	inotify: Fix fd refcount leak in inotify_add_watch(). Commit `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fdput() before bailing out. Fixes: `4d97f7d53d` ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") CC: stable@vger.kernel.org Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2019-01-02 18:28:37 +01:00
Nikolay Borisov	ac9498d686	fanotify: Use inode_is_open_for_write Use the aptly named function rather than opencoding i_writecount check. No functional changes. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-12-11 10:55:45 +01:00
Kees Cook	5b03a472b4	fanotify: Make sure to check event_len when copying As a precaution, make sure we check event_len when copying to userspace. Based on old feedback: https://lkml.kernel.org/r/542D9FE5.3010009@gmx.de Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz>	2018-12-05 12:47:22 +01:00
Eric Biggers	d6f7aa9820	fsnotify/fdinfo: include fdinfo.h for inotify_show_fdinfo() inotify_show_fdinfo() is defined in fs/notify/fdinfo.c and declared in fs/notify/fdinfo.h, but the declaration isn't included at the point of the definition. Include the header to enforce that the definition matches the declaration. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-11-15 17:34:27 +01:00
Matthew Bobrowski	66917a3130	fanotify: introduce new event mask FAN_OPEN_EXEC_PERM A new event mask FAN_OPEN_EXEC_PERM has been defined. This allows users to receive events and grant access to files that are intending to be opened for execution. Events of FAN_OPEN_EXEC_PERM type will be generated when a file has been opened by using either execve(), execveat() or uselib() system calls. This acts in the same manner as previous permission event mask, meaning that an access response is required from the user application in order to permit any further operations on the file. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-11-13 18:41:05 +01:00
Matthew Bobrowski	9b076f1c0f	fanotify: introduce new event mask FAN_OPEN_EXEC A new event mask FAN_OPEN_EXEC has been defined so that users have the ability to receive events specifically when a file has been opened with the intent to be executed. Events of FAN_OPEN_EXEC type will be generated when a file has been opened using either execve(), execveat() or uselib() system calls. The feature is implemented within fsnotify_open() by generating the FAN_OPEN_EXEC event type if __FMODE_EXEC is set within file->f_flags. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-11-13 18:41:04 +01:00
Matthew Bobrowski	2d10b23082	fanotify: return only user requested event types in event mask Modify fanotify_should_send_event() so that it now returns a mask for an event that contains ONLY flags for the event types that have been specifically requested by the user. Flags that may have been included within the event mask, but have not been explicitly requested by the user will not be present in the returned value. As an example, given the situation where a user requests events of type FAN_OPEN. Traditionally, the event mask returned within an event that occurred on a filesystem object that has been marked for monitoring and is opened, will only ever have the FAN_OPEN bit set. With the introduction of the new flags like FAN_OPEN_EXEC, and perhaps any other future event flags, there is a possibility of the returned event mask containing more than a single bit set, despite having only requested the single event type. Prior to these modifications performed to fanotify_should_send_event(), a user would have received a bundled event mask containing flags FAN_OPEN and FAN_OPEN_EXEC in the instance that a file was opened for execution via execve(), for example. This means that a user would receive event types in the returned event mask that have not been requested. This runs the possibility of breaking existing systems and causing other unforeseen issues. To mitigate this possibility, fanotify_should_send_event() has been modified to return the event mask containing ONLY event types explicitly requested by the user. This means that we will NOT report events that the user did no set a mask for, and we will NOT report events that the user has set an ignore mask for. The function name fanotify_should_send_event() has also been updated so that it's more relevant to what it has been designed to do. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-11-13 18:38:18 +01:00
Amir Goldstein	b469e7e47c	fanotify: fix handling of events on child sub-directory When an event is reported on a sub-directory and the parent inode has a mark mask with FS_EVENT_ON_CHILD\|FS_ISDIR, the event will be sent to fsnotify() even if the event type is not in the parent mark mask (e.g. FS_OPEN). Further more, if that event happened on a mount or a filesystem with a mount/sb mark that does have that event type in their mask, the "on child" event will be reported on the mount/sb mark. That is not desired, because user will get a duplicate event for the same action. Note that the event reported on the victim inode is never merged with the event reported on the parent inode, because of the check in should_merge(): old_fsn->inode == new_fsn->inode. Fix this by looking for a match of an actual event type (i.e. not just FS_ISDIR) in parent's inode mark mask and by not reporting an "on child" event to group if event type is only found on mount/sb marks. [backport hint: The bug seems to have always been in fanotify, but this patch will only apply cleanly to v4.19.y] Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-11-08 15:43:48 +01:00
Jan Kara	721fb6fbfd	fsnotify: Fix busy inodes during unmount Detaching of mark connector from fsnotify_put_mark() can race with unmounting of the filesystem like: CPU1 CPU2 fsnotify_put_mark() spin_lock(&conn->lock); ... inode = fsnotify_detach_connector_from_object(conn) spin_unlock(&conn->lock); generic_shutdown_super() fsnotify_unmount_inodes() sees connector detached for inode -> nothing to do evict_inode() barfs on pending inode reference iput(inode); Resulting in "Busy inodes after unmount" message and possible kernel oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode references from detached connectors. Note that the accounting of outstanding inode references in the superblock can cause some cacheline contention on the counter. OTOH it happens only during deletion of the last notification mark from an inode (or during unlinking of watched inode) and that is not too bad. I have measured time to create & delete inotify watch 100000 times from 64 processes in parallel (each process having its own inotify group and its own file on a shared superblock) on a 64 CPU machine. Average and standard deviation of 15 runs look like: Avg Stddev Vanilla 9.817400 0.276165 Fixed 9.710467 0.228294 So there's no statistically significant difference. Fixes: `6b3f05d24d` ("fsnotify: Detach mark from object list when last reference is dropped") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-25 15:49:19 +02:00
Amir Goldstein	d0a6a87e40	fanotify: support reporting thread id instead of process id In order to identify which thread triggered the event in a multi-threaded program, add the FAN_REPORT_TID flag in fanotify_init to opt-in for reporting the event creator's thread id information. Signed-off-by: nixiaoming <nixiaoming@huawei.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-08 13:48:45 +02:00
Amir Goldstein	bdd5a46fe3	fanotify: add BUILD_BUG_ON() to count the bits of fanotify constants Also define the FANOTIFY_EVENT_FLAGS consisting of the extra flags FAN_ONDIR and FAN_ON_CHILD. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-04 13:30:03 +02:00
Amir Goldstein	a39f7ec417	fsnotify: convert runtime BUG_ON() to BUILD_BUG_ON() The BUG_ON() statements to verify number of bits in ALL_FSNOTIFY_BITS and ALL_INOTIFY_BITS are converted to build time check of the constant. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-04 13:28:50 +02:00
Amir Goldstein	23c9deeb32	fanotify: deprecate uapi FAN_ALL_* constants We do not want to add new bits to the FAN_ALL_* uapi constants because they have been exposed to userspace. If there are programs out there using these constants, those programs could break if re-compiled with modified FAN_ALL_* constants and run on an old kernel. We deprecate the uapi constants FAN_ALL_* and define new FANOTIFY_* constants for internal use to replace them. New feature bits will be added only to the new constants. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-04 13:28:38 +02:00
Amir Goldstein	a72fd224e3	fanotify: simplify handling of FAN_ONDIR fanotify mark add/remove code jumps through hoops to avoid setting the FS_ISDIR in the commulative object mask. That was just papering over a bug in fsnotify() handling of the FS_ISDIR extra flag. This bug is now fixed, so all the hoops can be removed along with the unneeded internal flag FAN_MARK_ONDIR. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-04 13:28:29 +02:00
Amir Goldstein	007d1e8395	fsnotify: generalize handling of extra event flags FS_EVENT_ON_CHILD gets a special treatment in fsnotify() because it is not a flag specifying an event type, but rather an extra flags that may be reported along with another event and control the handling of the event by the backend. FS_ISDIR is also an "extra flag" and not an "event type" and therefore desrves the same treatment. With inotify/dnotify backends it was never possible to set FS_ISDIR in mark masks, so it did not matter. With fanotify backend, mark adding code jumps through hoops to avoid setting the FS_ISDIR in the commulative object mask. Separate the constant ALL_FSNOTIFY_EVENTS to ALL_FSNOTIFY_FLAGS and ALL_FSNOTIFY_EVENTS, so the latter can be used to test for specific event types. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-10-04 13:28:02 +02:00
Amir Goldstein	96a71f21ef	fanotify: store fanotify_init() flags in group's fanotify_data This averts the need to re-generate flags in fanotify_show_fdinfo() and sets the scene for addition of more upcoming flags without growing new members to the fanotify_data struct. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-09-27 15:29:00 +02:00
Amir Goldstein	d54f4fba88	fanotify: add API to attach/detach super block mark Add another mark type flag FAN_MARK_FILESYSTEM for add/remove/flush of super block mark type. A super block watch gets all events on the filesystem, regardless of the mount from which the mark was added, unless an ignore mask exists on either the inode or the mount where the event was generated. Only one of FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM mark type flags may be provided to fanotify_mark() or no mark type flag for inode mark. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-09-03 15:16:11 +02:00
Amir Goldstein	60f7ed8c7c	fsnotify: send path type events to group with super block marks Send events to group if super block mark mask matches the event and unless the same group has an ignore mask on the vfsmount or the inode on which the event occurred. Soon, fanotify backend is going to support super block marks and fanotify backend only supports path type events. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-09-03 15:14:04 +02:00
Amir Goldstein	1e6cb72399	fsnotify: add super block object type Add the infrastructure to attach a mark to a super_block struct and detach all attached marks when super block is destroyed. This is going to be used by fanotify backend to setup super block marks. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-09-03 15:14:01 +02:00
Amir Goldstein	9bdda4e9cf	fsnotify: fix ignore mask logic in fsnotify() Commit `92183a4289` ("fsnotify: fix ignore mask logic in send_to_group()") acknoledges the use case of ignoring an event on an inode mark, because of an ignore mask on a mount mark of the same group (i.e. I want to get all events on this file, except for the events that came from that mount). This change depends on correctly merging the inode marks and mount marks group lists, so that the mount mark ignore mask would be tested in send_to_group(). Alas, the merging of the lists did not take into account the case where event in question is not in the mask of any of the mount marks. To fix this, completely remove the tests for inode and mount event masks from the lists merging code. Fixes: `92183a4289` ("fsnotify: fix ignore mask logic in send_to_group") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-09-03 14:57:41 +02:00
Linus Torvalds	f3f106dac0	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAluGfbAACgkQnJ2qBz9k QNmkVQgAjv/tCHStwoQ4Hhr6q5wLU9apAW5WC7Or2MyNfqoJsKgyujpikwFMa0xY wkVqlITYavvUKl/nRlQ6hpZtAY7hi1Y7GvIYs2ci6QO4YAUuBoH7qPLKqyZYzXx+ mBaC68885nMMDqIHvsCcLurwTiDer6XQXXPDmpMoO9g4kyVuhm2e0/M7CPaA4Ue9 WEfBPBROSNdRH7Wtww/MUvtfd+9mezdlpUQZVVO5cdhAVQ8pzW2g2piYtwIpaY7E DiJ1zcgaqCnni+b69x4wz8TwB0q8wZJjrPfJgf4X7mIGBZrw80yNRlevfe5SxwdJ mkMDvjaD0TEItvpjgTZReXaAjFOWTA== =E61M -----END PGP SIGNATURE----- Merge tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc fs fixes from Jan Kara: - make UDF to properly mount media created by Win7 - make isofs to properly refuse devices with large physical block size - fix a Spectre gadget in quotactl(2) - fix a warning in fsnotify code hit by syzkaller * tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix mounting of Win7 created UDF filesystems udf: Remove dead code from udf_find_fileset() fs/quota: Fix spectre gadget in do_quotactl fs/quota: Replace XQM_MAXQUOTAS usage with MAXQUOTAS isofs: reject hardware sector size > 2048 bytes fsnotify: fix false positive warning on inode delete	2018-08-29 14:56:45 -07:00
Linus Torvalds	0214f46b3a	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...	2018-08-21 13:47:29 -07:00
Jan Kara	d3bc0fa841	fsnotify: fix false positive warning on inode delete When inode is getting deleted and someone else holds reference to a mark attached to the inode, we just detach the connector from the inode. In that case fsnotify_put_mark() called from fsnotify_destroy_marks() will decide to recalculate mask for the inode and __fsnotify_recalc_mask() will WARN about invalid connector type: WARNING: CPU: 1 PID: 12015 at fs/notify/mark.c:139 __fsnotify_recalc_mask+0x2d7/0x350 fs/notify/mark.c:139 Actually there's no reason to warn about detached connector in __fsnotify_recalc_mask() so just silently skip updating the mask in such case. Reported-by: syzbot+c34692a51b9a6ca93540@syzkaller.appspotmail.com Fixes: `3ac70bfcde` ("fsnotify: add helper to get mask from connector") Signed-off-by: Jan Kara <jack@suse.cz>	2018-08-20 13:55:45 +02:00
Linus Torvalds	6ada4e2826	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc things - a few Y2038 fixes - ntfs fixes - arch/sh tweaks - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) mm/hmm.c: remove unused variables align_start and align_end fs/userfaultfd.c: remove redundant pointer uwq mm, vmacache: hash addresses based on pmd mm/list_lru: introduce list_lru_shrink_walk_irq() mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one() mm/list_lru.c: move locking from __list_lru_walk_one() to its caller mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node() mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP mm/sparse: delete old sparse_init and enable new one mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: move buffer init/fini to the common place mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: abstract sparse buffer allocations mm/hugetlb.c: don't zero 1GiB bootmem pages mm, page_alloc: double zone's batchsize mm/oom_kill.c: document oom_lock mm/hugetlb: remove gigantic page support for HIGHMEM mm, oom: remove sleep from under oom_lock kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() mm/cma: remove unsupported gfp_mask parameter from cma_alloc() ...	2018-08-17 16:49:31 -07:00
Shakeel Butt	d46eb14b73	fs: fsnotify: account fsnotify metadata to kmemcg Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-17 16:20:30 -07:00
Eric W. Biederman	019191342f	signal: Use PIDTYPE_TGID to clearly store where file signals will be sent When f_setown is called a pid and a pid type are stored. Replace the use of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread group. Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now is only for a thread. Update the users of __f_setown to use PIDTYPE_TGID instead of PIDTYPE_PID. For now the code continues to capture task_pid (when task_tgid would really be appropriate), and iterate on PIDTYPE_PID (even when type == PIDTYPE_TGID) out of an abundance of caution to preserve existing behavior. Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID for tgid lookup also be used to avoid taking the tasklist lock. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Eric W. Biederman	7a36094d61	pids: Compute task_tgid using signal->leader_pid The cost is the the same and this removes the need to worry about complications that come from de_thread and group_leader changing. __task_pid_nr_ns has been updated to take advantage of this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-07-21 10:43:12 -05:00
Henry Wilson	4d97f7d53d	inotify: Add flag IN_MASK_CREATE for inotify_add_watch() The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch() which prevents inotify from modifying any existing watches when invoked. If the pathname specified in the call has a watched inode associated with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST. Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and will return EINVAL. RATIONALE In the current implementation, there is no way to prevent inotify_add_watch() from modifying existing watch descriptors. Even if the caller keeps a record of all watch descriptors collected, this is only sufficient to detect that an existing watch descriptor may have been modified. The assumption that a particular path will map to the same inode over multiple calls to inotify_add_watch() cannot be made as files can be renamed or deleted. It is also not possible to assume that two distinct paths do no map to the same inode, due to hard-links or a dereferenced symbolic link. Further uses of inotify_add_watch() to revert the change may cause other watch descriptors to be modified or created, merely compunding the problem. There is currently no system call such as inotify_modify_watch() to explicity modify a watch descriptor, which would be able to revert unwanted changes. Thus the caller cannot guarantee to be able to revert any changes to existing watch decriptors. Additionally the caller cannot assume that the events that are associated with a watch descriptor are within the set requested, as any future calls to inotify_add_watch() may unintentionally modify a watch descriptor's mask. Thus it cannot currently be guaranteed that a watch descriptor will only generate events which have been requested. The program must filter events which come through its watch descriptor to within its expected range. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Henry Wilson <henry.wilson@acentic.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-06-27 19:21:25 +02:00
Amir Goldstein	eaa2c6b0c9	fanotify: factor out helpers to add/remove mark Factor out helpers fanotify_add_mark() and fanotify_remove_mark() to reduce duplicated code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-06-27 13:45:12 +02:00
Amir Goldstein	3ac70bfcde	fsnotify: add helper to get mask from connector Use a helper to get the mask from the object (i.e. i_fsnotify_mask) to generalize code of add/remove inode/vfsmount mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-06-27 13:45:07 +02:00
Amir Goldstein	36f10f55ff	fsnotify: let connector point to an abstract object Make the code to attach/detach a connector to object more generic by letting the fsnotify connector point to an abstract fsnotify_connp_t. Code that needs to dereference an inode or mount object now uses the helpers fsnotify_conn_{inode,mount}. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2018-06-27 13:45:05 +02:00

1 2 3 4 5 ...

700 Commits