License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to apply to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
  lines of source.
- File already had some variant of a license header in it (even if <5
  lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
  considered to have no license information in it, and the top-level
  COPYING file license applied.
  For non */uapi/* files that summary was:
    SPDX license identifier                             # files
    ---------------------------------------------------|-------
    GPL-2.0                                               11139
  and resulted in the first patch in this series.
  If that file was a */uapi/* one, it was tagged "GPL-2.0 WITH
  Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
    SPDX license identifier                             # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                         930
  and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
  of the */uapi/* ones, it was denoted with the Linux-syscall-note if
  any GPL family license was found in the file or had no licensing in
  it (per prior point). Results summary:
    SPDX license identifier                            # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                        270
    GPL-2.0+ WITH Linux-syscall-note                       169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
    LGPL-2.1+ WITH Linux-syscall-note                       15
    GPL-1.0+ WITH Linux-syscall-note                        14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
    LGPL-2.0+ WITH Linux-syscall-note                        4
    LGPL-2.1 WITH Linux-syscall-note                         3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
  and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
  the concluded license(s).
- when there was disagreement between the two scanners (one detected a
  license but the other didn't, or they both detected different
  licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
  resulted in a clear resolution of the license that should apply (and
  which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
  confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
  the file was flagged for further research and to be revisited later
  in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally Greg ran the script using the .csv files to
generate the patches.
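The tagging step can be sketched as a small C helper. It is hypothetical and is not the script Thomas and Greg actually used: `spdx_tag_line()` and `ends_with()` are invented names, and the sketch only covers files that had no license text. It applies the heuristic described above, where */uapi/* files get "GPL-2.0 WITH Linux-syscall-note" and everything else gets "GPL-2.0". It also uses the kernel's comment conventions for the tag: `//` in .c sources and `/* */` in headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `s` ends with `suffix`. */
static bool ends_with(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/*
 * Hypothetical sketch: write the SPDX tag line for a file that had no
 * license information into `out`. Returns the snprintf() result, or -1
 * for file types this sketch does not handle (e.g. Makefiles).
 */
static int spdx_tag_line(const char *path, char *out, size_t outlen)
{
	/* Files under a uapi/ directory get the syscall exception. */
	const char *id = strstr(path, "/uapi/") ?
		"GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	if (ends_with(path, ".c"))	/* C sources: C99 line comment */
		return snprintf(out, outlen, "// SPDX-License-Identifier: %s", id);
	if (ends_with(path, ".h"))	/* headers: block comment */
		return snprintf(out, outlen, "/* SPDX-License-Identifier: %s */", id);
	return -1;
}
```

Make and config files would need a `#` comment instead. This sketch leaves them out.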
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
2009-12-18 10:24:29 +08:00
#include <linux/fanotify.h>
2009-12-18 10:24:25 +08:00
#include <linux/fcntl.h>
2009-12-18 10:24:26 +08:00
#include <linux/file.h>
2009-12-18 10:24:25 +08:00
#include <linux/fs.h>
2009-12-18 10:24:26 +08:00
#include <linux/anon_inodes.h>
2009-12-18 10:24:25 +08:00
#include <linux/fsnotify_backend.h>
2009-12-18 10:24:26 +08:00
#include <linux/init.h>
2009-12-18 10:24:26 +08:00
#include <linux/mount.h>
2009-12-18 10:24:26 +08:00
#include <linux/namei.h>
2009-12-18 10:24:26 +08:00
#include <linux/poll.h>
2009-12-18 10:24:25 +08:00
#include <linux/security.h>
#include <linux/syscalls.h>
2010-05-19 23:36:28 +08:00
#include <linux/slab.h>
2009-12-18 10:24:26 +08:00
#include <linux/types.h>
2009-12-18 10:24:26 +08:00
#include <linux/uaccess.h>
2013-03-06 09:10:59 +08:00
#include <linux/compat.h>
2017-02-03 02:15:33 +08:00
#include <linux/sched/signal.h>
fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8.
The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.
The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current process's memcg. This patch series
contains two such concrete use-cases i.e. fsnotify and buffer_head.
The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener. The events
are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.
To solve this issue, this patch series introduces a mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.
This patch (of 2):
A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener. This can cause
system level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.
However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.
There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happen in the context of a syscall from the
listener. So, SLAB_ACCOUNT is enough for these caches.
The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account the connector to, since it is shared by all events
attached to the inode.
The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.
This patch also rearranges the members of fsnotify_group to fill the
holes, so that the structure size stays the same, at least for 64-bit
builds, even with the additional member.
[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
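The scope-based directed charging described above can be modeled in plain userspace C. Everything here is a hypothetical stand-in, not kernel code:

- `struct group` plays the memcg.
- `active_target` plays the per-task override set by the memalloc_[un]use_memcg() scope.
- `charge()` plays a `__GFP_ACCOUNT` allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a usage counter. */
struct group {
	size_t usage;
};

/* Group of the running task, plus an optional override for the scope. */
static struct group *task_group;
static struct group *active_target;

/* Open a scope in which accounted allocations are charged to `g`. */
static void use_target(struct group *g)
{
	assert(active_target == NULL);	/* scopes are not nested in this sketch */
	active_target = g;
}

/* Close the scope; charging falls back to the task's own group. */
static void unuse_target(void)
{
	active_target = NULL;
}

/* Model of an accounted allocation of `size` bytes. */
static void charge(size_t size)
{
	struct group *g = active_target ? active_target : task_group;

	g->usage += size;
}
```

In the fsnotify case, the event producer would open such a scope around the event allocation. It would use the listener's memcg, which is saved in the listener's fsnotify_group, so the charge lands on the listener rather than the producer.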
2018-08-18 06:46:39 +08:00
#include <linux/memcontrol.h>
2009-12-18 10:24:26 +08:00

#include <asm/ioctls.h>
2009-12-18 10:24:25 +08:00

2011-11-25 15:35:16 +08:00
#include "../../mount.h"
2012-12-18 08:05:12 +08:00
#include "../fdinfo.h"
2014-01-22 07:48:14 +08:00
#include "fanotify.h"
2011-11-25 15:35:16 +08:00

2010-10-29 05:21:57 +08:00
#define FANOTIFY_DEFAULT_MAX_EVENTS 16384
2010-10-29 05:21:57 +08:00
#define FANOTIFY_DEFAULT_MAX_MARKS 8192
2010-10-29 05:21:58 +08:00
#define FANOTIFY_DEFAULT_MAX_LISTENERS 128
2010-10-29 05:21:57 +08:00

fanotify: check file flags passed in fanotify_init
Without this patch fanotify_init does not validate the value passed in
event_f_flags.
When a fanotify event is read from the fanotify file descriptor a new
file descriptor is created where file.f_flags = event_f_flags.
Internal and external open flags are stored together in field f_flags of
struct file. Hence, an application might create file descriptors with
internal flags like FMODE_EXEC, FMODE_NOCMTIME set.
Jan Kara and Eric Paris both agreed that this is a bug and the value of
event_f_flags should be checked:
https://lkml.org/lkml/2014/4/29/522
https://lkml.org/lkml/2014/4/29/539
This updated patch version considers the comments by Michael Kerrisk in
https://lkml.org/lkml/2014/5/4/10
With this patch, the value of event_f_flags is checked.
If an invalid value is specified, the error EINVAL is returned.
Internal flags are disallowed.
File creation flags are disallowed:
O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT.
Flags which do not make sense with fanotify are disallowed:
__O_TMPFILE, O_PATH, FASYNC, and O_DIRECT.
This leaves us with the following allowed values:
O_RDONLY, O_WRONLY, O_RDWR are basic functionality. They are stored in the
bits given by O_ACCMODE.
O_APPEND is working as expected. The value might be useful in a logging
application which appends the current status each time the log is opened.
O_LARGEFILE is needed for files exceeding 4GB on 32bit systems.
O_NONBLOCK may be useful when monitoring slow devices like tapes.
O_NDELAY is equal to O_NONBLOCK except for platform parisc.
To avoid code breaking on parisc either both flags should be
allowed or none. The patch allows both.
__O_SYNC and O_DSYNC may be used to avoid data loss on power disruption.
O_NOATIME may be useful to reduce disk activity.
O_CLOEXEC may be useful, if separate processes shall be used to scan files.
Once this patch is accepted, the fanotify_init.2 manpage has to be updated.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
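The check described in this commit message can be sketched in userspace. This is a minimal sketch that assumes Linux with glibc. `EXAMPLE_EVENT_F_BITS` and `event_f_flags_ok()` are hypothetical names, and userspace O_SYNC stands in for the kernel-internal __O_SYNC. The sketch first rejects any bit outside the allow-list and then checks the access mode, so a value such as O_ACCMODE (both access bits set) is also refused. Treat the access-mode step as an illustration of the rules listed above, not as a copy of fanotify_init().

```c
#define _GNU_SOURCE	/* for O_NOATIME and O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical userspace mirror of FANOTIFY_INIT_ALL_EVENT_F_BITS.
 * The kernel lists the internal __O_SYNC bit; with glibc, userspace
 * O_SYNC is __O_SYNC | O_DSYNC, so it covers both.
 */
#define EXAMPLE_EVENT_F_BITS \
	(O_ACCMODE | O_APPEND | O_NONBLOCK | O_SYNC | O_DSYNC | \
	 O_CLOEXEC | O_LARGEFILE | O_NOATIME)

static bool event_f_flags_ok(unsigned int flags)
{
	/* Reject any bit outside the allow-list (O_CREAT, O_PATH, O_DIRECT, ...). */
	if (flags & ~(unsigned int)EXAMPLE_EVENT_F_BITS)
		return false;

	/* Only the three real access modes are meaningful. */
	switch (flags & O_ACCMODE) {
	case O_RDONLY:
	case O_WRONLY:
	case O_RDWR:
		return true;
	default:
		return false;
	}
}
```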
2014-06-05 07:05:44 +08:00
/*
 * All flags that may be specified in parameter event_f_flags of fanotify_init.
 *
 * Internal and external open flags are stored together in field f_flags of
 * struct file. Only external open flags shall be allowed in event_f_flags.
 * Internal flags like FMODE_NONOTIFY, FMODE_EXEC, FMODE_NOCMTIME shall be
 * excluded.
 */
#define FANOTIFY_INIT_ALL_EVENT_F_BITS ( \
		O_ACCMODE | O_APPEND | O_NONBLOCK | \
		__O_SYNC | O_DSYNC | O_CLOEXEC | \
		O_LARGEFILE | O_NOATIME )

2009-12-18 10:24:29 +08:00
extern const struct fsnotify_ops fanotify_fsnotify_ops;
2009-12-18 10:24:25 +08:00

2016-12-22 01:06:12 +08:00
struct kmem_cache *fanotify_mark_cache __read_mostly;
2014-01-22 07:48:14 +08:00
struct kmem_cache *fanotify_event_cachep __read_mostly;
2014-04-04 05:46:33 +08:00
struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
2009-12-18 10:24:26 +08:00

2009-12-18 10:24:26 +08:00
/*
 * Get an fsnotify notification event if one exists and is small
 * enough to fit in "count". Return an error pointer if the count
 * is not large enough.
 *
2016-10-08 07:56:52 +08:00
 * Called with the group->notification_lock held.
2009-12-18 10:24:26 +08:00
 */
static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
					    size_t count)
{
2016-10-08 07:57:01 +08:00
	assert_spin_locked(&group->notification_lock);
2009-12-18 10:24:26 +08:00

	pr_debug("%s: group=%p count=%zd\n", __func__, group, count);

	if (fsnotify_notify_queue_is_empty(group))
		return NULL;

	if (FAN_EVENT_METADATA_LEN > count)
		return ERR_PTR(-EINVAL);

2016-10-08 07:56:52 +08:00
	/* held the notification_lock the whole time, so this is the
2009-12-18 10:24:26 +08:00
	 * same event we peeked above */
2014-08-07 07:03:26 +08:00
	return fsnotify_remove_first_event(group);
2009-12-18 10:24:26 +08:00
}
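The order of the checks in get_one_event() matters. An empty queue is reported before the buffer size is examined, and an event is removed only once it is known to fit. A userspace model of that tri-state result follows. `struct example_queue` and `example_get_one_event()` are hypothetical. The queue holds plain ints instead of fsnotify events, and locking is left to the caller, as notification_lock is in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/fanotify.h>

#define EXAMPLE_QUEUE_SIZE 8

/* Hypothetical stand-in for a group's notification queue (a ring buffer). */
struct example_queue {
	int events[EXAMPLE_QUEUE_SIZE];
	size_t head, len;
};

/*
 * Returns 0 if the queue is empty (the kernel returns NULL), -EINVAL if
 * `count` cannot hold one metadata record (the kernel returns
 * ERR_PTR(-EINVAL)), or 1 after removing the first event into *out.
 */
static int example_get_one_event(struct example_queue *q, size_t count, int *out)
{
	if (q->len == 0)
		return 0;
	if (FAN_EVENT_METADATA_LEN > count)
		return -EINVAL;	/* the event stays queued */

	*out = q->events[q->head];
	q->head = (q->head + 1) % EXAMPLE_QUEUE_SIZE;
	q->len--;
	return 1;
}
```

Emptiness is checked first. As a result, a non-blocking read() on an empty queue fails with EAGAIN in fanotify_read(), not EINVAL, even when the buffer is undersized.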

2012-08-20 00:30:45 +08:00
static int create_fd(struct fsnotify_group *group,
2019-01-11 01:04:32 +08:00
		     struct fanotify_event *event,
2014-01-22 07:48:14 +08:00
		     struct file **file)
2009-12-18 10:24:26 +08:00
{
	int client_fd;
	struct file *new_file;

2009-12-18 10:24:26 +08:00
	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
2009-12-18 10:24:26 +08:00

fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
According to commit 80af258867648 ("fanotify: groups can specify their
f_flags for new fd"), file descriptors created as part of file access
notification events inherit flags from the event_f_flags argument passed
to syscall fanotify_init(2)[1].
Unfortunately O_CLOEXEC is currently silently ignored.
Indeed, event_f_flags are only given to dentry_open(), which only seems to
care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
open_check_o_direct() and O_LARGEFILE in generic_file_open().
It's a pity since, according to searches on various search engines and
http://codesearch.debian.net/, there is already some userspace code which
uses O_CLOEXEC:
- in systemd's readahead[2]:
      fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
- in clsync[3]:
      #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
      int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:
      if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                        O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
Additionally, since commit 48149e9d3a7e ("fanotify: check file flags
passed in fanotify_init"), having O_CLOEXEC as part of fanotify_init()'s
second argument is expressly allowed.
So it seems reasonable to expect the close-on-exec flag to be set on the
file descriptors, since userspace is allowed to request it with O_CLOEXEC.
But Andrew Morton raised[6] the concern that enabling close-on-exec now
might break existing applications which ask for O_CLOEXEC but expect the
file descriptor to be inherited across exec().
On the other hand, as reported by Mihai Dontu[7], the lack of close-on-exec
on the file descriptor returned as part of a file access notification can
break applications due to deadlock. So close-on-exec is needed for most
applications.
Moreover, applications asking for close-on-exec are likely expecting it to
be enabled, relying on O_CLOEXEC being effective. If not, it might weaken
their security, as noted by Jan Kara[8].
So this patch replaces the call to the get_unused_fd() macro with a call to
get_unused_fd_flags(), passing the event_f_flags value as argument. This
way the O_CLOEXEC flag in the second argument of the fanotify_init(2)
syscall is interpreted and close-on-exec gets enabled when requested.
[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Mihai Donțu <mihai.dontu@gmail.com>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10 06:24:40 +08:00
	client_fd = get_unused_fd_flags(group->fanotify_data.f_flags);
2009-12-18 10:24:26 +08:00
	if (client_fd < 0)
		return client_fd;

	/*
	 * we need a new file handle for the userspace program so it can read even if it was
	 * originally opened O_WRONLY.
	 */
	/* it's possible this event was an overflow event. in that case dentry and mnt
	 * are NULL; That's fine, just don't call dentry open */
2012-06-27 01:58:53 +08:00
	if (event->path.dentry && event->path.mnt)
		new_file = dentry_open(&event->path,
2010-07-28 22:18:37 +08:00
				       group->fanotify_data.f_flags | FMODE_NONOTIFY,
2009-12-18 10:24:26 +08:00
				       current_cred());
	else
		new_file = ERR_PTR(-EOVERFLOW);
	if (IS_ERR(new_file)) {
		/*
		 * we still send an event even if we can't open the file. this
		 * can happen when say tasks are gone and we try to open their
		 * /proc files or we try to open a WRONLY file like in sysfs
		 * we just send the errno to userspace since there isn't much
		 * else we can do.
		 */
		put_unused_fd(client_fd);
		client_fd = PTR_ERR(new_file);
	} else {
2012-08-20 00:30:45 +08:00
		*file = new_file;
2009-12-18 10:24:26 +08:00
	}

2009-12-18 10:24:26 +08:00
	return client_fd;
2009-12-18 10:24:26 +08:00
}

2019-01-11 01:04:32 +08:00
static struct fanotify_perm_event *dequeue_event(
2014-04-04 05:46:33 +08:00
				struct fsnotify_group *group, int fd)
2009-12-18 10:24:34 +08:00
{
2019-01-11 01:04:32 +08:00
	struct fanotify_perm_event *event, *return_e = NULL;
2009-12-18 10:24:34 +08:00

2016-10-08 07:56:55 +08:00
	spin_lock(&group->notification_lock);
2014-04-04 05:46:33 +08:00
	list_for_each_entry(event, &group->fanotify_data.access_list,
			    fae.fse.list) {
		if (event->fd != fd)
2009-12-18 10:24:34 +08:00
			continue;

2014-04-04 05:46:33 +08:00
		list_del_init(&event->fae.fse.list);
		return_e = event;
2009-12-18 10:24:34 +08:00
		break;
	}
2016-10-08 07:56:55 +08:00
	spin_unlock(&group->notification_lock);
2009-12-18 10:24:34 +08:00

2014-04-04 05:46:33 +08:00
	pr_debug("%s: found return_re=%p\n", __func__, return_e);
2009-12-18 10:24:34 +08:00

2014-04-04 05:46:33 +08:00
	return return_e;
2009-12-18 10:24:34 +08:00
}
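dequeue_event() is a linear search that unlinks the first permission event whose fd matches the one userspace answered. The same search-and-unlink can be sketched with a singly linked list and a pointer-to-pointer cursor. The types and `example_dequeue()` are hypothetical. The kernel's list_for_each_entry()/list_del_init() and the notification_lock are replaced by plain pointers, with no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued permission event. */
struct example_perm_event {
	int fd;
	int response;
	struct example_perm_event *next;
};

/*
 * Unlink and return the first event whose fd matches, or NULL.
 * Like dequeue_event(), it stops at the first match.
 */
static struct example_perm_event *
example_dequeue(struct example_perm_event **head, int fd)
{
	for (struct example_perm_event **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->fd != fd)
			continue;

		struct example_perm_event *found = *pp;

		*pp = found->next;	/* unlink without a "prev" pointer */
		found->next = NULL;
		return found;
	}
	return NULL;
}
```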

static int process_access_response(struct fsnotify_group *group,
				   struct fanotify_response *response_struct)
{
2019-01-11 01:04:32 +08:00
	struct fanotify_perm_event *event;
2014-04-04 05:46:33 +08:00
	int fd = response_struct->fd;
	int response = response_struct->response;
2009-12-18 10:24:34 +08:00

	pr_debug("%s: group=%p fd=%d response=%d\n", __func__, group,
		 fd, response);
	/*
	 * make sure the response is valid, if invalid we do nothing and either
2011-03-31 09:57:33 +08:00
	 * userspace can send a valid response or we will clean it up after the
2009-12-18 10:24:34 +08:00
	 * timeout
	 */
2017-10-03 08:21:39 +08:00
	switch (response & ~FAN_AUDIT) {
2009-12-18 10:24:34 +08:00
	case FAN_ALLOW:
	case FAN_DENY:
		break;
	default:
		return -EINVAL;
	}

	if (fd < 0)
		return -EINVAL;

2018-09-22 02:20:30 +08:00
	if ((response & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
2017-10-03 08:21:39 +08:00
		return -EINVAL;

2014-04-04 05:46:33 +08:00
	event = dequeue_event(group, fd);
	if (!event)
2009-12-18 10:24:34 +08:00
		return -ENOENT;

2014-04-04 05:46:33 +08:00
	event->response = response;
2009-12-18 10:24:34 +08:00
	wake_up(&group->fanotify_data.access_waitq);

	return 0;
}
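The validation order in process_access_response() can be exercised in isolation. The sketch below mirrors the three checks that run before dequeue_event(). `example_check_response()` is hypothetical, and `audit_enabled` stands in for FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT). The constants come from the uapi header <sys/fanotify.h>. A fallback value is provided for FAN_AUDIT, which older headers lack.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/fanotify.h>

#ifndef FAN_AUDIT
#define FAN_AUDIT 0x10	/* missing from older uapi headers */
#endif

/* Hypothetical mirror of the pre-lookup checks in process_access_response(). */
static int example_check_response(int fd, unsigned int response, bool audit_enabled)
{
	/* The verdict, ignoring the audit bit, must be exactly allow or deny. */
	switch (response & ~FAN_AUDIT) {
	case FAN_ALLOW:
	case FAN_DENY:
		break;
	default:
		return -EINVAL;
	}

	if (fd < 0)
		return -EINVAL;

	/* Asking for an audit record requires a group created with audit enabled. */
	if ((response & FAN_AUDIT) && !audit_enabled)
		return -EINVAL;

	return 0;
}
```

Once these checks pass, the kernel looks the event up by fd. It returns -ENOENT if no queued permission event matches.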

2009-12-18 10:24:26 +08:00
static ssize_t copy_event_to_user(struct fsnotify_group *group,
2019-01-11 01:04:33 +08:00
				  struct fsnotify_event *fsn_event,
2018-12-05 07:44:46 +08:00
				  char __user *buf, size_t count)
2009-12-18 10:24:26 +08:00
{
2019-01-11 01:04:33 +08:00
	struct fanotify_event_metadata metadata;
	struct fanotify_event *event;
	struct file *f = NULL;
2009-12-18 10:24:34 +08:00
	int fd, ret;
2009-12-18 10:24:26 +08:00

2019-01-11 01:04:33 +08:00
	pr_debug("%s: group=%p event=%p\n", __func__, group, fsn_event);
2009-12-18 10:24:26 +08:00

2019-01-11 01:04:33 +08:00
	event = container_of(fsn_event, struct fanotify_event, fse);
	metadata.event_len = FAN_EVENT_METADATA_LEN;
	metadata.metadata_len = FAN_EVENT_METADATA_LEN;
	metadata.vers = FANOTIFY_METADATA_VERSION;
	metadata.reserved = 0;
	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
	metadata.pid = pid_vnr(event->pid);

	if (unlikely(event->mask & FAN_Q_OVERFLOW)) {
		fd = FAN_NOFD;
	} else {
		fd = create_fd(group, event, &f);
		if (fd < 0)
			return fd;
	}
	metadata.fd = fd;
2009-12-18 10:24:34 +08:00

	ret = -EFAULT;
2018-12-05 07:44:46 +08:00
	/*
	 * Sanity check copy size in case get_one_event() and
	 * fill_event_metadata() event_len sizes ever get out of sync.
	 */
2019-01-11 01:04:33 +08:00
	if (WARN_ON_ONCE(metadata.event_len > count))
2018-12-05 07:44:46 +08:00
		goto out_close_fd;
2019-01-11 01:04:33 +08:00

	if (copy_to_user(buf, &metadata, metadata.event_len))
2012-08-20 00:30:45 +08:00
		goto out_close_fd;

2019-01-11 01:04:33 +08:00
	if (fanotify_is_perm_event(event->mask))
		FANOTIFY_PE(fsn_event)->fd = fd;
2009-12-18 10:24:26 +08:00

2012-11-19 03:19:00 +08:00
	if (fd != FAN_NOFD)
		fd_install(fd, f);
2019-01-11 01:04:33 +08:00
	return metadata.event_len;
2009-12-18 10:24:34 +08:00

out_close_fd:
2012-08-20 00:30:45 +08:00
	if (fd != FAN_NOFD) {
		put_unused_fd(fd);
		fput(f);
	}
2009-12-18 10:24:34 +08:00
	return ret;
2009-12-18 10:24:26 +08:00
}
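On the userspace side, a successful read() returns a sequence of the records that copy_event_to_user() writes. The sketch below walks such a buffer with the FAN_EVENT_OK() and FAN_EVENT_NEXT() macros from <sys/fanotify.h>. It closes every event fd so descriptors do not leak, and overflow events carry FAN_NOFD instead of a descriptor. The function name `example_consume()` is hypothetical. Synthetic overflow records can be used to exercise it without a real fanotify descriptor.

```c
#include <assert.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical consumer for a buffer filled by read() on a fanotify fd.
 * Returns the number of records, or -1 on a metadata version mismatch.
 */
static int example_consume(void *buf, ssize_t len)
{
	struct fanotify_event_metadata *md = buf;
	int n = 0;

	for (; FAN_EVENT_OK(md, len); md = FAN_EVENT_NEXT(md, len)) {
		if (md->vers != FANOTIFY_METADATA_VERSION)
			return -1;	/* unknown record layout */
		if (md->fd >= 0)	/* overflow events carry FAN_NOFD */
			close(md->fd);
		n++;
	}
	return n;
}
```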

/* fanotify userspace file descriptor functions */
2017-07-03 13:02:18 +08:00
static __poll_t fanotify_poll(struct file *file, poll_table *wait)
2009-12-18 10:24:26 +08:00
{
	struct fsnotify_group *group = file->private_data;
2017-07-03 13:02:18 +08:00
	__poll_t ret = 0;
2009-12-18 10:24:26 +08:00

	poll_wait(file, &group->notification_waitq, wait);
2016-10-08 07:56:52 +08:00
	spin_lock(&group->notification_lock);
2009-12-18 10:24:26 +08:00
	if (!fsnotify_notify_queue_is_empty(group))
2018-02-12 06:34:03 +08:00
		ret = EPOLLIN | EPOLLRDNORM;
2016-10-08 07:56:52 +08:00
	spin_unlock(&group->notification_lock);
2009-12-18 10:24:26 +08:00

	return ret;
}
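fanotify_poll() reports the descriptor readable exactly when the notification queue is non-empty, so a listener can wait with poll() before calling read(). The helper name `wait_readable()` is hypothetical. Since creating a fanotify group needs privileges, the helper can be tried on a pipe, which follows the same POLLIN contract.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to `timeout_ms` for `fd` to become readable.
 * Returns false on timeout or error (including EINTR).
 */
static bool wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN);
}
```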

static ssize_t fanotify_read(struct file *file, char __user *buf,
			     size_t count, loff_t *pos)
{
	struct fsnotify_group *group;
	struct fsnotify_event *kevent;
	char __user *start;
	int ret;
2014-12-16 23:28:38 +08:00
	DEFINE_WAIT_FUNC(wait, woken_wake_function);
2009-12-18 10:24:26 +08:00

	start = buf;
	group = file->private_data;

	pr_debug("%s: group=%p\n", __func__, group);

2014-12-16 23:28:38 +08:00
	add_wait_queue(&group->notification_waitq, &wait);
2009-12-18 10:24:26 +08:00
	while (1) {
2016-10-08 07:56:52 +08:00
		spin_lock(&group->notification_lock);
2009-12-18 10:24:26 +08:00
		kevent = get_one_event(group, count);
2016-10-08 07:56:52 +08:00
		spin_unlock(&group->notification_lock);
2009-12-18 10:24:26 +08:00

2014-04-04 05:46:35 +08:00
		if (IS_ERR(kevent)) {
2009-12-18 10:24:26 +08:00
			ret = PTR_ERR(kevent);
2014-04-04 05:46:35 +08:00
			break;
		}

		if (!kevent) {
			ret = -EAGAIN;
			if (file->f_flags & O_NONBLOCK)
2009-12-18 10:24:26 +08:00
				break;
2014-04-04 05:46:35 +08:00

			ret = -ERESTARTSYS;
			if (signal_pending(current))
				break;

			if (start != buf)
2009-12-18 10:24:26 +08:00
				break;
2014-12-16 23:28:38 +08:00

			wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
2009-12-18 10:24:26 +08:00
			continue;
		}

2018-12-05 07:44:46 +08:00
		ret = copy_event_to_user(group, kevent, buf, count);
2017-04-25 19:29:35 +08:00
		if (unlikely(ret == -EOPENSTALE)) {
			/*
			 * We cannot report events with stale fd so drop it.
			 * Setting ret to 0 will continue the event loop and
			 * do the right thing if there are no more events to
			 * read (i.e. return bytes read, -EAGAIN or wait).
			 */
			ret = 0;
		}

2014-04-04 05:46:35 +08:00
		/*
		 * Permission events get queued to wait for response. Other
		 * events can be destroyed now.
		 */
2019-01-11 01:04:31 +08:00
		if (!fanotify_is_perm_event(FANOTIFY_E(kevent)->mask)) {
2014-04-04 05:46:35 +08:00
			fsnotify_destroy_event(group, kevent);
2014-04-04 05:46:36 +08:00
		} else {
2017-04-25 19:29:35 +08:00
			if (ret <= 0) {
2014-04-04 05:46:36 +08:00
				FANOTIFY_PE(kevent)->response = FAN_DENY;
				wake_up(&group->fanotify_data.access_waitq);
2017-04-25 19:29:35 +08:00
			} else {
				spin_lock(&group->notification_lock);
				list_add_tail(&kevent->list,
					&group->fanotify_data.access_list);
				spin_unlock(&group->notification_lock);
2014-04-04 05:46:36 +08:00
			}
		}
2017-04-25 19:29:35 +08:00
		if (ret < 0)
			break;
2014-04-04 05:46:35 +08:00
		buf += ret;
		count -= ret;
2009-12-18 10:24:26 +08:00
	}
2014-12-16 23:28:38 +08:00
	remove_wait_queue(&group->notification_waitq, &wait);
2009-12-18 10:24:26 +08:00

	if (start != buf && ret != -EFAULT)
		ret = buf - start;
	return ret;
}
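The byte accounting in fanotify_read() can be summarized by a small userspace model. It is a hypothetical simplification: `example_read()` only tracks how many FAN_EVENT_METADATA_LEN-sized events are queued, and it treats the file as O_NONBLOCK. It ignores signals, permission events and -EFAULT, and the names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/fanotify.h>
#include <sys/types.h>

/*
 * Hypothetical model of the fanotify_read() loop on an O_NONBLOCK file.
 * `queued` events of FAN_EVENT_METADATA_LEN bytes are pending; *left
 * receives how many are still queued afterwards.
 */
static ssize_t example_read(size_t queued, size_t count, size_t *left)
{
	size_t copied = 0;
	ssize_t ret = 0;

	for (;;) {
		if (queued == 0) {
			ret = -EAGAIN;	/* empty queue, non-blocking file */
			break;
		}
		if (count < FAN_EVENT_METADATA_LEN) {
			ret = -EINVAL;	/* get_one_event(): buffer too small */
			break;
		}
		/* copy_event_to_user() wrote one record */
		queued--;
		copied += FAN_EVENT_METADATA_LEN;
		count -= FAN_EVENT_METADATA_LEN;
	}
	*left = queued;

	/* As in the kernel, bytes already copied take precedence over the error. */
	return copied ? (ssize_t)copied : ret;
}
```

Take three queued events and a 60-byte buffer as an example. The model copies two records (48 bytes with the current 24-byte metadata). The third record does not fit, and the resulting -EINVAL is discarded because bytes were already copied.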

2009-12-18 10:24:34 +08:00
static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
	struct fanotify_response response = { .fd = -1, .response = -1 };
	struct fsnotify_group *group;
	int ret;

2017-10-31 04:14:56 +08:00
	if (!IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS))
		return -EINVAL;

2009-12-18 10:24:34 +08:00
	group = file->private_data;

	if (count > sizeof(response))
		count = sizeof(response);

	pr_debug("%s: group=%p count=%zu\n", __func__, group, count);

	if (copy_from_user(&response, buf, count))
		return -EFAULT;

	ret = process_access_response(group, &response);
	if (ret < 0)
		count = ret;

	return count;
}
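User space answers a permission event by writing a `struct fanotify_response` (a signed 32-bit `fd` followed by an unsigned 32-bit `response`). Before copying, fanotify_write() clamps `count` to the struct size. The sketch below models only that clamping and copying in user space and leaves out process_access_response(). `struct response_mirror` and `model_write()` are hypothetical names. The local struct mirrors the UAPI layout so the sketch does not depend on kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Local mirror of the UAPI struct fanotify_response layout. */
struct response_mirror {
	int32_t fd;
	uint32_t response;
};

/*
 * Hypothetical model of fanotify_write(): oversized writes are clamped to
 * sizeof(struct), and fields a short write does not cover keep their
 * sentinel values (fd = -1, response = (u32)-1).
 */
static ssize_t model_write(const void *buf, size_t count,
			   struct response_mirror *out)
{
	struct response_mirror r = { .fd = -1, .response = UINT32_MAX };

	if (count > sizeof(r))
		count = sizeof(r);
	memcpy(&r, buf, count);
	*out = r;
	return (ssize_t)count;
}
```

Because the result is clamped rather than rejected, an oversized write still reports only `sizeof(struct fanotify_response)` bytes as consumed.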

2009-12-18 10:24:26 +08:00
static int fanotify_release(struct inode *ignored, struct file *file)
{
	struct fsnotify_group *group = file->private_data;
2019-01-11 01:04:32 +08:00
	struct fanotify_perm_event *event, *next;
2016-09-20 05:44:30 +08:00
	struct fsnotify_event *fsn_event;
2010-10-29 05:21:59 +08:00

2014-08-07 07:03:28 +08:00
	/*
2016-09-20 05:44:30 +08:00
	 * Stop new events from arriving in the notification queue. Since
	 * userspace cannot use the fanotify fd anymore, no event can enter or
	 * leave access_list by now either.
2014-08-07 07:03:28 +08:00
	 */
2016-09-20 05:44:30 +08:00
	fsnotify_group_stop_queueing(group);
2010-08-19 00:25:50 +08:00

2016-09-20 05:44:30 +08:00
	/*
	 * Process all permission events on access_list and notification queue
	 * and simulate reply from userspace.
	 */
2016-10-08 07:56:55 +08:00
	spin_lock(&group->notification_lock);
2014-04-04 05:46:33 +08:00
	list_for_each_entry_safe(event, next, &group->fanotify_data.access_list,
				 fae.fse.list) {
		pr_debug("%s: found group=%p event=%p\n", __func__, group,
			 event);
2010-08-19 00:25:50 +08:00

2014-04-04 05:46:33 +08:00
		list_del_init(&event->fae.fse.list);
		event->response = FAN_ALLOW;
2010-08-19 00:25:50 +08:00
	}

2014-08-07 07:03:28 +08:00
	/*
2016-09-20 05:44:30 +08:00
	 * Destroy all non-permission events. For permission events just
	 * dequeue them and set the response. They will be freed once the
	 * response is consumed and fanotify_get_response() returns.
2014-08-07 07:03:28 +08:00
	 */
2016-09-20 05:44:30 +08:00
	while (!fsnotify_notify_queue_is_empty(group)) {
		fsn_event = fsnotify_remove_first_event(group);
2019-01-11 01:04:31 +08:00
		if (!(FANOTIFY_E(fsn_event)->mask & FANOTIFY_PERM_EVENTS)) {
2016-10-08 07:56:52 +08:00
			spin_unlock(&group->notification_lock);
2016-09-20 05:44:30 +08:00
			fsnotify_destroy_event(group, fsn_event);
2016-10-08 07:56:52 +08:00
			spin_lock(&group->notification_lock);
2017-10-31 04:14:56 +08:00
		} else {
2016-09-20 05:44:30 +08:00
			FANOTIFY_PE(fsn_event)->response = FAN_ALLOW;
2017-10-31 04:14:56 +08:00
		}
2016-09-20 05:44:30 +08:00
	}
2016-10-08 07:56:52 +08:00
	spin_unlock(&group->notification_lock);
2016-09-20 05:44:30 +08:00

	/* Response for all permission events is set, wake up waiters */
2010-08-19 00:25:50 +08:00
	wake_up(&group->fanotify_data.access_waitq);
2011-10-15 05:43:39 +08:00

2009-12-18 10:24:26 +08:00
	/* matches the fanotify_init->fsnotify_alloc_group */
2011-06-14 23:29:45 +08:00
	fsnotify_destroy_group(group);
2009-12-18 10:24:26 +08:00

	return 0;
}
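On release, every event still queued is drained. Non-permission events are destroyed. Permission events receive a simulated FAN_ALLOW so that blocked producers can continue. The sketch below models that decision over a plain array and leaves out locking, access_list, and the wake-up. `struct model_event`, `drain_on_release()` and `MODEL_ALLOW` are hypothetical. `MODEL_ALLOW` is set to 0x01, which is assumed to match FAN_ALLOW's UAPI value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ALLOW 0x01	/* assumed to equal FAN_ALLOW */

struct model_event {
	bool is_perm;		/* permission event? */
	unsigned int response;	/* 0 until answered */
	bool destroyed;		/* freed by the drain loop */
};

/*
 * Hypothetical model of the release-time drain: non-permission events are
 * destroyed, permission events are answered with ALLOW (and left for the
 * waiting producer to free). Returns the number of events answered.
 */
static size_t drain_on_release(struct model_event *q, size_t n)
{
	size_t answered = 0;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].is_perm) {
			q[i].destroyed = true;
		} else {
			q[i].response = MODEL_ALLOW;
			answered++;
		}
	}
	return answered;
}
```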

2009-12-18 10:24:26 +08:00
static long fanotify_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	struct fsnotify_group *group;
2014-01-22 07:48:14 +08:00
	struct fsnotify_event *fsn_event;
2009-12-18 10:24:26 +08:00
	void __user *p;
	int ret = -ENOTTY;
	size_t send_len = 0;

	group = file->private_data;

	p = (void __user *) arg;

	switch (cmd) {
	case FIONREAD:
2016-10-08 07:56:52 +08:00
		spin_lock(&group->notification_lock);
2014-01-22 07:48:14 +08:00
		list_for_each_entry(fsn_event, &group->notification_list, list)
2009-12-18 10:24:26 +08:00
			send_len += FAN_EVENT_METADATA_LEN;
2016-10-08 07:56:52 +08:00
		spin_unlock(&group->notification_lock);
2009-12-18 10:24:26 +08:00
		ret = put_user(send_len, (int __user *) p);
		break;
	}

	return ret;
}
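From user space, the FIONREAD branch is reached with an ordinary `ioctl()`. The kernel stores the number of queued events times FAN_EVENT_METADATA_LEN as an `int`. The helper below is a hypothetical sketch. It works on any descriptor that supports FIONREAD, and the usage test runs it on a pipe because creating an fanotify descriptor requires CAP_SYS_ADMIN. `<unistd.h>` is included for that pipe-based usage.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel how many bytes are ready to read.
 * On an fanotify fd, fanotify_ioctl() answers with queued events times
 * FAN_EVENT_METADATA_LEN; pipes and sockets report their own pending bytes.
 * Returns -1 if the ioctl fails.
 */
static int pending_bytes(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}
```

On an fanotify descriptor, dividing the result by FAN_EVENT_METADATA_LEN estimates how many events are queued.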

2009-12-18 10:24:26 +08:00
static const struct file_operations fanotify_fops = {
2012-12-18 08:05:12 +08:00
	.show_fdinfo	= fanotify_show_fdinfo,
2009-12-18 10:24:26 +08:00
	.poll		= fanotify_poll,
	.read		= fanotify_read,
2009-12-18 10:24:34 +08:00
	.write		= fanotify_write,
2009-12-18 10:24:26 +08:00
	.fasync		= NULL,
	.release	= fanotify_release,
2009-12-18 10:24:26 +08:00
	.unlocked_ioctl	= fanotify_ioctl,
	.compat_ioctl	= fanotify_ioctl,
llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-08-16 00:52:59 +08:00
	.llseek		= noop_llseek,
2009-12-18 10:24:26 +08:00
};

2009-12-18 10:24:26 +08:00
static int fanotify_find_path(int dfd, const char __user *filename,
			      struct path *path, unsigned int flags)
{
	int ret;

	pr_debug("%s: dfd=%d filename=%p flags=%x\n", __func__,
		 dfd, filename, flags);

	if (filename == NULL) {
2012-08-29 00:52:22 +08:00
		struct fd f = fdget(dfd);
2009-12-18 10:24:26 +08:00

		ret = -EBADF;
2012-08-29 00:52:22 +08:00
		if (!f.file)
2009-12-18 10:24:26 +08:00
			goto out;

		ret = -ENOTDIR;
		if ((flags & FAN_MARK_ONLYDIR) &&
2013-01-24 06:07:38 +08:00
		    !(S_ISDIR(file_inode(f.file)->i_mode))) {
2012-08-29 00:52:22 +08:00
			fdput(f);
2009-12-18 10:24:26 +08:00
			goto out;
		}

2012-08-29 00:52:22 +08:00
		*path = f.file->f_path;
2009-12-18 10:24:26 +08:00
		path_get(path);
2012-08-29 00:52:22 +08:00
		fdput(f);
2009-12-18 10:24:26 +08:00
	} else {
		unsigned int lookup_flags = 0;

		if (!(flags & FAN_MARK_DONT_FOLLOW))
			lookup_flags |= LOOKUP_FOLLOW;
		if (flags & FAN_MARK_ONLYDIR)
			lookup_flags |= LOOKUP_DIRECTORY;

		ret = user_path_at(dfd, filename, lookup_flags, path);
		if (ret)
			goto out;
	}

	/* you can only watch an inode if you have read permissions on it */
	ret = inode_permission(path->dentry->d_inode, MAY_READ);
	if (ret)
		path_put(path);
out:
	return ret;
}
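When a pathname is given, fanotify_find_path() converts two fanotify_mark() flags into path-lookup flags. When no pathname is given, it checks the directory restriction itself with S_ISDIR on `dfd`. The sketch below models only the pathname branch. The FAN_MARK_* values match the UAPI header. The LOOKUP_* values are hypothetical stand-ins, because the real ones are kernel-internal. `mark_flags_to_lookup()` is a hypothetical name.

```c
#include <assert.h>

/* UAPI values of the two fanotify_mark() flags consulted here. */
#define MODEL_FAN_MARK_DONT_FOLLOW	0x00000004
#define MODEL_FAN_MARK_ONLYDIR		0x00000008

/* Hypothetical stand-ins for the kernel-internal LOOKUP_* bits. */
#define MODEL_LOOKUP_FOLLOW		0x1
#define MODEL_LOOKUP_DIRECTORY		0x2

/*
 * Mirrors the pathname branch of fanotify_find_path(): symlinks are
 * followed unless DONT_FOLLOW is set; ONLYDIR requests a directory lookup.
 */
static unsigned int mark_flags_to_lookup(unsigned int flags)
{
	unsigned int lookup_flags = 0;

	if (!(flags & MODEL_FAN_MARK_DONT_FOLLOW))
		lookup_flags |= MODEL_LOOKUP_FOLLOW;
	if (flags & MODEL_FAN_MARK_ONLYDIR)
		lookup_flags |= MODEL_LOOKUP_DIRECTORY;
	return lookup_flags;
}
```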

2009-12-18 10:24:33 +08:00
static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
					    __u32 mask,
2011-06-14 23:29:49 +08:00
					    unsigned int flags,
					    int *destroy)
2009-12-18 10:24:28 +08:00
{
2015-02-11 06:08:24 +08:00
	__u32 oldmask = 0;
2009-12-18 10:24:28 +08:00

	spin_lock(&fsn_mark->lock);
2009-12-18 10:24:33 +08:00
	if (!(flags & FAN_MARK_IGNORED_MASK)) {
		oldmask = fsn_mark->mask;
2018-10-04 05:25:34 +08:00
		fsn_mark->mask &= ~mask;
2009-12-18 10:24:33 +08:00
	} else {
2018-10-04 05:25:34 +08:00
		fsn_mark->ignored_mask &= ~mask;
2009-12-18 10:24:33 +08:00
	}
2015-02-11 06:08:21 +08:00
	*destroy = !(fsn_mark->mask | fsn_mark->ignored_mask);
2009-12-18 10:24:28 +08:00
	spin_unlock(&fsn_mark->lock);

	return mask & oldmask;
}
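fanotify_mark_remove_from_mask() returns only the bits that were actually cleared from the event mask. It reports a destroy request once both masks are empty. When the ignored mask is edited, `oldmask` stays 0, so nothing is reported as removed. The sketch below reproduces that arithmetic in user space without the spinlock. `struct mark_model` and `remove_from_mask()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mark_model {
	uint32_t mask;
	uint32_t ignored_mask;
};

/*
 * Lock-free model of fanotify_mark_remove_from_mask(): returns the event
 * bits actually cleared (always 0 for the ignored mask) and sets *destroy
 * when the mark has no bits left in either mask.
 */
static uint32_t remove_from_mask(struct mark_model *m, uint32_t mask,
				 bool ignored, bool *destroy)
{
	uint32_t oldmask = 0;

	if (!ignored) {
		oldmask = m->mask;
		m->mask &= ~mask;
	} else {
		m->ignored_mask &= ~mask;
	}
	*destroy = !(m->mask | m->ignored_mask);
	return mask & oldmask;
}
```

fanotify_remove_mark() uses the returned bits to decide whether the connector mask needs recalculating.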

2018-06-23 22:54:51 +08:00
static int fanotify_remove_mark(struct fsnotify_group *group,
				fsnotify_connp_t *connp, __u32 mask,
				unsigned int flags)
2009-12-18 10:24:28 +08:00
{
	struct fsnotify_mark *fsn_mark = NULL;
2009-12-18 10:24:28 +08:00
	__u32 removed;
2011-06-14 23:29:49 +08:00
	int destroy_mark;
2009-12-18 10:24:28 +08:00

2013-07-09 06:59:42 +08:00
	mutex_lock(&group->mark_mutex);
2018-06-23 22:54:51 +08:00
	fsn_mark = fsnotify_find_mark(connp, group);
2013-07-09 06:59:42 +08:00
	if (!fsn_mark) {
		mutex_unlock(&group->mark_mutex);
2009-12-18 10:24:29 +08:00
		return -ENOENT;
2013-07-09 06:59:42 +08:00
	}
2009-12-18 10:24:28 +08:00

2011-06-14 23:29:49 +08:00
	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
						 &destroy_mark);
2018-06-23 22:54:50 +08:00
	if (removed & fsnotify_conn_mask(fsn_mark->connector))
		fsnotify_recalc_mask(fsn_mark->connector);
2011-06-14 23:29:49 +08:00
	if (destroy_mark)
2015-09-05 06:43:12 +08:00
		fsnotify_detach_mark(fsn_mark);
2013-07-09 06:59:42 +08:00
	mutex_unlock(&group->mark_mutex);
2015-09-05 06:43:12 +08:00
	if (destroy_mark)
		fsnotify_free_mark(fsn_mark);
2011-06-14 23:29:49 +08:00

2018-06-23 22:54:51 +08:00
	/* matches the fsnotify_find_mark() */
2009-12-18 10:24:29 +08:00
	fsnotify_put_mark(fsn_mark);
	return 0;
}
2009-12-18 10:24:26 +08:00

2018-06-23 22:54:51 +08:00
static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
					 struct vfsmount *mnt, __u32 mask,
					 unsigned int flags)
{
	return fanotify_remove_mark(group, &real_mount(mnt)->mnt_fsnotify_marks,
				    mask, flags);
}

2018-09-01 15:41:13 +08:00
static int fanotify_remove_sb_mark(struct fsnotify_group *group,
				   struct super_block *sb, __u32 mask,
				   unsigned int flags)
{
	return fanotify_remove_mark(group, &sb->s_fsnotify_marks, mask, flags);
}

2009-12-18 10:24:29 +08:00
static int fanotify_remove_inode_mark(struct fsnotify_group *group,
2009-12-18 10:24:33 +08:00
				      struct inode *inode, __u32 mask,
				      unsigned int flags)
2009-12-18 10:24:29 +08:00
{
2018-06-23 22:54:51 +08:00
	return fanotify_remove_mark(group, &inode->i_fsnotify_marks, mask,
				    flags);
2009-12-18 10:24:26 +08:00
}

2009-12-18 10:24:33 +08:00
static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
				       __u32 mask,
				       unsigned int flags)
2009-12-18 10:24:28 +08:00
{
2010-10-29 05:21:59 +08:00
	__u32 oldmask = -1;
2009-12-18 10:24:28 +08:00

	spin_lock(&fsn_mark->lock);
2009-12-18 10:24:33 +08:00
	if (!(flags & FAN_MARK_IGNORED_MASK)) {
		oldmask = fsn_mark->mask;
2018-10-04 05:25:34 +08:00
		fsn_mark->mask |= mask;
2009-12-18 10:24:33 +08:00
	} else {
2018-10-04 05:25:34 +08:00
		fsn_mark->ignored_mask |= mask;
2009-12-18 10:24:33 +08:00
		if (flags & FAN_MARK_IGNORED_SURV_MODIFY)
			fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY;
2009-12-18 10:24:33 +08:00
	}
2009-12-18 10:24:28 +08:00
	spin_unlock(&fsn_mark->lock);

	return mask & ~oldmask;
}
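fanotify_mark_add_to_mask() works the other way round. It returns only the event bits that were not already set. Because `oldmask` starts at -1 (all ones), an update to the ignored mask always reports 0 new bits. The sketch below models this in user space without the spinlock or the SURV_MODIFY flag handling. `struct mark_model` and `add_to_mask()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mark_model {
	uint32_t mask;
	uint32_t ignored_mask;
};

/*
 * Lock-free model of fanotify_mark_add_to_mask(): returns the event bits
 * that are newly set. oldmask starts as all ones (the kernel's "= -1"),
 * so edits to the ignored mask never report new event bits.
 */
static uint32_t add_to_mask(struct mark_model *m, uint32_t mask, bool ignored)
{
	uint32_t oldmask = UINT32_MAX;

	if (!ignored) {
		oldmask = m->mask;
		m->mask |= mask;
	} else {
		m->ignored_mask |= mask;
	}
	return mask & ~oldmask;
}
```

fanotify_add_mark() recalculates the connector mask only when some returned bit is missing from it, which skips redundant recalculations.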

2013-07-09 06:59:43 +08:00
static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
2018-06-23 22:54:48 +08:00
						   fsnotify_connp_t *connp,
						   unsigned int type)
2013-07-09 06:59:43 +08:00
{
	struct fsnotify_mark *mark;
	int ret;

	if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
		return ERR_PTR(-ENOSPC);

	mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
	if (!mark)
		return ERR_PTR(-ENOMEM);

2016-12-22 01:06:12 +08:00
	fsnotify_init_mark(mark, group);
2018-06-23 22:54:48 +08:00
	ret = fsnotify_add_mark_locked(mark, connp, type, 0);
2013-07-09 06:59:43 +08:00
	if (ret) {
		fsnotify_put_mark(mark);
		return ERR_PTR(ret);
	}

	return mark;
}

2018-06-23 22:54:51 +08:00
static int fanotify_add_mark(struct fsnotify_group *group,
			     fsnotify_connp_t *connp, unsigned int type,
			     __u32 mask, unsigned int flags)
2009-12-18 10:24:26 +08:00
{
	struct fsnotify_mark *fsn_mark;
2009-12-18 10:24:28 +08:00
	__u32 added;
2009-12-18 10:24:26 +08:00

2013-07-09 06:59:42 +08:00
	mutex_lock(&group->mark_mutex);
2018-06-23 22:54:48 +08:00
	fsn_mark = fsnotify_find_mark(connp, group);
2009-12-18 10:24:28 +08:00
	if (!fsn_mark) {
2018-06-23 22:54:51 +08:00
		fsn_mark = fanotify_add_new_mark(group, connp, type);
2013-07-09 06:59:43 +08:00
		if (IS_ERR(fsn_mark)) {
2013-07-09 06:59:42 +08:00
			mutex_unlock(&group->mark_mutex);
2013-07-09 06:59:43 +08:00
			return PTR_ERR(fsn_mark);
2013-07-09 06:59:42 +08:00
		}
2009-12-18 10:24:28 +08:00
	}
2009-12-18 10:24:33 +08:00
	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
2018-06-23 22:54:50 +08:00
	if (added & ~fsnotify_conn_mask(fsn_mark->connector))
		fsnotify_recalc_mask(fsn_mark->connector);
2016-12-14 20:53:46 +08:00
	mutex_unlock(&group->mark_mutex);
2013-07-09 06:59:43 +08:00

2010-11-10 01:18:16 +08:00
	fsnotify_put_mark(fsn_mark);
2013-07-09 06:59:43 +08:00
	return 0;
2009-12-18 10:24:28 +08:00
}

2018-06-23 22:54:51 +08:00
static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
				      struct vfsmount *mnt, __u32 mask,
				      unsigned int flags)
{
	return fanotify_add_mark(group, &real_mount(mnt)->mnt_fsnotify_marks,
				 FSNOTIFY_OBJ_TYPE_VFSMOUNT, mask, flags);
}

2018-09-01 15:41:13 +08:00
static int fanotify_add_sb_mark(struct fsnotify_group *group,
				struct super_block *sb, __u32 mask,
				unsigned int flags)
{
	return fanotify_add_mark(group, &sb->s_fsnotify_marks,
				 FSNOTIFY_OBJ_TYPE_SB, mask, flags);
}

2009-12-18 10:24:28 +08:00
static int fanotify_add_inode_mark(struct fsnotify_group *group,
2009-12-18 10:24:33 +08:00
				   struct inode *inode, __u32 mask,
				   unsigned int flags)
2009-12-18 10:24:28 +08:00
{
	pr_debug("%s: group=%p inode=%p\n", __func__, group, inode);
2009-12-18 10:24:26 +08:00

2010-10-29 05:21:57 +08:00
	/*
	 * If some other task has this inode open for write we should not add
	 * an ignored mark, unless that ignored mark is supposed to survive
	 * modification changes anyway.
	 */
	if ((flags & FAN_MARK_IGNORED_MASK) &&
	    !(flags & FAN_MARK_IGNORED_SURV_MODIFY) &&
2018-12-11 16:27:23 +08:00
	    inode_is_open_for_write(inode))
2010-10-29 05:21:57 +08:00
		return 0;

2018-06-23 22:54:51 +08:00
	return fanotify_add_mark(group, &inode->i_fsnotify_marks,
				 FSNOTIFY_OBJ_TYPE_INODE, mask, flags);
2009-12-18 10:24:28 +08:00
}
2009-12-18 10:24:26 +08:00

2009-12-18 10:24:26 +08:00
/* fanotify syscalls */
2010-05-27 21:41:40 +08:00
SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
2009-12-18 10:24:25 +08:00
{
2009-12-18 10:24:26 +08:00
	struct fsnotify_group *group;
	int f_flags, fd;
2010-10-29 05:21:58 +08:00
	struct user_struct *user;
2019-01-11 01:04:32 +08:00
	struct fanotify_event *oevent;
2009-12-18 10:24:26 +08:00

2018-09-22 02:20:30 +08:00
	pr_debug("%s: flags=%x event_f_flags=%x\n",
		 __func__, flags, event_f_flags);
2009-12-18 10:24:26 +08:00

	if (!capable(CAP_SYS_ADMIN))
2010-08-24 18:58:54 +08:00
		return -EPERM;
2009-12-18 10:24:26 +08:00

2017-10-03 08:21:39 +08:00
#ifdef CONFIG_AUDITSYSCALL
2018-10-04 05:25:35 +08:00
	if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT))
2017-10-03 08:21:39 +08:00
#else
2018-10-04 05:25:35 +08:00
	if (flags & ~FANOTIFY_INIT_FLAGS)
2017-10-03 08:21:39 +08:00
#endif
2009-12-18 10:24:26 +08:00
		return -EINVAL;

fanotify: check file flags passed in fanotify_init
Without this patch fanotify_init does not validate the value passed in
event_f_flags.
When a fanotify event is read from the fanotify file descriptor a new
file descriptor is created where file.f_flags = event_f_flags.
Internal and external open flags are stored together in field f_flags of
struct file. Hence, an application might create file descriptors with
internal flags like FMODE_EXEC, FMODE_NOCMTIME set.
Jan Kara and Eric Paris both agreed that this is a bug and the value of
event_f_flags should be checked:
https://lkml.org/lkml/2014/4/29/522
https://lkml.org/lkml/2014/4/29/539
This updated patch version considers the comments by Michael Kerrisk in
https://lkml.org/lkml/2014/5/4/10
With the patch the value of event_f_flags is checked.
When specifying an invalid value error EINVAL is returned.
Internal flags are disallowed.
File creation flags are disallowed:
O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT.
Flags which do not make sense with fanotify are disallowed:
__O_TMPFILE, O_PATH, FASYNC, and O_DIRECT.
This leaves us with the following allowed values:
O_RDONLY, O_WRONLY, O_RDWR are basic functionality. They are stored in the
bits given by O_ACCMODE.
O_APPEND is working as expected. The value might be useful in a logging
application which appends the current status each time the log is opened.
O_LARGEFILE is needed for files exceeding 4GB on 32bit systems.
O_NONBLOCK may be useful when monitoring slow devices like tapes.
O_NDELAY is equal to O_NONBLOCK except for platform parisc.
To avoid code breaking on parisc either both flags should be
allowed or none. The patch allows both.
__O_SYNC and O_DSYNC may be used to avoid data loss on power disruption.
O_NOATIME may be useful to reduce disk activity.
O_CLOEXEC may be useful, if separate processes shall be used to scan files.
Once this patch is accepted, the fanotify_init.2 manpage has to be updated.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 07:05:44 +08:00
	if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
		return -EINVAL;

	switch (event_f_flags & O_ACCMODE) {
	case O_RDONLY:
	case O_RDWR:
	case O_WRONLY:
		break;
	default:
		return -EINVAL;
	}

2010-10-29 05:21:58 +08:00
	user = get_current_user();
	if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) {
		free_uid(user);
		return -EMFILE;
	}

2009-12-18 10:24:34 +08:00
	f_flags = O_RDWR | FMODE_NONOTIFY;
2009-12-18 10:24:26 +08:00
	if (flags & FAN_CLOEXEC)
		f_flags |= O_CLOEXEC;
	if (flags & FAN_NONBLOCK)
		f_flags |= O_NONBLOCK;

	/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
	group = fsnotify_alloc_group(&fanotify_fsnotify_ops);
2010-11-24 12:48:26 +08:00
	if (IS_ERR(group)) {
		free_uid(user);
2009-12-18 10:24:26 +08:00
		return PTR_ERR(group);
2010-11-24 12:48:26 +08:00
	}
2009-12-18 10:24:26 +08:00

2010-10-29 05:21:58 +08:00
	group->fanotify_data.user = user;
2018-09-22 02:20:30 +08:00
	group->fanotify_data.flags = flags;
2010-10-29 05:21:58 +08:00
	atomic_inc(&user->fanotify_listeners);
fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8.
The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.
The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current process's memcg. This patch series
contains two such concrete use-cases i.e. fsnotify and buffer_head.
The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener. The events
are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.
To solve this issue, this patch series introduces a mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.
This patch (of 2):
A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener. This can cause
system level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.
However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.
There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happen in the context of a syscall from the
listener. So, SLAB_ACCOUNT is enough for these caches.
The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.
The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.
This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.
[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-18 06:46:39 +08:00
	group->memcg = get_mem_cgroup_from_mm(current->mm);
2010-10-29 05:21:58 +08:00

2018-02-21 21:10:59 +08:00
	oevent = fanotify_alloc_event(group, NULL, FS_Q_OVERFLOW, NULL);
2014-02-22 02:14:11 +08:00
	if (unlikely(!oevent)) {
		fd = -ENOMEM;
		goto out_destroy_group;
	}
	group->overflow_event = &oevent->fse;

2014-05-07 03:50:10 +08:00
	if (force_o_largefile())
		event_f_flags |= O_LARGEFILE;
2010-07-28 22:18:37 +08:00
	group->fanotify_data.f_flags = event_f_flags;
2009-12-18 10:24:34 +08:00
	init_waitqueue_head(&group->fanotify_data.access_waitq);
	INIT_LIST_HEAD(&group->fanotify_data.access_list);
2018-10-04 05:25:35 +08:00
	switch (flags & FANOTIFY_CLASS_BITS) {
2010-10-29 05:21:56 +08:00
	case FAN_CLASS_NOTIF:
		group->priority = FS_PRIO_0;
		break;
	case FAN_CLASS_CONTENT:
		group->priority = FS_PRIO_1;
		break;
	case FAN_CLASS_PRE_CONTENT:
		group->priority = FS_PRIO_2;
		break;
	default:
		fd = -EINVAL;
2011-06-14 23:29:45 +08:00
		goto out_destroy_group;
2010-10-29 05:21:56 +08:00
	}
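The class switch maps the two FANOTIFY_CLASS_BITS to a group priority. The class values are not independent flags: setting both bits is rejected. The sketch below reproduces the mapping in user space. The FAN_CLASS_* values match the UAPI header. The returned 0/1/2 stand in for FS_PRIO_0..FS_PRIO_2. `class_to_priority()` and the `MODEL_*` macros are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* UAPI values of the notification classes. */
#define MODEL_FAN_CLASS_NOTIF		0x00000000
#define MODEL_FAN_CLASS_CONTENT		0x00000004
#define MODEL_FAN_CLASS_PRE_CONTENT	0x00000008
#define MODEL_CLASS_BITS (MODEL_FAN_CLASS_CONTENT | MODEL_FAN_CLASS_PRE_CONTENT)

/*
 * Hypothetical model of the class switch in fanotify_init(): returns the
 * priority level (standing in for FS_PRIO_0..2), or -EINVAL when both
 * class bits are set. Bits outside MODEL_CLASS_BITS are ignored.
 */
static int class_to_priority(unsigned int flags)
{
	switch (flags & MODEL_CLASS_BITS) {
	case MODEL_FAN_CLASS_NOTIF:
		return 0;
	case MODEL_FAN_CLASS_CONTENT:
		return 1;
	case MODEL_FAN_CLASS_PRE_CONTENT:
		return 2;
	default:
		return -EINVAL;
	}
}
```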
|
2009-12-18 10:24:34 +08:00
|
|
|
|
2010-10-29 05:21:57 +08:00
|
|
|
if (flags & FAN_UNLIMITED_QUEUE) {
|
|
|
|
fd = -EPERM;
|
|
|
|
if (!capable(CAP_SYS_ADMIN))
|
2011-06-14 23:29:45 +08:00
|
|
|
goto out_destroy_group;
|
2010-10-29 05:21:57 +08:00
|
|
|
group->max_events = UINT_MAX;
|
|
|
|
} else {
|
|
|
|
group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS;
|
|
|
|
}
|
2010-10-29 05:21:57 +08:00
|
|
|
|
2010-10-29 05:21:58 +08:00
|
|
|
if (flags & FAN_UNLIMITED_MARKS) {
|
|
|
|
fd = -EPERM;
|
|
|
|
if (!capable(CAP_SYS_ADMIN))
|
2011-06-14 23:29:45 +08:00
|
|
|
goto out_destroy_group;
|
2010-10-29 05:21:58 +08:00
|
|
|
group->fanotify_data.max_marks = UINT_MAX;
|
|
|
|
} else {
|
|
|
|
group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS;
|
|
|
|
}
|
2010-10-29 05:21:57 +08:00
|
|
|
|
2017-10-03 08:21:39 +08:00
|
|
|
if (flags & FAN_ENABLE_AUDIT) {
|
|
|
|
fd = -EPERM;
|
|
|
|
if (!capable(CAP_AUDIT_WRITE))
|
|
|
|
goto out_destroy_group;
|
|
|
|
}
|
|
|
|
|
2009-12-18 10:24:26 +08:00
|
|
|
fd = anon_inode_getfd("[fanotify]", &fanotify_fops, group, f_flags);
|
|
|
|
if (fd < 0)
|
2011-06-14 23:29:45 +08:00
|
|
|
goto out_destroy_group;
|
2009-12-18 10:24:26 +08:00
|
|
|
|
|
|
|
return fd;
|
|
|
|
|
2011-06-14 23:29:45 +08:00
|
|
|
out_destroy_group:
|
|
|
|
fsnotify_destroy_group(group);
|
2009-12-18 10:24:26 +08:00
|
|
|
return fd;
|
2009-12-18 10:24:25 +08:00
|
|
|
}
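The tail of fanotify_init() maps the class bits of the flags to a group priority and allows the unlimited queue and mark limits only for callers with CAP_SYS_ADMIN. The userspace sketch below isolates that decision logic so it can be exercised outside the kernel. The SKETCH_* names, the sketch_class_to_prio() and sketch_pick_limit() helpers, and the plain `privileged` flag standing in for capable(CAP_SYS_ADMIN) are hypothetical; only the three class values are copied from the fanotify uapi header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Class values as defined in the fanotify uapi header. */
#define SKETCH_FAN_CLASS_NOTIF       0x00000000u
#define SKETCH_FAN_CLASS_CONTENT     0x00000004u
#define SKETCH_FAN_CLASS_PRE_CONTENT 0x00000008u
#define SKETCH_FAN_CLASS_BITS \
	(SKETCH_FAN_CLASS_NOTIF | SKETCH_FAN_CLASS_CONTENT | \
	 SKETCH_FAN_CLASS_PRE_CONTENT)

/* Hypothetical stand-ins for FS_PRIO_0..FS_PRIO_2. */
enum sketch_prio { SKETCH_PRIO_0, SKETCH_PRIO_1, SKETCH_PRIO_2 };

/*
 * Map the class bits of the init flags to a group priority.
 * Returns 0 and stores the priority, or -EINVAL when the class bits
 * do not name exactly one class (e.g. CONTENT | PRE_CONTENT).
 */
static int sketch_class_to_prio(unsigned int flags, enum sketch_prio *prio)
{
	switch (flags & SKETCH_FAN_CLASS_BITS) {
	case SKETCH_FAN_CLASS_NOTIF:
		*prio = SKETCH_PRIO_0;
		return 0;
	case SKETCH_FAN_CLASS_CONTENT:
		*prio = SKETCH_PRIO_1;
		return 0;
	case SKETCH_FAN_CLASS_PRE_CONTENT:
		*prio = SKETCH_PRIO_2;
		return 0;
	default:
		return -EINVAL;
	}
}

/*
 * Choose a limit: "unlimited" (UINT32_MAX here) is only granted to a
 * privileged caller; otherwise the default limit is used.
 */
static int sketch_pick_limit(bool want_unlimited, bool privileged,
			     uint32_t default_limit, uint32_t *limit)
{
	if (want_unlimited) {
		if (!privileged)
			return -EPERM;
		*limit = UINT32_MAX;
	} else {
		*limit = default_limit;
	}
	return 0;
}
```

A request that sets both CONTENT and PRE_CONTENT lands in the default branch, which is why the kernel answers it with -EINVAL rather than picking one of the two priorities.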

static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
			    int dfd, const char __user *pathname)
{
	struct inode *inode = NULL;
	struct vfsmount *mnt = NULL;
	struct fsnotify_group *group;
	struct fd f;
	struct path path;
	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
	int ret;

	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
		 __func__, fanotify_fd, flags, dfd, pathname, mask);

	/* we only use the lower 32 bits as of right now. */
	if (mask & ((__u64)0xffffffff << 32))
		return -EINVAL;

	if (flags & ~FANOTIFY_MARK_FLAGS)
		return -EINVAL;

	switch (mark_type) {
	case FAN_MARK_INODE:
	case FAN_MARK_MOUNT:
	case FAN_MARK_FILESYSTEM:
		break;
	default:
		return -EINVAL;
	}

	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) {
	case FAN_MARK_ADD:		/* fallthrough */
	case FAN_MARK_REMOVE:
		if (!mask)
			return -EINVAL;
		break;
	case FAN_MARK_FLUSH:
		if (flags & ~(FANOTIFY_MARK_TYPE_BITS | FAN_MARK_FLUSH))
			return -EINVAL;
		break;
	default:
		return -EINVAL;
	}

	if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS))
		valid_mask |= FANOTIFY_PERM_EVENTS;

	if (mask & ~valid_mask)
		return -EINVAL;

	f = fdget(fanotify_fd);
	if (unlikely(!f.file))
		return -EBADF;

	/* verify that this is indeed an fanotify instance */
	ret = -EINVAL;
	if (unlikely(f.file->f_op != &fanotify_fops))
		goto fput_and_out;
	group = f.file->private_data;

	/*
	 * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF.  These are not
	 * allowed to set permission events.
	 */
	ret = -EINVAL;
	if (mask & FANOTIFY_PERM_EVENTS &&
	    group->priority == FS_PRIO_0)
		goto fput_and_out;

	if (flags & FAN_MARK_FLUSH) {
		ret = 0;
		if (mark_type == FAN_MARK_MOUNT)
			fsnotify_clear_vfsmount_marks_by_group(group);
		else if (mark_type == FAN_MARK_FILESYSTEM)
			fsnotify_clear_sb_marks_by_group(group);
		else
			fsnotify_clear_inode_marks_by_group(group);
		goto fput_and_out;
	}

	ret = fanotify_find_path(dfd, pathname, &path, flags);
	if (ret)
		goto fput_and_out;

	/* inode held in place by reference to path; group by fdget on fd */
	if (mark_type == FAN_MARK_INODE)
		inode = path.dentry->d_inode;
	else
		mnt = path.mnt;

	/* add or remove an inode, mount or filesystem mark */
	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) {
	case FAN_MARK_ADD:
		if (mark_type == FAN_MARK_MOUNT)
			ret = fanotify_add_vfsmount_mark(group, mnt, mask, flags);
		else if (mark_type == FAN_MARK_FILESYSTEM)
			ret = fanotify_add_sb_mark(group, mnt->mnt_sb, mask, flags);
		else
			ret = fanotify_add_inode_mark(group, inode, mask, flags);
		break;
	case FAN_MARK_REMOVE:
		if (mark_type == FAN_MARK_MOUNT)
			ret = fanotify_remove_vfsmount_mark(group, mnt, mask, flags);
		else if (mark_type == FAN_MARK_FILESYSTEM)
			ret = fanotify_remove_sb_mark(group, mnt->mnt_sb, mask, flags);
		else
			ret = fanotify_remove_inode_mark(group, inode, mask, flags);
		break;
	default:
		ret = -EINVAL;
	}

	path_put(&path);
fput_and_out:
	fdput(f);
	return ret;
}
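do_fanotify_mark() rejects malformed requests before it looks up the fanotify file descriptor at all. A minimal userspace sketch of those early checks, assuming the mark flag values from the fanotify uapi header, might look like this. sketch_check_mark_args() and the SK_* macros are hypothetical, and the FANOTIFY_MARK_FLAGS and valid_mask checks are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Copies of the uapi mark flag values used by the checks below. */
#define SK_MARK_ADD        0x00000001u
#define SK_MARK_REMOVE     0x00000002u
#define SK_MARK_MOUNT      0x00000010u
#define SK_MARK_FLUSH      0x00000080u
#define SK_MARK_FILESYSTEM 0x00000100u
#define SK_MARK_INODE      0x00000000u
#define SK_MARK_TYPE_BITS  (SK_MARK_INODE | SK_MARK_MOUNT | SK_MARK_FILESYSTEM)

/* Hypothetical helper mirroring the early argument checks. */
static int sketch_check_mark_args(unsigned int flags, uint64_t mask)
{
	/* Only the low 32 bits of the event mask are in use. */
	if (mask & ((uint64_t)0xffffffff << 32))
		return -EINVAL;

	/* Exactly one mark type: inode (0), mount or filesystem. */
	switch (flags & SK_MARK_TYPE_BITS) {
	case SK_MARK_INODE:
	case SK_MARK_MOUNT:
	case SK_MARK_FILESYSTEM:
		break;
	default:
		return -EINVAL;
	}

	/* Exactly one of ADD, REMOVE or FLUSH. */
	switch (flags & (SK_MARK_ADD | SK_MARK_REMOVE | SK_MARK_FLUSH)) {
	case SK_MARK_ADD:
	case SK_MARK_REMOVE:
		if (!mask)
			return -EINVAL;	/* adding/removing no events is an error */
		break;
	case SK_MARK_FLUSH:
		/* FLUSH may only be combined with a mark type. */
		if (flags & ~(SK_MARK_TYPE_BITS | SK_MARK_FLUSH))
			return -EINVAL;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

Because the flags are validated first, a caller passing, for example, both FAN_MARK_ADD and FAN_MARK_REMOVE gets -EINVAL without the kernel touching the file descriptor.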

SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
			      __u64, mask, int, dfd,
			      const char __user *, pathname)
{
	return do_fanotify_mark(fanotify_fd, flags, mask, dfd, pathname);
}

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE6(fanotify_mark,
				int, fanotify_fd, unsigned int, flags,
				__u32, mask0, __u32, mask1, int, dfd,
				const char __user *, pathname)
{
	return do_fanotify_mark(fanotify_fd, flags,
#ifdef __BIG_ENDIAN
				((__u64)mask0 << 32) | mask1,
#else
				((__u64)mask1 << 32) | mask0,
#endif
				 dfd, pathname);
}
#endif
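The compat entry point receives the 64-bit event mask split across two 32-bit arguments, and which half carries the high word depends on byte order, as the `#ifdef __BIG_ENDIAN` branch shows. The sketch below reproduces that reassembly with a runtime `big_endian` parameter in place of the preprocessor test so that both orders can be checked on one machine; sketch_compat_mask() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Rebuild a 64-bit mask from two 32-bit halves.  Big-endian: mask0 is
 * the high word.  Little-endian: mask0 is the low word.
 */
static uint64_t sketch_compat_mask(uint32_t mask0, uint32_t mask1,
				   bool big_endian)
{
	/* Widen before shifting: a 32-bit value shifted by 32 is undefined. */
	if (big_endian)
		return ((uint64_t)mask0 << 32) | mask1;
	return ((uint64_t)mask1 << 32) | mask0;
}
```

The cast before the shift is what makes this well defined; the kernel code casts to `__u64` for the same reason. The reassembled mask then goes through the same "lower 32 bits only" check in do_fanotify_mark() as a native call.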

/*
 * fanotify_user_setup - Our initialization function.  Note that we cannot return
 * an error because we have compiled-in VFS hooks.  So an (unlikely) failure here
 * must result in panic().
 */
static int __init fanotify_user_setup(void)
{
	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 7);
	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);

	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
					 SLAB_PANIC|SLAB_ACCOUNT);
	fanotify_event_cachep = KMEM_CACHE(fanotify_event, SLAB_PANIC);
	if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS)) {
		fanotify_perm_event_cachep =
			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
	}

	return 0;
}
device_initcall(fanotify_user_setup);
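fanotify_user_setup() uses `BUILD_BUG_ON(HWEIGHT32(...))` to check at compile time how many bits the init and mark flag sets contain, presumably so that a new flag cannot be added without someone revisiting this code. The sketch below shows the same idea in plain C11 with `static_assert` and a hand-rolled constant popcount in the spirit of the kernel's HWEIGHT32. The SK_* macros are hypothetical, and SK_MARK_FLAGS simply ORs together the nine single-bit values used by the uapi mark flags (FAN_MARK_ADD through FAN_MARK_FILESYSTEM; FAN_MARK_INODE is 0 and contributes no bit).

```c
#include <assert.h>

/* Constant-expression popcount of one byte. */
#define SK_HWEIGHT8(w) \
	(!!((w) & 0x01u) + !!((w) & 0x02u) + !!((w) & 0x04u) + \
	 !!((w) & 0x08u) + !!((w) & 0x10u) + !!((w) & 0x20u) + \
	 !!((w) & 0x40u) + !!((w) & 0x80u))

/* Constant-expression popcount of a 32-bit value, byte by byte. */
#define SK_HWEIGHT32(w) \
	(SK_HWEIGHT8((w) & 0xffu) + SK_HWEIGHT8(((w) >> 8) & 0xffu) + \
	 SK_HWEIGHT8(((w) >> 16) & 0xffu) + SK_HWEIGHT8(((w) >> 24) & 0xffu))

/* Nine single-bit values mirroring the counted mark flags. */
#define SK_MARK_FLAGS (0x001u | 0x002u | 0x004u | 0x008u | 0x010u | \
		       0x020u | 0x040u | 0x080u | 0x100u)

/* Breaks the build if the flag set grows or shrinks unnoticed. */
static_assert(SK_HWEIGHT32(SK_MARK_FLAGS) == 9, "mark flag count changed");
```

Because both macros expand to integer constant expressions, the check costs nothing at run time; adding a tenth bit to SK_MARK_FLAGS turns it into a compile error.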