Pointer to ->free_mark callback unnecessarily occupies one long in each
fsnotify_mark although they are the same for all marks from one
notification group. Move the callback pointer to fsnotify_ops.
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pass fsnotify_iter_info into ->handle_event() handler so that it can
release and reacquire SRCU lock via fsnotify_prepare_user_wait() and
fsnotify_finish_user_wait() functions. These functions also make sure
current marks are appropriately pinned so that iteration protected by
srcu in fsnotify() stays safe.
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Printing inode pointers in warnings has dubious value and with future
changes we won't be able to easily get them without either locking or
chances we oops along the way. So just remove inode pointers from the
warning messages.
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.
Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patchset converts inotify to using the newly introduced
per-userns sysctl infrastructure.
Currently the inotify instances/watches are being accounted in the
user_struct structure. This means that in setups where multiple
users in unprivileged containers map to the same underlying
real user (i.e. pointing to the same user_struct) the inotify limits
are going to be shared as well, allowing one user(or application) to exhaust
all others limits.
Fix this by switching the inotify sysctls to using the
per-namespace/per-user limits. This will allow the server admin to
set sensible global limits, which can further be tuned inside every
individual user namespace. Additionally, in order to preserve the
sysctl ABI make the existing inotify instances/watches sysctls
modify the values of the initial user namespace.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
There's a lot of common code in inode and mount marks handling. Factor it
out to a common helper function.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some failure paths we may attempt to free user context even if it
wasn't assigned yet. This will cause a NULL ptr deref and a kernel BUG.
The path I was looking at is in inotify_new_group():
oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
if (unlikely(!oevent)) {
fsnotify_destroy_group(group);
return ERR_PTR(-ENOMEM);
}
fsnotify_destroy_group() would get called here, but
group->inotify_data.user is only getting assigned later:
group->inotify_data.user = get_current_user();
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename fsnotify_add_notify_event() to fsnotify_add_event() since the
"notify" part is duplicit. Rename fsnotify_remove_notify_event() and
fsnotify_peek_notify_event() to fsnotify_remove_first_event() and
fsnotify_peek_first_event() respectively since "notify" part is duplicit
and they really look at the first event in the queue.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My rework of handling of notification events (namely commit 7053aee26a
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.
Fix the problem by passing the cookie value properly.
Fixes: 7053aee26a
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The event returned from fsnotify_add_notify_event() cannot ever be used
safely as the event may be freed by the time the function returns (after
dropping notification_mutex). So change the prototype to just return
whether the event was added or merged into some existing event.
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Jan Kara <jack@suse.cz>
After removing event structure creation from the generic layer there is
no reason for separate .should_send_event and .handle_event callbacks.
So just remove the first one.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups. This is done so that we save memory when several notification
groups are interested in the event. However the need for event
structure shared between inotify & fanotify bloats the event structure
so the result is often higher memory consumption.
Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors
with its events. This has the undesirable effect that filesystem cannot
be unmounted while there are outstanding events - a regression for
inotify compared to a situation before it was converted to fsnotify
framework. For fanotify this problem is hard to avoid and users of
fanotify should kind of expect this behavior when they ask for file
descriptors from notified files.
This patch changes fsnotify and its users to create separate event
structure for each group. This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures. For example
on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data. After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups
interested in the events (both of which are uncommon). Fanotify event
fits in 56 bytes after the conversion (fanotify doesn't care about file
names so its events don't have to have it allocated). A win unless
there are four or more fanotify groups interested in the event.
The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.
[hughd@google.com: fanotify: fix corruption preventing startup]
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idr_destroy() can destroy idr by itself and idr_remove_all() is being
deprecated. Drop its usage.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In fsnotify_destroy_mark() dont get the group from the passed mark anymore,
but pass the group itself as an additional parameter to the function.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Get a group ref for each mark that is added to the groups list and release that
ref when the mark is freed in fsnotify_put_mark().
We also use get a group reference for duplicated marks and for private event
data.
Now we dont free a group any more when the number of marks becomes 0 but when
the groups ref count does. Since this will only happen when all marks are removed
from a groups mark list, we dont have to set the groups number of marks to 1 at
group creation.
Beside clearing all marks in fsnotify_destroy_group() we do also flush the
groups event queue. This is since events may hold references to groups (due to
private event data) and we have to put those references first before we get a
chance to put the final ref, which will result in a call to
fsnotify_final_destroy_group().
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
On an error path in inotify_init1 a normal user can trigger a double
free of struct user. This is a regression introduced by a2ae4cc9a1
("inotify: stop kernel memory leak on file creation failure").
We fix this by making sure that if a group exists the user reference is
dropped when the group is cleaned up. We should not explictly drop the
reference on error and also drop the reference when the group is cleaned
up.
The new lifetime rules are that an inotify group lives from
inotify_new_group to the last fsnotify_put_group. Since the struct user
and inotify_devs are directly tied to this lifetime they are only
changed/updated in those two locations. We get rid of all special
casing of struct user or user->inotify_devs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org (2.6.37 and up)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 3bcf3860a4 (and the
accompanying commit c1e5c95402 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).
The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.
Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fanotify currently, when given a vfsmount_mark will look up (if it exists)
the corresponding inode mark. This patch drops that lookup and uses the
mark provided.
Signed-off-by: Eric Paris <eparis@redhat.com>
should_send_event() and handle_event() will both need to look up the inode
event if they get a vfsmount event. Lets just pass both at the same time
since we have them both after walking the lists in lockstep.
Signed-off-by: Eric Paris <eparis@redhat.com>
The change to use srcu and walk the object list rather than the global
fsnotify_group list means that should_send_event is no longer needed for a
number of groups and can be simplified for others. Do that.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify now gets a mark in the should_send_event and handle_event
functions. Rather than look up the mark themselves inotify should just use
the mark it was handed.
Signed-off-by: Eric Paris <eparis@redhat.com>
With the change of fsnotify to use srcu walking the marks list instead of
walking the global groups list we now know the mark in question. The code can
send the mark to the group's handling functions and the groups won't have to
find those marks themselves.
Signed-off-by: Eric Paris <eparis@redhat.com>
Al explains that calling dentry_open() with a mnt/dentry pair is only
garunteed to be safe if they are already used in an open struct file. To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than the horrific void ** argument and such just to pass the
fanotify_merge event back to the caller of fsnotify_add_notify_event() have
those things return an event if it was different than the event suggusted to
be added.
Signed-off-by: Eric Paris <eparis@redhat.com>
It can be hard to debug fsnotify since there are so few printks. Use
pr_debug to allow for dynamic debugging.
Signed-off-by: Eric Paris <eparis@redhat.com>
An inotify watch on a directory will send events for children even if those
children have been unlinked. This patch add a new inotify flag IN_EXCL_UNLINK
which allows a watch to specificy they don't care about unlinked children.
This should fix performance problems seen by tasks which add a watch to
/tmp and then are overrun with events when other processes are reading and
writing to unlinked files they created in /tmp.
https://bugzilla.kernel.org/show_bug.cgi?id=16296
Requested-by: Matthias Clasen <mclasen@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
During the large inotify rewrite to fsnotify I completely dropped support
for IN_ONESHOT. Reimplement that support.
Signed-off-by: Eric Paris <eparis@redhat.com>
Each group can define their own notification (and secondary_q) merge
function. Inotify does tail drop, fanotify does matching and drop which
can actually allocate a completely new event. But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue. This patch just implements a
void ** argument which is passed to the merge function. fanotify can use
this field to pass the new event back to higher layers.
Signed-off-by: Eric Paris <eparis@redhat.com>
for fanotify to properly deal with permissions events
currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Signed-off-by: Eric Paris <eparis@redhat.com>
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call. Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.
Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify will WARN() if it finds that the idr and the fsnotify internals
somehow got out of sync. It was only supposed to do this once but due
to this stupid bug it would warn every single time a problem was
detected.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify decides if private data it passed to get added to an event was
used by checking list_empty(). But it's possible that the event may
have been dequeued and the private event removed so it would look empty.
The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
inotify_destroy_mark_entry could get called twice for the same mark since it
is called directly in inotify_rm_watch and when the mark is being destroyed for
another reason. As an example assume that the file being watched was just
deleted so inotify_destroy_mark_entry would get called from the path
fsnotify_inoderemove() -> fsnotify_destroy_marks_by_inode() ->
fsnotify_destroy_mark_entry() -> inotify_destroy_mark_entry(). If this
happened at the same time as userspace tried to remove a watch via
inotify_rm_watch we could attempt to remove the mark from the idr twice and
could thus double dec the ref cnt and potentially could be in a use after
free/double free situation. The fix is to have inotify_rm_watch use the
generic recursive safe fsnotify_destroy_mark_by_entry() so we are sure the
inotify_destroy_mark_entry() function can only be called one.
This patch also renames the function to inotify_ingored_remove_idr() so it is
clear what is actually going on in the function.
Hopefully this fixes:
[ 20.342058] idr_remove called for id=20 which is not allocated.
[ 20.348000] Pid: 1860, comm: udevd Not tainted 2.6.30-tip #1077
[ 20.353933] Call Trace:
[ 20.356410] [<ffffffff811a82b7>] idr_remove+0x115/0x18f
[ 20.361737] [<ffffffff8134259d>] ? _spin_lock+0x6d/0x75
[ 20.367061] [<ffffffff8111640a>] ? inotify_destroy_mark_entry+0xa3/0xcf
[ 20.373771] [<ffffffff8111641e>] inotify_destroy_mark_entry+0xb7/0xcf
[ 20.380306] [<ffffffff81115913>] inotify_freeing_mark+0xe/0x10
[ 20.386238] [<ffffffff8111410d>] fsnotify_destroy_mark_by_entry+0x143/0x170
[ 20.393293] [<ffffffff811163a3>] inotify_destroy_mark_entry+0x3c/0xcf
[ 20.399829] [<ffffffff811164d1>] sys_inotify_rm_watch+0x9b/0xc6
[ 20.405850] [<ffffffff8100bcdb>] system_call_fastpath+0x16/0x1b
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by: Peter Ziljlstra <peterz@infradead.org>
inotify and dnotify will both indicate that they want any event which came
from a child inode. The fix is to mask off FS_EVENT_ON_CHILD when deciding
if inotify or dnotify is interested in a given event.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reimplement inotify_user using fsnotify. This should be feature for feature
exactly the same as the original inotify_user. This does not make any changes
to the in kernel inotify feature used by audit. Those patches (and the eventual
removal of in kernel inotify) will come after the new inotify_user proves to be
working correctly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>