Holding locks over device_del -> kobject_del -> sysfs_deactivate can
cause deadlocks if those same locks are grabbed in sysfs show or store
methods.
The I model s_active count + completion as a sleeping read/write lock.
I describe to lockdep sysfs_get_active as a read_trylock,
sysfs_put_active as a read_unlock, and sysfs_deactivate as a
write_lock and write_unlock pair. This seems to capture the essence
for purposes of finding deadlocks, and in my testing gives finds real
issues and ignores non-issues.
This brings us back to holding locks over kobject_del is a problem
that ideally we should find a way of addressing, but at least lockdep
can tell us about the problems instead of requiring developers to debug
rare strange system deadlocks, that happen when sysfs files are removed
while being written to.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These two functions do 90% of the same work and it doesn't significantly
obfuscate the function to allow both the parent dir and the name to change
at the same time. So merge them together to simplify maintenance, and
increase testing.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed. I can
show the results of renames correctly without having to
update the dcache during the directory rename.
This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With lazy inode updates and dentry operations bringing everything
into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes. This paves the way
for deleting significant chunks of now unnecessary code.
While doing this we did not reference sysfs_setattr from
sysfs_symlink_inode_operations so I added along with
sysfs_getattr and sysfs_permission.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While refreshing my sysfs patches I noticed a leak in the secdata
implementation. We don't free the secdata when we free the
sysfs dirent.
This is a bug in 2.6.32-rc5 that we really should close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
As device_move() and kobject_move() both handle a NULL destination,
sysfs_move_dir() should do this as well (again) and fall back to
sysfs_root in that case.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds a setxattr handler to the file, directory, and symlink
inode_operations structures for sysfs. The patch uses hooks introduced in the
previous patch to handle the getting and setting of security information for
the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
sysfs_dirent structure has been replaced by a structure which contains the
iattr, secdata and secdata length to allow the changes to persist in the event
that the inode representing the sysfs_dirent is evicted. Because sysfs only
stores this information when a change is made all the optional data is moved
into one dynamically allocated field.
This patch addresses an issue where SELinux was denying virtd access to the PCI
configuration entries in sysfs. The lack of setxattr handlers for sysfs
required that a single label be assigned to all entries in sysfs. Granting virtd
access to every entry in sysfs is not an acceptable solution so fine grained
labeling of sysfs is required such that individual entries can be labeled
appropriately.
[sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Update directory hardlink count when moving kobjects to a new parent.
Fixes the following problem which occurs when several devices are
moved to the same parent and then unregistered:
> ls -laF /sys/devices/css0/defunct/
> total 0
> drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./
> drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../
> drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/
> -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Modify sysfs bin files so that we can remove the bin file while they are
still mapped. When the kobject is removed we unmap the bin file and
arrange for future accesses to the mapping to receive SIGBUS.
Implementing this prevents a nasty DOS when pci devices are hot plugged
and unplugged. Where if any of their resources were mmaped the kernel
could not free up their pci resources or release their pci data
structures.
[akpm@linux-foundation.org: remove unused var]
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs: sysfs_add_one WARNs with full path to duplicate filename
As a debugging aid, it can be useful to know the full path to a
duplicate file being created in sysfs.
We now will display warnings such as:
sysfs: cannot create duplicate filename '/foo'
when attempting to create multiple files named 'foo' in the sysfs
root, or:
sysfs: cannot create duplicate filename '/bus/pci/slots/5/foo'
when attempting to create multiple files named 'foo' under a
given directory in sysfs.
The path displayed is always a relative path to sysfs_root. The
leading '/' in the path name refers to the sysfs_root mount
point, and should not be confused with the "real" '/'.
Thanks to Alex Williamson for essentially writing sysfs_pathname.
Cc: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With this patch all directory fops instances that have a readdir
that doesn't take the BKL are switched to generic_file_llseek.
Signed-off-by: Christoph Hellwig <hch@lst.de>
It finally dawned on me what the clean fix to sysfs_rename_dir
calling kobject_set_name is. Move the work into kobject_rename
where it belongs. The callers serialize us anyway so this is
safe.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As inode creation is protected by sysfs_mutex, ilookup5_nowait()
always either fails to find at all or finds one which is fully
initialized, so using ilookup5_nowait() or ilookup5() doesn't make any
difference. Switch to ilookup5() as it's planned to be removed. This
change also makes lookup return value handling a bit simpler.
This change was suggested by Al Viro.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@hera.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Support sysfs_notify from atomic context with new sysfs_notify_dirent
sysfs_notify currently takes sysfs_mutex.
This means that it cannot be called in atomic context.
sysfs_mutex is sometimes held over a malloc (sysfs_rename_dir)
so it can block on low memory.
In md I want to be able to notify on a sysfs attribute from
atomic context, and I don't want to block on low memory because I
could be in the writeout path for freeing memory.
So:
- export the "sysfs_dirent" structure along with sysfs_get, sysfs_put
and sysfs_get_dirent so I can get the sysfs_dirent that I want to
notify on and hold it in an md structure.
- split sysfs_notify_dirent out of sysfs_notify so the sysfs_dirent
can be notified on with no blocking (just a spinlock).
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes
part of the warning section for better reporting/collection. Also, with this,
one fo the if() sections collapses entirely into the WARN().
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
driver core: Suppress sysfs warnings for device_rename().
Renaming network devices to an already existing name is not
something we want sysfs to print a scary warning for, since the
callers can deal with this correctly. So let's introduce
sysfs_create_link_nowarn() which gets rid of the common warning.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It is possible that the entry in sysfs already exists, one case of this is
when a network device is renamed to bonding_masters. Anyway, in this case
the proper error path is for device_rename to return an error code, not to
generate bogus backtrace and errors.
Also, to avoid possible races, the create link should be done before the
remove link. This makes a device rename atomic operation like other renames.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After an experimental deletion of the unnecessary inclusion of
<linux/slab.h> from the header file <linux/percpu.h>, the following
files under fs/sysfs were exposed as needing to explicitly include
<linux/slab.h>.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_rename/move_dir() have the following bugs.
- On dentry lookup failure, kfree() is called on ERR_PTR() value.
- sysfs_move_dir() has an extra dput() on success path.
Fix them.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs tries to keep dcache a strict subset of sysfs_dirent tree by
shooting down dentries when a node is removed, that is, no negative
dentry for sysfs. However, the lookup function returned NULL and thus
created negative dentries when the target node didn't exist.
Make sysfs_lookup() return ERR_PTR(-ENOENT) on lookup failure. This
fixes the NULL dereference bug in sysfs_get_dentry() discovered by
bluetooth rfcomm device moving around.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Try to fix the mess created by sysfs braindamage.
- refactor code internal to fs/namei.c a little to avoid too much
duplication:
o __lookup_hash_kern is renamed back to __lookup_hash
o the old __lookup_hash goes away, permission checks moves to
the two callers
o useless inline qualifiers on above functions go away
- lookup_one_len_kern loses it's last argument and is renamed to
lookup_one_noperm to make it's useage a little more clear
- added kerneldoc comments to describe lookup_one_len aswell as
lookup_one_noperm and make it very clear that no one should use
the latter ever.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sysfs file poll implementation is scattered over sysfs and kobject.
Event numbering is done in sysfs_dirent but wait itself is done on
kobject. This not only unecessarily bloats both kobject and
sysfs_dirent but is also buggy - if a sysfs_dirent is removed while
there still are pollers, the associaton betwen the kobject and
sysfs_dirent breaks and kobject may be freed with the pollers still
sleeping on it.
This patch moves whole poll implementation into sysfs_open_dirent.
Each time a sysfs_open_dirent is created, event number restarts from 1
and pollers sleep on sysfs_open_dirent. As event sequence number is
meaningless without any open file and pollers should have open file
and thus sysfs_open_dirent, this ephemeral event counting works and is
a saner implementation.
This patch fixes the dnagling sleepers bug and reduces the sizes of
kobject and sysfs_dirent by one pointer.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Children list head is only meaninful for directory nodes. Move it
into s_dir. This doesn't save any space currently but it will with
further changes.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_attach_dentry() now has only one caller and isn't doing much
other than obfuscating the code. Open code and kill it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make s_elem an anonymous union. Prefixing with s_elem makes things
needlessly longer without any advantage.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs_add/remove_one() now link and unlink the target dirent into and
from the children list. Update comments accordingly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We want to let people know when we create a duplicate sysfs file, as
they need to fix up their code.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch rewrites sysfs_move_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache, making sysfs_move_dir
more like the rest of the sysfs directory modification
code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch rewrites sysfs_rename_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache. It turns out that this version
is a little simpler, and a little more like the rest of
the sysfs directory modification code.
tj: fixed double locking of sysfs_mutex
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The only uses of s_dentry left are the code that maintains
s_dentry and trivial users that don't actually need it.
So this patch removes the s_dentry maintenance code and
restructures the trivial uses to use something else.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that we know the sysfs tree structure cannot change under us and
sysfs shadow support is dropped, sysfs_get_dentry() can be simplified
greatly. It can just look up from the root and there's no need to
retry on failure.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Looking carefully at the rename code we have a subtle dependency
that the structure of sysfs not change while we are performing
a rename. If the parent directory of the object we are renaming
changes while the rename is being performed nasty things could
happen when we go to release our locks.
So introduce a sysfs_rename_mutex to prevent this highly
unlikely theoretical issue.
In addition hold sysfs_rename_mutex over all calls to
sysfs_get_dentry. Allowing sysfs_get_dentry to be simplified
in the future.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently we find the dentry to drop by looking at sd->s_dentry.
We can just as easily accomplish the same task by looking up the
sysfs inode and finding all of the dentries from there, with the
added bonus that we don't need to play with the sysfs_assoc_lock.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At some point someone wrote sysfs_readdir to insert a cursor
into the list of sysfs_dirents to ensure that sysfs_readdir would
restart properly. That works but it is complex code and tends
to be expensive.
The same effect can be achieved by keeping the sysfs_dirents in
inode order and using the inode number as the f_pos. Then
when we restart we just have to find the first dirent whose inode
number is equal or greater then the last sysfs_dirent we attempted
to return.
Removing the sysfs directory cursor also allows the remove of
all of the mysterious checks for sysfs_type(sd) != 0. Which
were nonbovious checks to see if a cursor was in a directory list.
tj: offset marker for EOF is changed from UINT_MAX to INT_MAX to avoid
overflow in case offset is 32bit.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a small cleanup patch that makes the code just
a little bit cleaner.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch modifies the users of sysfs_mount to use sysfs_root
instead (which is what they are looking for). It then
makes sysfs_mount static to keep people from using it
by accident.
The net result is slightly faster and cleaner code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that sysfs_get_inode is dropping the inode lock
we no longer have a need from sysfs_instantiate.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
lookup_one_len_kern() should be called with the parent's i_mutex
locked. Fix it.
Spotted by Eric W. Biederman.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With the previous sysfs_add_one() update, there is only one user of
the return value of sysfs_addrm_finish() and the user can switch to
testing @sd easily. Make sysfs_addrm_finish() return void for cleaner
semantics as suggested by Satyam Sharma.
This patch doesn't introduce any noticeable behavior change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>