linux-sg2042

Commit Graph

Author	SHA1	Message	Date
David Howells	a528d35e8b	statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char filename, unsigned int flags, unsigned int mask, struct statx buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS__FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-03-02 20:51:15 -05:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
David Howells	75c3cfa855	VFS: assorted weird filesystems: d_inode() annotations Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:58 -04:00
Axel Lin	fbde7c6119	devtmpfs: Calling delete_path() only when necessary The deleted variable is always 1 in current code. Initialize deleted variable to be 0, so delete_path() will be called only when necessary. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-12-19 10:10:32 -08:00
J. Bruce Fields	27ac0ffeac	locks: break delegations on any attribute modification NFSv4 uses leases to guarantee that clients can cache metadata as well as data. Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: David Howells <dhowells@redhat.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:44 -05:00
J. Bruce Fields	b21996e36c	locks: break delegations on unlink We need to break delegations on any operation that changes the set of links pointing to an inode. Start with unlink. Such operations also hold the i_mutex on a parent directory. Breaking a delegation may require waiting for a timeout (by default 90 seconds) in the case of a unresponsive NFS client. To avoid blocking all directory operations, we therefore drop locks before waiting for the delegation. The logic then looks like: acquire locks ... test for delegation; if found: take reference on inode release locks wait for delegation break drop reference on inode retry It is possible this could never terminate. (Even if we take precautions to prevent another delegation being acquired on the same inode, we could get a different inode on each retry.) But this seems very unlikely. The initial test for a delegation happens after the lock on the target inode is acquired, but the directory inode may have been acquired further up the call stack. We therefore add a "struct inode **" argument to any intervening functions, which we use to pass the inode back up to the caller in the case it needs a delegation synchronously broken. Cc: David Howells <dhowells@redhat.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:42 -05:00
Greg Kroah-Hartman	4e4098a3e0	driver core: handle user namespaces properly with the uid/gid devtmpfs change Now that devtmpfs is caring about uid/gid, we need to use the correct internal types so users who have USER_NS enabled will have things work properly for them. Thanks to Eric for pointing this out, and the patch review. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-04-11 11:43:29 -07:00
Ming Lei	d81c8d19da	driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS If CONFIG_UIDGID_STRICT_TYPE_CHECKS is enalbed, the below compile failure will be triggered: drivers/base/devtmpfs.c: In function 'handle_create': drivers/base/devtmpfs.c:214:19: error: incompatible types when assigning to type 'kuid_t' from type 'uid_t' drivers/base/devtmpfs.c:215:19: error: incompatible types when assigning to type 'kgid_t' from type 'gid_t' make[2]: *** [drivers/base/devtmpfs.o] Error 1 This patch fixes the compile failure. Signed-off-by: Ming Lei <ming.lei@canonical.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-04-10 09:36:06 -07:00
Greg Kroah-Hartman	c3a3042090	devtmpfs: add base.h include This fixes a sparse warning, and is a good idea given that the devtmpfs_init() prototype is in this file. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-04-10 09:00:15 -07:00
Kay Sievers	3c2670e651	driver core: add uid and gid to devtmpfs Some drivers want to tell userspace what uid and gid should be used for their device nodes, so allow that information to percolate through the driver core to userspace in order to make this happen. This means that some systems (i.e. Android and friends) will not need to even run a udev-like daemon for their device node manager and can just rely in devtmpfs fully, reducing their footprint even more. Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-04-08 08:21:48 -07:00
Al Viro	3dadecce20	switch vfs_getattr() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-26 02:46:08 -05:00
Jeff Layton	1ac12b4b6d	vfs: turn is_dir argument to kern_path_create into a lookup_flags arg Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags passed in here are currently ignored. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-20 18:50:02 -05:00
Eric W. Biederman	91fa2ccaa8	userns: Convert devtmpfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-09-21 03:13:05 -07:00
Al Viro	921a1650de	new helper: done_path_create() releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:13 +04:00
Al Viro	79714f72d3	get rid of kern_path_parent() all callers want the same thing, actually - a kinda-sorta analog of kern_path_create(). I.e. they want parent vfsmount/dentry (with ->i_mutex held, to make sure the child dentry is still their child) + the child dentry. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:02 +04:00
Peter Korsgaard	02fbe5e61d	devtmpfs: fix 'the the' typo Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-04-18 15:37:35 -07:00
Linus Torvalds	972b2c7199	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc//mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount ...	2012-01-08 12:19:57 -08:00
Al Viro	fbd48a69a0	switch devtmpfs to umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:57 -05:00
Al Viro	2c9ede55ec	switch device_get_devnode() and ->devnode() to umode_t * both callers of device_get_devnode() are only interested in lower 16bits and nobody tries to return anything wider than 16bit anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
Kautuk Consul	65e6757be4	devtmpfsd: fix task state handling - Set the state to TASK_INTERRUPTIBLE using __set_current_state() instead of set_current_state() as the spin_unlock is an implicit memory barrier. - After return from schedule(), there is no need to set the current state to TASK_RUNNING - a call to schedule() always returns in TASK_RUNNING state. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-15 16:07:39 -08:00
Arnaud Lacombe	f9e0b159db	drivers/base/devtmpfs.c: correct annotation of `setup_done' This fixes the following section mismatch issue: WARNING: vmlinux.o(.text+0x1192bf): Section mismatch in reference from the function devtmpfsd() to the variable .init.data:setup_done The function devtmpfsd() references the variable __initdata setup_done. This is often because devtmpfsd lacks a __initdata annotation or the annotation of setup_done is wrong. WARNING: vmlinux.o(.text+0x119342): Section mismatch in reference from the function devtmpfsd() to the variable .init.data:setup_done The function devtmpfsd() references the variable __initdata setup_done. This is often because devtmpfsd lacks a __initdata annotation or the annotation of setup_done is wrong. Signed-off-by: Arnaud Lacombe <lacombar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-08-08 13:53:50 -07:00
Al Viro	9d108d2548	devtmpfs: missing initialialization in never-hit case create_path() on something without a single / in it will return err without initializing it. It actually can't happen (we call that thing only if create on the same path returns -ENOENT, which won't happen happen for single-component path), but in this case initializing err to 0 is more than making compiler to STFU - would be the right thing to return on such paths; the function creates a parent directory of given pathname and in that case it has no work to do... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-27 22:27:33 -04:00
Al Viro	e13889bab3	fix devtmpfs race After we's done complete(&req->done), there's nothing to prevent the scope containing req from being gone and req overwritten by any kind of junk. So we must read req->next before that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-25 14:15:50 -04:00
Al Viro	5da4e68944	devtmpfs: get rid of bogus mkdir in create_path() We do _NOT_ want to mkdir the path itself - we are preparing to mknod it, after all. Normally it'll fail with -ENOENT and just do nothing, but if somebody has created the parent in the meanwhile, we'll get buggered... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:11 -04:00
Al Viro	69753a0f14	switch devtmpfs to kern_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:10 -04:00
Al Viro	2780f1ff6a	switch devtmpfs object creation/removal to separate kernel thread ... and give it a namespace where devtmpfs would be mounted on root, thus avoiding abuses of vfs_path_lookup() (it was never intended to be used with LOOKUP_PARENT). Games with credentials are also gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:09 -04:00
Al Viro	fc14f2fef6	convert get_sb_single() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:28 -04:00
Peter Korsgaard	da5e4ef7fd	devtmpfs: support !CONFIG_TMPFS Make devtmpfs available on (embedded) configurations without SHMEM/TMPFS, using ramfs instead. Saves ~15KB. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-05-21 09:37:30 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Kay Sievers	5e31d76f28	Driver-Core: devtmpfs - reset inode permissions before unlinking Before unlinking the inode, reset the current permissions of possible references like hardlinks, so granted permissions can not be retained across the device lifetime by creating hardlinks, in the unusual case that there is a user-writable directory on the same filesystem. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-03-07 17:04:48 -08:00
Heiko Carstens	f776c5ec46	driver-core: fix devtmpfs crash on s390 On Mon, Jan 18, 2010 at 05:26:20PM +0530, Sachin Sant wrote: > Hello Heiko, > > Today while trying to boot next-20100118 i came across > the following Oops : > > Brought up 4 CPUs > Unable to handle kernel pointer dereference at virtual kernel address 0000000000 > 543000 > Oops: 0004 #1 SMP > Modules linked in: > CPU: 0 Not tainted 2.6.33-rc4-autotest-next-20100118-5-default #1 > Process swapper (pid: 1, task: 00000000fd792038, ksp: 00000000fd797a30) > Krnl PSW : 0704200180000000 00000000001eb0b8 (shmem_parse_options+0xc0/0x328) > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > Krnl GPRS: 000000000054388a 000000000000003d 0000000000543836 000000000000003d > 0000000000000000 0000000000483f28 0000000000536112 00000000fd797d00 > 00000000fd4ba100 0000000000000100 0000000000483978 0000000000543832 > 0000000000000000 0000000000465958 00000000001eb0b0 00000000fd797c58 > Krnl Code: 00000000001eb0aa: c0e5000994f1 brasl %r14,31da8c > 00000000001eb0b0: b9020022 ltgr %r2,%r2 > 00000000001eb0b4: a784010b brc 8,1eb2ca > >00000000001eb0b8: 92002000 mvi 0(%r2),0 > 00000000001eb0bc: a7080000 lhi %r0,0 > 00000000001eb0c0: 41902001 la %r9,1(%r2) > 00000000001eb0c4: b9040016 lgr %r1,%r6 > 00000000001eb0c8: b904002b lgr %r2,%r11 > Call Trace: > (<00000000fd797c50> 0xfd797c50) > <00000000001eb5da> shmem_fill_super+0x13a/0x25c > <0000000000228cfa> get_sb_single+0xbe/0xdc > <000000000034ffc0> dev_get_sb+0x2c/0x38 > <000000000066c602> devtmpfs_init+0x46/0xc0 > <000000000066c53e> driver_init+0x22/0x60 > <000000000064d40a> kernel_init+0x24e/0x3d0 > <000000000010a7ea> kernel_thread_starter+0x6/0xc > <000000000010a7e4> kernel_thread_starter+0x0/0xc > > I never tried to boot a kernel with DEVTMPFS enabled on a s390 box. > So am wondering if this is supported or not ? If you think this > is supported i will send a mail to community on this. There is nothing arch specific to devtmpfs. This part crashes because the kernel tries to modify the data read-only section which is write protected on s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-01-20 15:02:05 -08:00
Kay Sievers	8042273801	devtmpfs: unlock mutex in case of string allocation error Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-23 11:23:44 -08:00
Thomas Gleixner	f1f76f865b	devtmpfs: Convert dirlock to a mutex devtmpfs has a rw_lock dirlock which serializes delete_path and create_path. This code was obviously never tested with the usual set of debugging facilities enabled. In the dirlock held sections the code calls: - vfs functions which take mutexes - kmalloc(, GFP_KERNEL) In both code pathes the might sleep warning triggers and spams dmesg. Convert the rw_lock to a mutex. There is no reason why this needs to be a rwlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-23 11:23:42 -08:00
Kay Sievers	03d673e6af	Driver-Core: devtmpfs - set root directory mode to 0755 Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Mark Rosenstand <rosenstand@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:52 -08:00
Kay Sievers	015bf43b07	Driver Core: devtmpfs: do not remove non-kernel-created directories Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:52 -08:00
Kay Sievers	073120cc28	Driver Core: devtmpfs: use sys_mount() Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:51 -08:00
Kay Sievers	ed413ae6e7	Driver core: devtmpfs: prevent concurrent subdirectory creation and removal Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:51 -08:00
Kay Sievers	0092699643	Driver Core: devtmpfs: ignore umask while setting file mode Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-12-11 11:24:51 -08:00
Kay Sievers	e454cea20b	Driver-Core: extend devnode callbacks to provide permissions This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-19 12:50:38 -07:00
Kay Sievers	2b2af54a5b	Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev Devtmpfs lets the kernel create a tmpfs instance called devtmpfs very early at kernel initialization, before any driver-core device is registered. Every device with a major/minor will provide a device node in devtmpfs. Devtmpfs can be changed and altered by userspace at any time, and in any way needed - just like today's udev-mounted tmpfs. Unmodified udev versions will run just fine on top of it, and will recognize an already existing kernel-created device node and use it. The default node permissions are root:root 0600. Proper permissions and user/group ownership, meaningful symlinks, all other policy still needs to be applied by userspace. If a node is created by devtmps, devtmpfs will remove the device node when the device goes away. If the device node was created by userspace, or the devtmpfs created node was replaced by userspace, it will no longer be removed by devtmpfs. If it is requested to auto-mount it, it makes init=/bin/sh work without any further userspace support. /dev will be fully populated and dynamic, and always reflect the current device state of the kernel. With the commonly used dynamic device numbers, it solves the problem where static devices nodes may point to the wrong devices. It is intended to make the initial bootup logic simpler and more robust, by de-coupling the creation of the inital environment, to reliably run userspace processes, from a complex userspace bootstrap logic to provide a working /dev. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Tested-By: Harald Hoyer <harald@redhat.com> Tested-By: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-15 09:50:49 -07:00

40 Commits