OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Stanislav Kinsbursky	3d1221dfa9	LockD: service start function introduced This is just a code move, which from my POV makes the code look better. I.e. now on start we have 3 different stages: 1) Service creation. 2) Service per-net data allocation. 3) Service start. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:44 -04:00
Stanislav Kinsbursky	7d13ec761a	LockD: move global usage counter manipulation from error path Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:43 -04:00
Stanislav Kinsbursky	2445223909	LockD: service creation function introduced This function creates service if it doesn't exist, or increases usage counter if it does, and returns a pointer to it. The usage counter will be dropped by svc_destroy() later in lockd_up(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky	dbf9b5d74c	LockD: use existing per-net data function on service creation This patch also replaces svc_rpcb_setup() with svc_bind(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:42 -04:00
Stanislav Kinsbursky	4db77695bf	LockD: pass service to per-net up and down functions Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:41 -04:00
Stanislav Kinsbursky	786185b5f8	SUNRPC: move per-net operations from svc_destroy() The idea is to separate service destruction and per-net operations, because these are two different things and the mix looks ugly. Notes: 1) For NFS server this patch looks ugly (sorry for that). But this place will be rewritten soon during NFSd containerization. 2) LockD per-net counter increase in lockd_up() was moved prior to make_socks() to make lockd_down_net() call safe in case of error. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:40 -04:00
Stanislav Kinsbursky	9793f7c889	SUNRPC: new svc_bind() routine introduced This new routine is responsible for service registration in a specified network context. The idea is to separate service creation from per-net operations. Note also: since registering service with svc_bind() can fail, the service will be destroyed and during destruction it will try to unregister itself from rpcbind. In this case unregistration has to be skipped. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:39 -04:00
Weston Andros Adamson	e7a0444aef	nfsd: add IPv6 addr escaping to fs_location hosts The fs_location->hosts list is split on colons, but this doesn't work when IPv6 addresses are used (they contain colons). This patch adds the function nfsd4_encode_components_esc() to allow the caller to specify escape characters when splitting on 'sep'. In order to fix referrals, this patch must be used with the mountd patch that similarly fixes IPv6 [] escaping. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:38 -04:00
J. Bruce Fields	45eaa1c1a1	nfsd4: fix change attribute endianness Though actually this doesn't matter much, as NFSv4.0 clients are required to treat the change attribute as opaque. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:38 -04:00
J. Bruce Fields	d1829b3824	nfsd4: fix free_stateid return endianness Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:37 -04:00
J. Bruce Fields	57b7b43b40	nfsd4: int/__be32 fixes In each of these cases there's a simple unambiguous correct choice, and no actual bug. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:37 -04:00
J. Bruce Fields	bc1b542be9	nfsd4: preserve __user annotation on cld downcall msg Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:36 -04:00
J. Bruce Fields	2355c59644	nfsd4: fix missing "static" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:35 -04:00
J. Bruce Fields	bfa4b36525	nfsd: state.c should include current_stateid.h OK, admittedly I'm mainly just trying to shut sparse up. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-05-31 20:29:35 -04:00
Chris Mason	1e20932a23	Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-05-31 16:49:53 -04:00
Trond Myklebust	1d59d61f60	NFS: Ensure that setattr and getattr wait for O_DIRECT write completion Use the same mechanism as the block devices are using, but move the helper functions from fs/direct-io.c into fs/inode.c to remove the dependency on CONFIG_BLOCK. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Fred Isaman <iisaman@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-31 11:41:36 -07:00
Jan Schmidt	c31931088f	Btrfs: fix tree mod log rewinded level and rewinding of moved keys When we rewind REMOVE_WHILE_FREEING operations, there's code that allocates a fresh buffer instead of cloning the old one. Setting that buffer's level correctly was missing in this case. When rewinding a MOVE_KEYS operation, btrfs_node_key_ptr_offset(slot) was missing for memmove_extent_buffer()'s arguments. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-31 19:56:19 +02:00
Jan Schmidt	f395694c2c	Btrfs: fix tree mod log del_ptr Logging for del_ptr when we're not deleting the last pointer was wrong. This fixes both, duplicate log entries and log sequence. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-31 19:56:19 +02:00
Jan Schmidt	e9b7fd4d8b	Btrfs: add tree_mod_dont_log helper Replace duplicate code by small inline helper function. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-31 19:56:18 +02:00
Jan Schmidt	926dd8a640	Btrfs: add missing spin_lock for insertion into tree mod log tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before doing so. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-31 19:56:18 +02:00
Jan Schmidt	3301958b7c	Btrfs: add inodes before dropping the extent lock in find_all_leafs We must build up the inode list with the extent lock held after following indirect refs. This also requires an extension to ulists, which allows to modify the stored aux value in case a key already exists in the list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-31 19:53:08 +02:00
Al Viro	e5467859f7	split ->file_mmap() into ->mmap_addr()/->mmap_file() ... i.e. file-dependent and address-dependent checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-31 13:11:54 -04:00
Theodore Ts'o	f3fc0210c0	ext4: add missing save_error_info() to ext4_error() The ext4_error() function is missing a call to save_error_info(). Since this is the function which marks the file system as containing an error, this oversight (which was introduced in 2.6.36) is quite significant, and should be backported to older stable kernels with high urgency. Reported-by: Ken Sumrall <ksumrall@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ksumrall@google.com Cc: stable@kernel.org	2012-05-30 23:00:16 -04:00
Theodore Ts'o	2c0544b235	ext4: add debugging trigger for ext4_error() Make it easy to test whether or not the error handling subsystem in ext4 is working correctly. This allows us to simulate an ext4_error() by echoing a string to /sys/fs/ext4/<dev>/trigger_fs_error. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ksumrall@google.com	2012-05-30 22:56:46 -04:00
Al Viro	7696e0c37f	binfmt_flat: use vm_munmap, we are missing ->mmap_sem there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:56 -04:00
Al Viro	5a5e4c2eca	binfmt_elf: switch elf_map() to vm_mmap/vm_munmap No reason to hold ->mmap_sem over the sequence Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:55 -04:00
Al Viro	63d37a84ab	vfs: umount_tree() might be called on subtree that had never made it __mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:55 -04:00
Will Deacon	46ce341b2f	pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command As described in commit `07d106d0a` ("vfs: fix up ENOIOCTLCMD error handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl command which they don't understand. Doing so will result in -ENOTTY being returned to userspace, which matches the behaviour of the compat layer if it fails to translate an ioctl command. This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL when passed an unknown ioctl command. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:55 -04:00
J. Bruce Fields	3f50fff4da	vfs: remove unused __d_splice_alias argument Nobody sets want_disconn any more. Reported-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:54 -04:00
J. Bruce Fields	7732a557b1	vfs: stop d_splice_alias creating directory aliases A directory should never have more than one dentry pointing to it. But d_splice_alias() will add one if it finds a directory with an already-existing non-DISCONNECTED dentry. I can't find an obvious reproducer, but I also can't see what prevents d_splice_alias() from encountering such a case. It therefore seems safest to allow d_splice_alias to use any dentry it finds. (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0, this could cause an nfsd deadlock like this: - Somebody attempts to remove a non-empty directory. - The dentry_unhash() in vfs_rmdir() unhashes the dentry pointing to the non-empty directory. - ->rmdir() then fails with -ENOTEMPTY - Before the vfs_rmdir() caller reaches dput(), an nfsd process in rename looks up the directory by filehandle; at the end of that lookup, this dentry is found by d_alloc_anon(), and a reference is taken on it, preventing dput() from removing it. - A regular lookup of the directory calls d_splice_alias(), finds only an unhashed (not a DISCONNECTED) dentry, and instead adds a new one, so the directory now has two dentries. - The nfsd process in rename, which was previously looking up the source directory of the rename, now looks up the target directory (which is the same), and gets the dentry newly created by the previous lookup. - The rename, seeing two different dentries, assumes this is a cross-directory rename and attempts to take the i_mutex on the directory twice. That reproducer no longer exists, but I don't think there was anything fundamentally incorrect about the vfs_rmdir() behavior there, so I think the real fault was here in d_splice_alias().) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:54 -04:00
Dan Carpenter	fd657170c0	fsnotify: remove unused parameter from send_to_group() We don't use "mnt" anymore in send_to_group() after `1968f5eed5` ("fanotify: use both marks when possible") was applied. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:53 -04:00
Dmitry Kasatkin	799243a389	vfs: increment iversion when a file is truncated When a file is truncated with truncate()/ftruncate() and then closed, iversion is not updated. This patch uses ATTR_SIZE flag as an indication to increment iversion. Mimi said: On fput(), i_version is used to detect and flag files that have changed and need to be re-measured in the IMA measurement policy. When a file is truncated with truncate()/ftruncate() and then closed, i_version is not updated. As a result, although the file has changed, it will not be re-measured and added to the IMA measurement list on subsequent access. Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:53 -04:00
Shai Fultheim	a0a9b04337	fs: Move bh_cachep to the __read_mostly section bh_cachep is only written to once on initialization, so move it to the __read_mostly section. Signed-off-by: Shai Fultheim <shai@scalemp.com> Signed-off-by: Vlad Zolotarov <vlad@scalemp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:52 -04:00
Cong Wang	3ed37648e1	fs: move file_remove_suid() to fs/inode.c file_remove_suid() is a generic function that operates on struct file; it has almost no relation to file mapping, so move it to fs/inode.c. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:52 -04:00
Artem Bityutskiy	8bdc81c506	jffs2: get rid of jffs2_sync_super Currently JFFS2 file-system maps the VFS "superblock" abstraction to the write-buffer. Namely, it uses VFS services to synchronize the write-buffer periodically. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblock using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds no matter what. So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super' VFS service, and then remove it together with the kernel thread. This patch switches the JFFS2 write-buffer management from '->write_super()'/'->s_dirt' to a delayed work. Instead of setting the 's_dirt' flag we just schedule a delayed work for synchronizing the write-buffer. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:52 -04:00
Artem Bityutskiy	06688905cc	jffs2: remove unnecessary GC pass on sync We do not need to call 'jffs2_write_super()' on sync. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows sync down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see `d579ed00aa`. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:51 -04:00
Artem Bityutskiy	d0490eea14	jffs2: remove unnecessary GC pass on umount We do not need to call 'jffs2_write_super()' on unmount. This function causes a GC pass to make sure the current contents is pushed out with the data which we already have on the media. But this is not needed on unmount and only slows unmount down unnecessarily. It is enough to just sync the write-buffer. This call was added by one of the generic VFS rework patch-sets, see `8c85e12512`. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:51 -04:00
Artem Bityutskiy	3a0c0e26b6	jffs2: remove lock_super We do not need 'lock_super()'/'unlock_super()' in JFFS2 - kill them. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-30 21:04:51 -04:00
Linus Torvalds	af56e0aa35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...	2012-05-30 11:17:19 -07:00
Jan Schmidt	95a06077f7	Btrfs: use delayed ref sequence numbers for all fs-tree updates The sequence number for delayed refs is needed to postpone certain delayed refs for a very short period while walking backrefs. Before the tree modification log, we thought we'd only have to hold back those references that don't have a counter operation. While now we've the tree mod log, we're rewinding fs tree blocks to a defined consistent state. We cannot know in advance for which tree block we'll be doing rewind operations later. Therefore, we must postpone all the delayed refs for fs-tree blocks, even those having a counter operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 18:18:21 +02:00
Chris Mason	cfc442b696	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into HEAD	2012-05-30 11:55:38 -04:00
Linus Torvalds	0d167518e0	Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block Merge block/IO core bits from Jens Axboe: "This is a bit bigger on the core side than usual, but that is purely because we decided to hold off on parts of Tejun's submission on 3.4 to give it a bit more time to simmer. As a consequence, it's seen a long cycle in for-next. It contains: - Bug fix from Dan, wrong locking type. - Relax splice gifting restriction from Eric. - A ton of updates from Tejun, primarily for blkcg. This improves the code a lot, making the API nicer and cleaner, and also includes fixes for how we handle and tie policies and re-activate on switches. The changes also include generic bug fixes. - A simple fix from Vivek, along with a fix for doing proper delayed allocation of the blkcg stats." Fix up annoying conflict just due to different merge resolution in Documentation/feature-removal-schedule.txt * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits) blkcg: tg_stats_alloc_lock is an irq lock vmsplice: relax alignement requirements for SPLICE_F_GIFT blkcg: use radix tree to index blkgs from blkcg blkcg: fix blkcg->css ref leak in __blkg_lookup_create() block: fix elvpriv allocation failure handling block: collapse blk_alloc_request() into get_request() blkcg: collapse blkcg_policy_ops into blkcg_policy blkcg: embed struct blkg_policy_data in policy specific data blkcg: mass rename of blkcg API blkcg: style cleanups for blk-cgroup.h blkcg: remove blkio_group->path[] blkcg: blkg_rwstat_read() was missing inline blkcg: shoot down blkgs if all policies are deactivated blkcg: drop stuff unused after per-queue policy activation update blkcg: implement per-queue policy activation blkcg: add request_queue->root_blkg blkcg: make request_queue bypassing on allocation blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing blkcg: make blkg_conf_prep() take @pol and return with queue lock held blkcg: remove static policy ID enums ...	2012-05-30 08:52:42 -07:00
Stefan Behrens	48235a68a3	Btrfs: fix false positive in check-integrity on unmount During unmount, it could happen that the integrity checker printed a warning message "attempt to free ... on umount which is not yet iodone" which turned out to be a false positive. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:44 -04:00
Stefan Behrens	86ff7ffce0	Btrfs: fix runtime warning in check-integrity check data mode If a file_extent_item was located at the very end of a leaf and there was not enough space to hold a full item, but there was enough space to hold one of type BTRFS_FILE_EXTENT_INLINE or PREALLOC, and it was only such a short item, a warning was printed anyway. This check is now fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:43 -04:00
Stefan Behrens	3d136a1131	Btrfs: set ioprio of scrub readahead to idle Reduce ioprio class of scrub readahead threads to idle priority. This setting is fixed. This priority has shown the best performance during all measurements. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:43 -04:00
Josef Bacik	5bdbeb2187	Btrfs: fix return code in drop_objectid_items So dpkg fsync()'s the file and the directory containing the file whenever it writes to a file which is really slow in btrfs. This is partly because fsync()'ing a directory _always_ committed the transaction instead of just going to the tree log. This is because drop_objectid_items() would return 1 since it does a btrfs_search_slot() which returns 1. In tree-log jargon this means that we have to commit the transaction to be safe. So just check if ret is greater than 0 and set it to 0 if it does. With this patch we now use the tree-log instead of committing the entire transaction, which is twice as fast on my box. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:42 -04:00
Josef Bacik	22ee6985de	Btrfs: check to see if the inode is in the log before fsyncing We have this check down in the actual logging code, but this is after we start a transaction and all that good stuff. So move the helper inode_in_log() out so we can call it in fsync() and avoid starting a transaction altogether and just exit if we've already fsync()'ed this file recently. You would notice this issue if you fsync()'ed a file over and over again until the transaction committed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:42 -04:00
Tsutomu Itoh	018642a1f1	Btrfs: return value of btrfs_read_buffer is checked correctly btrfs_read_buffer() has the possibility of returning the error. Therefore, I add the code in which the return value of btrfs_read_buffer() is checked. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>	2012-05-30 10:23:41 -04:00
Stefan Behrens	733f4fbbc1	Btrfs: read device stats on mount, write modified ones during commit The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:41 -04:00
Stefan Behrens	c11d2c236c	Btrfs: add ioctl to get and reset the device stats An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:40 -04:00
Stefan Behrens	442a4f6308	Btrfs: add device counters for detected IO and checksum errors The goal is to detect when drives start to get an increased error rate, when drives should be replaced soon. Therefore statistic counters are added that count IO errors (read, write and flush). Additionally, the software detected errors like checksum errors and corrupted blocks are counted. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-05-30 10:23:39 -04:00
Asias He	d07eb91170	btrfs: Drop unused function btrfs_abort_devices() 1) This function is not used anywhere. 2) Using the blk_abort_queue() to abort the queue seems not correct. blk_abort_queue() is used for timeout handling (block/blk-timeout.c). Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Signed-off-by: Asias He <asias@redhat.com>	2012-05-30 10:23:39 -04:00
Miao Xie	762f226326	Btrfs: fix the same inode id problem when doing auto defragment Two files in different subvolumes may have the same inode id, so the rb-tree which is used to manage the defragment object must take it into account. This patch fixes this problem. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2012-05-30 10:23:38 -04:00
Josef Bacik	2adcac1a73	Btrfs: fall back to non-inline if we don't have enough space If cow_file_range_inline fails with ENOSPC we abort the transaction which isn't very nice. This really shouldn't be happening anyways but there's no sense in making it a horrible error when we can easily just go allocate normal data space for this stuff. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:38 -04:00
Josef Bacik	8a35d95ff4	Btrfs: fix how we deal with the orphan block rsv Ceph was hitting this race where we would remove an inode from the per-root orphan list before we would release the space we had reserved for the inode. We actually don't need a list or anything, we just need to make sure the root doesn't try to free up the orphan reserve until after the inodes have released their reservations. So use an atomic counter instead of a list on the root and only decrement the counter after we've released our reservation. I've tested this as well as several others and we no longer see the warnings that you would see while running ceph. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:37 -04:00
Josef Bacik	72ac3c0d79	Btrfs: convert the inode bit field to use the actual bit operations Miao pointed this out while I was working on an orphan problem that messing with a bitfield where different ranges are protected by different locks doesn't work out right. Turns out we've been doing this forever where we have different parts of the bit field protected by either no lock at all or different locks which could cause all sorts of weird problems including the issue I was hitting. So instead make a runtime_flags thing that we use the normal bit operations on that are all atomic so we can keep having our no/different locking for the different flags and then make force_compress its own thing so it can be treated normally. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:36 -04:00
Josef Bacik	cd023e7b17	Btrfs: merge contigous regions when loading free space cache When we write out the free space cache we will write out everything that is in our in memory tree, and then we will just walk the pinned extents tree and write anything we see there. The problem with this is that during normal operations the pinned extents will be merged back into the free space tree normally, and then we can allocate space from the merged areas and commit them to the tree log. If we crash and replay the tree log we will crash again because the tree log will try to free up space from what looks like 2 separate but contiguous entries, since one entry is from the original free space cache and the other was a pinned extent that was merged back. To fix this we just need to walk the free space tree after we load it and merge contiguous entries back together. This will keep the tree log stuff from breaking and it will make the allocator behave more nicely. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:36 -04:00
Liu Bo	9ba1f6e44e	Btrfs: do not do balance in readonly mode In normal cases, we would not be allowed to do balance in RO mode. However, when we're using a seeding device and adding another device to sprout, things will change: $ mkfs.btrfs /dev/sdb7 $ btrfstune -S 1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs -o ro $ btrfs fi bal /mnt/btrfs -----------------------> fail. $ btrfs dev add /dev/sdb8 /mnt/btrfs $ btrfs fi bal /mnt/btrfs -----------------------> works! It should not be designed as an exception, and we'd better add another check for mnt flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:35 -04:00
Liu Bo	d1ac6e41d5	Btrfs: use fastpath in extent state ops as much as possible Fully utilize our extent state's new helper functions to use fastpath as much as possible. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:34 -04:00
Liu Bo	f8c5d0b443	Btrfs: fix wrong error returned by adding a device Reproduce: $ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs -o ro $ btrfs dev add /dev/sdb8 /mnt/btrfs ERROR: error adding the device '/dev/sdb8' - Invalid argument Since we mount with readonly options, and /dev/sdb7 is not a seeding one, a readonly notification is preferred. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:34 -04:00
Josef Bacik	5fd0204355	Btrfs: finish ordered extents in their own thread We noticed that the ordered extent completion doesn't really rely on having a page and that it could be done independently of ending the writeback on a page. This patch makes us not do the threaded endio stuff for normal buffered writes and direct writes so we can end page writeback as soon as possible (in irq context) and only start threads to do the ordered work when it is actually done. Compression needs to be reworked some to take advantage of this as well, but atm it has to do a find_get_page in its endio handler so it must be done in its own thread. This makes direct writes quite a bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:33 -04:00
Josef Bacik	4e89915220	Btrfs: do not check delalloc when updating disk_i_size We are checking delalloc to see if it is ok to update the i_size. There are 2 cases it stops us from updating 1) If there is delalloc between our current disk_i_size and this ordered extent 2) If there is delalloc between our current ordered extent and the next ordered extent These tests are racy however since we can set delalloc for these ranges at any time. Also for the first case if we notice there is delalloc between disk_i_size and our ordered extent we will not update disk_i_size and assume that when that delalloc bit gets written out it will update everything properly. However if we crash before that we will have file extents outside of our i_size, which is not good, so this test is dangerous as well as racy. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:33 -04:00
Jim Meyering	f60d16a892	Btrfs: avoid buffer overrun in mount option handling There is an off-by-one error: allocating room for a maximal result string but without room for a trailing NUL. That can lead to returning a transformed string that is not NUL-terminated, and then to a caller reading beyond end of the malloc'd buffer. Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy (the result is guaranteed to fit), remove dead strlen at end, and change a few variable names and comments. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Jim Meyering <meyering@redhat.com>	2012-05-30 10:23:32 -04:00
Jim Meyering	a27202fbe9	Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer would not be NUL-terminated in the DEV_INFO ioctl result buffer. Signed-off-by: Jim Meyering <meyering@redhat.com>	2012-05-30 10:23:31 -04:00
Jim Meyering	f07c9a79f0	Btrfs: avoid buffer overrun in btrfs_printk The buffer read-overrun would be triggered by a printk format starting with <N>, where N is a single digit. NUL-terminate after strncpy. Use memcpy, not strncpy, since we know the string we're copying fits in the destination buffer and contains no NUL byte. Signed-off-by: Jim Meyering <meyering@redhat.com>	2012-05-30 10:23:31 -04:00
Daniel J Blueman	2eec6c8102	Fix minor type issues Address some minor type issues identified by sparse checker. Signed-off-by: Daniel J Blueman <daniel@quora.org>	2012-05-30 10:23:30 -04:00
Sergei Trofimovich	0d2450abfa	btrfs: allow changing 'thread_pool' size at remount time Running 'mount -oremount,thread_pool=2 /' had no effect: the maximum number of worker threads is specified in 2 places: - in 'struct btrfs_fs_info::thread_pool_size' - in each worker struct: 'struct btrfs_workers::max_workers' 'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'. Fix it by pushing the new maximum value to all created worker structures as well. Cc: Josef Bacik <josef@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2012-05-30 10:23:30 -04:00
Josef Bacik	0885ef5b56	Btrfs: do not do filemap_write_and_wait_range in fsync We already do the btrfs_wait_ordered_range which will do this for us, so just remove this call so we don't call it twice. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:29 -04:00
Josef Bacik	551ebb2d34	Btrfs: remove useless waiting and extra filemap work In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice because compression does strange things and then waiting. Then we look up ordered extents and if we find any we will always schedule_timeout(1) once and then loop back around and do it all again. We will even check to see if there are delalloc pages on this range and loop again. So this patch gets rid of the multiple fdata_write() calls and just does filemap_write_and_wait(). In the case of compression we will still find the ordered extents and start those individually if we need to so that is ok, but in the normal buffered case we avoid all this weird overhead. Then in the case of the schedule_timeout(1), we don't need it. All callers either 1) don't care, they just want to make sure what they just wrote makes it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing in which case it will lock and check for ordered extents _anyway_ so get back to them as quickly as possible. The delalloc check is simply not needed, this only catches the case where we write to the file again since doing the filemap_write_and_wait() and if the caller truly cares about that it will take care of everything itself. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:28 -04:00
Josef Bacik	d7dbe9e7f6	Btrfs: fix compile warnings in extent_io.c These warnings are bogus since we will always have at least one page in an eb, but to make the compiler happy just set ret = 0 in these two cases. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:28 -04:00
Josef Bacik	30f8fe3e47	Btrfs: cache no acl on new inodes When running compilebench I noticed we were spending some time looking up acls on new inodes, which shouldn't be happening since there were no acls. This is because when we init acls on the inode after creating them we don't cache the fact there are no acls if there aren't any. Doing this adds a little bit of a bump to my compilebench runs. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:27 -04:00
Josef Bacik	0c4d2d95d0	Btrfs: use i_version instead of our own sequence We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:27 -04:00
Jan Schmidt	20b297d620	Btrfs: tree mod log sanity checks in join_transaction When a fresh transaction begins, the tree mod log must be clean. Users of the tree modification log must ensure they never span across transaction boundaries. We reset the sequence to 0 in this safe situation to make absolutely sure overflow can't happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:36 +02:00
Jan Schmidt	19ae4e8133	Btrfs: fs_info variable for join_transaction Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:35 +02:00
Jan Schmidt	8445f61cad	Btrfs: use the tree modification log for backref resolving This enables backref resolving on live trees while they are changing. This is a prerequisite for quota groups and just nice to have for everything else. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:34 +02:00
Jan Schmidt	5d9e75c41d	Btrfs: add btrfs_search_old_slot The tree modification log together with the current state of the tree gives a consistent, old version of the tree. btrfs_search_old_slot is used to search through this old version and return old (dummy!) extent buffers. Naturally, this function cannot do any tree modifications. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:33 +02:00
Jan Schmidt	f3ea38da3e	Btrfs: add del_ptr and insert_ptr modifications to the tree mod log Record all relevant modifications to block pointers in the tree mod log so that we can rewind them later on for backref walking. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:32 +02:00
Jan Schmidt	f230475e62	Btrfs: put all block modifications into the tree mod log When running functions that can make changes to the internal trees (e.g. btrfs_search_slot), we check if somebody may be interested in the block we're currently modifying. If so, we record our modification to be able to rewind it later on. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:29 +02:00
Jan Schmidt	bd989ba359	Btrfs: add tree modification log functions The tree mod log will log modifications made to fs-tree nodes. Most modifications are done by autobalance of the tree. Such changes are recorded as long as a block entry exists. When released, the log is cleaned. With the tree modification log, it's possible to reconstruct a consistent old state of the tree. This is required to do backref walking on a busy file system. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-30 15:17:01 +02:00
Al Viro	1676765238	get rid of idiotic misplaced __kernel_mode_t in ncpfs kernel-private data structure Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:42 -04:00
Andi Kleen	962830df36	brlocks/lglocks: API cleanups lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Andi Kleen	eea62f831b	brlocks/lglocks: turn into functions lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. Since there are at least two users it makes sense to share this code in a library. This is also easier maintainable than a macro forest. This will also make it later possible to dynamically allocate lglocks and also use them in modules (this would both still need some additional, but now straightforward, code) [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Al Viro	ea022dfb3c	ocfs: simplify symlink handling seeing that "fast" symlinks still get allocation + copy, we might as well simply switch them to pagecache-based variant of ->follow_link(); just need an appropriate ->readpage() for them... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:40 -04:00
Al Viro	408bd629ba	get rid of pointless allocations and copying in ecryptfs_follow_link() switch to generic_readlink(), while we are at it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:40 -04:00
Al Viro	28fe3c1963	hpfs: assorted endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:39 -04:00
Al Viro	77ee26e44c	hpfs: annotate ea Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:39 -04:00
Al Viro	46287aa652	hpfs: annotate struct hpfs_dirent Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:39 -04:00
Al Viro	6ce2bbba52	hpfs: annotate struct anode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:38 -04:00
Al Viro	2b9f1cc29b	hpfs: annotate struct fnode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:38 -04:00
Al Viro	ddc19e6e04	hpfs: annotate btree nodes, get rid of bitfields mess Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:38 -04:00
Al Viro	39413c6046	hpfs: annotate struct dnode little-endians... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:37 -04:00
Al Viro	52576da354	hpfs: bitmaps are little-endian annotate properly... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:37 -04:00
Al Viro	c4c995430a	hpfs: get rid of bitfields in struct fnode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:37 -04:00
Al Viro	4085e155b1	hpfs: get rid of bitfields endianness wanking in extended_attribute Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:36 -04:00
Randy Dunlap	185553b224	fs: fix inode.c kernel-doc warnings Fix kernel-doc warnings in fs/inode.c: Warning(fs/inode.c:1493): No description found for parameter 'path' Warning(fs/inode.c:1493): Excess function parameter 'mnt' description in 'touch_atime' Warning(fs/inode.c:1493): Excess function parameter 'dentry' description in 'touch_atime' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:36 -04:00
Al Viro	de5e2b3628	hpfs: endianness bugs a couple of le32 and le16 used with wrong le..._to_cpu(), plus idiotic use of le32_to_cpu() on 1-bit bitfield Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:36 -04:00
Al Viro	528c032764	btrfs: trivial endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:35 -04:00
Al Viro	1db5df98fa	ocfs2: kill endianness abuses in blockcheck.c ocfs2_block_check is for little-endian contents; if we just want its fields converted to host-endian in a couple of functions, just put those values into local u32 and u16... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:35 -04:00
Al Viro	f6a5690324	ocfs2: deal with __user misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:35 -04:00
Al Viro	8515841086	ocfs2: trivial endianness misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:34 -04:00
Al Viro	66f8f50920	affs: bury unused macros ... unused since 2.4.4. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:34 -04:00
Al Viro	af569596a9	kill v9fs_dentry_from_dir_inode() In all callers we have a dentry of child of that directory. Just use ->d_parent of that one, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:34 -04:00
Sage Weil	c862868bb4	ceph: move encode_fh to new API Use parent_inode as a flag for whether nfsd wants a connectable fh, but generate one opportunistically so that we can take advantage of the additional info in there. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:33 -04:00
Al Viro	b0b0382bb4	->encode_fh() API change pass inode + parent's inode or NULL instead of dentry + bool saying whether we want the parent or not. NOTE: that needs ceph fix folded in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:33 -04:00
Al Viro	6d42e7e9f6	ubifs: use generic_fillattr() don't open-code it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:32 -04:00
Al Viro	77ba78776e	xfs: switch to proper __bitwise type for KM_... flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:32 -04:00
Al Viro	c217a2a004	switch utimes() to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:32 -04:00
Al Viro	0aa2ee5f0a	switch statfs to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:31 -04:00
Al Viro	bdc689594b	switch flock to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:31 -04:00
Al Viro	20ba5d736f	switch signalfd4() to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:30 -04:00
Al Viro	545ec2c794	switch fcntl to fget_raw_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:30 -04:00
Al Viro	7449af1e8b	switch xattr syscalls to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:30 -04:00
Al Viro	863ced7fe7	switch readdir/getdents to fget_light/fput_light Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:29 -04:00
Al Viro	c2bd6c11cd	switch do_fsync() to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:29 -04:00
Linus Torvalds	7d36014b97	Merge branch 'akpm' (Andrew's patch-bomb) Merge patches through Andrew Morton: "180 patches - err 181 - listed below: - most of MM. I held back the (large) "memcg: add hugetlb extension" series because a bunfight has recently broken out. - leds. After this, Bryan Wu will be handling drivers/leds/ - backlight - lib/ - rtc" * emailed from Andrew Morton <akpm@linux-foundation.org>: (181 patches) drivers/rtc/rtc-s3c.c: fix compiler warning drivers/rtc/rtc-tegra.c: clean up probe/remove routines drivers/rtc/rtc-pl031.c: remove RTC timer interrupt handling drivers/rtc/rtc-lpc32xx.c: add device tree support drivers/rtc/rtc-m41t93.c: don't let get_time() reset M41T93_FLAG_OF rtc: ds1307: add trickle charger support rtc: ds1307: remove superfluous initialization rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC drivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers" drivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature rtc: add ioctl to get/clear battery low voltage status drivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver() rtc/spear: add Device Tree probing capability lib/vsprintf.c: "%#o",0 becomes '0' instead of '00' radix-tree: fix preload vector size spinlock_debug: print kallsyms name for lock vsprintf: fix %ps on non symbols when using kallsyms lib/bitmap.c: fix documentation for scnprintf() functions lib/string_helpers.c: make arrays static lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata ...	2012-05-29 18:05:31 -07:00
David Rientjes	a7f638f999	mm, oom: normalize oom scores to oom_score_adj scale only for userspace The oom_score_adj scale ranges from -1000 to 1000 and represents the proportion of memory available to the process at allocation time. This means an oom_score_adj value of 300, for example, will bias a process as though it was using an extra 30.0% of available memory and a value of -350 will discount 35.0% of available memory from its usage. The oom killer badness heuristic also uses this scale to report the oom score for each eligible process in determining the "best" process to kill. Thus, it can only differentiate each process's memory usage by 0.1% of system RAM. On large systems, this can end up being a large amount of memory: 256MB on 256GB systems, for example. This can be fixed by having the badness heuristic use the actual memory usage in scoring threads and then normalizing it to the oom_score_adj scale for userspace. This results in better comparison between eligible threads for kill and no change from the userspace perspective. Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:24 -07:00
Hugh Dickins	17cf28afea	mm/fs: remove truncate_range Remove vmtruncate_range(), and remove the truncate_range method from struct inode_operations: only tmpfs ever supported it, and tmpfs has now converted over to using the fallocate method of file_operations. Update Documentation accordingly, adding (setlease and) fallocate lines. And while we're in mm.h, remove duplicate declarations of shmem_lock() and shmem_file_setup(): everyone is now using the ones in shmem_fs.h. Based-on-patch-by: Cong Wang <amwang@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Cong Wang <amwang@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:23 -07:00
Sasha Levin	08fa29d916	mm: fix NULL ptr deref when walking hugepages A missing validation of the value returned by find_vma() could cause a NULL ptr dereference when walking the pagetable. This is triggerable from usermode by a simple user by trying to read a page info out of /proc/pid/pagemap which doesn't exist. Introduced by commit `025c5b2451` ("thp: optimize away unnecessary page table locking"). Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:18 -07:00
Linus Torvalds	442a9ffabb	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS updates from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (29 commits) cifs: fix oops while traversing open file list (try #4) cifs: Fix comment as d_alloc_root() is replaced by d_make_root() CIFS: Introduce SMB2 mounts as vers=2.1 CIFS: Introduce SMB2 Kconfig option CIFS: Move add/set_credits and get_credits_field to ops structure CIFS: Move protocol specific demultiplex thread calls to ops struct CIFS: Move protocol specific part from cifs_readv_receive to ops struct CIFS: Move header_size/max_header_size to ops structure CIFS: Move protocol specific part from SendReceive2 to ops struct cifs: Include backup intent search flags during searches (try #2) CIFS: Separate protocol specific part from setlk CIFS: Separate protocol specific part from getlk CIFS: Separate protocol specific lock type handling CIFS: Convert lock type to 32 bit variable CIFS: Move locks to cifsFileInfo structure cifs: convert send_nt_cancel into a version specific op cifs: add a smb_version_operations/values structures and a smb_version enum cifs: remove the vers= and version= synonyms for ver= cifs: add warning about change in default cache semantics in 3.7 cifs: display cache= option in /proc/mounts ...	2012-05-29 12:42:10 -07:00
Linus Torvalds	53f2c4a8fd	NFS client updates for Linux 3.5 New features include: - Rewrite the O_DIRECT code so that it can share the same coalescing and pNFS functionality as the page cache code. - Allow the server to provide hints as to when we should use pNFS, and when it is more efficient to read and write through the metadata server. - NFS cache consistency updates: - Use the ctime to emulate a change attribute for NFSv2/v3 so that all NFS versions can share the same cache management code. - New cache management code will only look at the change attribute and size attribute when deciding whether or not our cached data is still valid or not. - Don't request NFSv4 post-op attributes on writes in cases such as O_DIRECT, where we don't care about data cache consistency, or when we have a write delegation, and know that our cache is still consistent. - Don't request NFSv4 post-op attributes on operations such as COMMIT, where there are no expected metadata updates. - Don't request NFSv4 directory post-op attributes in cases where the operations themselves already return change attribute updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and RENAME. - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS if we detect no attempts to lookup filenames. - Improve the code sharing between NFSv2/v3 and v4 mounts - NFSv4.1 state management efficiency improvements - More patches in preparation for NFSv4/v4.1 migration functionality. 
Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "New features include: - Rewrite the O_DIRECT code so that it can share the same coalescing and pNFS functionality as the page cache code. - Allow the server to provide hints as to when we should use pNFS, and when it is more efficient to read and write through the metadata server. - NFS cache consistency updates: * Use the ctime to emulate a change attribute for NFSv2/v3 so that all NFS versions can share the same cache management code. * New cache management code will only look at the change attribute and size attribute when deciding whether or not our cached data is still valid or not. * Don't request NFSv4 post-op attributes on writes in cases such as O_DIRECT, where we don't care about data cache consistency, or when we have a write delegation, and know that our cache is still consistent. * Don't request NFSv4 post-op attributes on operations such as COMMIT, where there are no expected metadata updates. 
* Don't request NFSv4 directory post-op attributes in cases where the operations themselves already return change attribute updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and RENAME. - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS if we detect no attempts to lookup filenames. - Improve the code sharing between NFSv2/v3 and v4 mounts - NFSv4.1 state management efficiency improvements - More patches in preparation for NFSv4/v4.1 migration functionality." Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache qstr name initialization changes (that made the length/hash a 64-bit union) * tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits) NFSv4: Add debugging printks to state manager NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager NFSv4.1: Handle errors in nfs4_bind_conn_to_session NFSv4.1: nfs4_bind_conn_to_session should drain the session NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid NFSv4.1: Add DESTROY_CLIENTID NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session NFSv4.1: Ensure we use the correct credentials for session create/destroy NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM NFSv4: Clean up the error handling for nfs4_reclaim_lease NFSv4.1: Exchange ID must use GFP_NOFS allocation mode nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN* nfs4.1: add BIND_CONN_TO_SESSION operation NFSv4.1 test the mdsthreshold hint parameters ...	2012-05-29 10:43:51 -07:00
Tao Ma	6f2e9f0e7d	ext4: protect group inode free counting with group lock Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain with something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch tries to protect it with the ext4_lock_group. btw, it was found by xfstests test case 269 and the volume was created by mkfs with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 18:20:59 -04:00
Zheng Liu	8563000d3b	ext4: use consistent ssize_t type in ext4_file_write() The generic_file_aio_write() function returns ssize_t, and ext4_file_write() returns a ssize_t, so use a ssize_t to collect the return value from generic_file_aio_write(). It shouldn't matter since the VFS read/write paths shouldn't allow a read greater than MAX_INT, but there was previously a bug in the AIO code paths, and it's best if we use a consistent type so that the return value from generic_file_aio_write() can't get truncated. Reported-by: Jouni Siren <jouni.siren@iki.fi> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 18:06:51 -04:00
Zheng Liu	4a3c3a5120	ext4: fix format flag in ext4_ext_binsearch_idx() fix ext_debug format flag in ext4_ext_binsearch_idx(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 17:55:16 -04:00
Zheng Liu	400db9d301	ext4: cleanup in ext4_discard_allocated_blocks() remove 'len' variable in ext4_discard_allocated_blocks() because it is useless. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 17:53:53 -04:00
Theodore Ts'o	2cde417de0	ext4: return ENOMEM when mounts fail due to lack of memory This is a port of the ext3 commit: `4569cd1b0d` Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 17:49:54 -04:00
Theodore Ts'o	2716b80284	ext4: remove redundant "(char *) bh->b_data" casts The b_data field of the buffer_head is already a char *, so there's no point casting it to a char *. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 17:47:52 -04:00
Trond Myklebust	cc0a984368	NFSv4: Add debugging printks to state manager Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-28 17:21:57 -04:00
Trond Myklebust	fb13bfa7e1	NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO If a file OPEN is denied due to a share lock, the resulting NFS4ERR_SHARE_DENIED is currently mapped to the default EIO. This patch adds a more appropriate mapping, and brings Linux into line with what Solaris 10 does. See https://bugzilla.kernel.org/show_bug.cgi?id=43286 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-05-28 17:21:48 -04:00
Andreas Dilger	7e936b7372	ext4: disallow hard-linked directory in ext4_lookup A hard-linked directory to its parent can cause the VFS to deadlock, and is a sign of a corrupted file system. So detect this case in ext4_lookup(), before the rmdir() lockup scenario can take place. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-05-28 17:02:25 -04:00
Linus Torvalds	a01ee165a1	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd Pull exofs updates from Boaz Harrosh: "Just a couple of patches. The first is a BUG fix destined for stable which missed the 3.4-rc7 Kernel. The second is just a fixture addition so exofs is able to be better exported as a cluster file system via pNFS." * 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Add SYSFS info for autologin/pNFS export exofs: Fix CRASH on very early IO errors.	2012-05-28 13:10:41 -07:00
Haogang Chen	967ac8af44	ext4: fix potential integer overflow in alloc_flex_gd() In alloc_flex_gd(), when flexbg_size is large, kmalloc size would overflow and flex_gd->groups would point to a buffer smaller than expected, causing OOB accesses when it is used. Note that in ext4_resize_fs(), flexbg_size is calculated using sbi->s_log_groups_per_flex, which is read from the disk and only bounded to [1, 31]. The patch returns NULL for too large flexbg_size. Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-05-28 14:21:55 -04:00
Akira Fujita	9d99012ff2	ext4: remove needs_recovery in ext4_mb_init() needs_recovery in ext4_mb_init() is not used, remove it. Signed-off-by: Akira Fujita <a-fujita@rs.jp.ne.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 14:19:25 -04:00
Eric Sandeen	7e84b62164	ext4: force ro mount if ext4_setup_super() fails If ext4_setup_super() fails i.e. due to a too-high revision, the error is logged in dmesg but the fs is not mounted RO as indicated. Tested by: # mkfs.ext4 -r 4 /dev/sdb6 # mount /dev/sdb6 /mnt/test # dmesg | grep "too high" [164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode # grep sdb6 /proc/mounts /dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-05-28 14:17:25 -04:00
Dan Carpenter	bb3d132a24	ext4: fix potential NULL dereference in ext4_free_inodes_counts() The ext4_get_group_desc() function returns NULL on error, and ext4_free_inodes_count() function dereferences it without checking. There is a check on the next line, but it's too late. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-05-28 14:16:57 -04:00
Linus Torvalds	90324cc1b1	avoid iput() from flusher thread Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally	2012-05-28 09:54:45 -07:00
Trond Myklebust	359d7d1c97	NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE We're already invalidating the data cache, and setting the new change attribute. Since directories don't care about the i_size field, there is no need to be forcing any extra revalidation of the page cache. We do keep the NFS_INO_INVALID_ATTR flag, in order to force an attribute cache revalidation on stat() calls since we do not update the mtime and ctime fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-28 10:05:47 -04:00
Trond Myklebust	f2c1b5100d	NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error The results from a call to nfs4_proc_create_session() should always be fed into nfs4_handle_reclaim_lease_error, so that we can handle errors such as NFS4ERR_SEQ_MISORDERED correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-27 14:50:44 -04:00
Trond Myklebust	9f594791dd	NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION Let nfs4_schedule_session_recovery() handle the details of choosing between resetting the session, and other session related recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-27 14:33:07 -04:00
Trond Myklebust	7c5d725684	NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-27 14:33:07 -04:00
Trond Myklebust	bf674c8228	NFSv4.1: Handle errors in nfs4_bind_conn_to_session Ensure that we handle NFS4ERR_DELAY errors separately, and then let nfs4_recovery_handle_error() handle all other cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-27 14:32:06 -04:00
Trond Myklebust	43ac544cb3	NFSv4.1: nfs4_bind_conn_to_session should drain the session In order to avoid races with other RPC calls that end up setting the NFS4CLNT_BIND_CONN_TO_SESSION flag. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-27 14:00:08 -04:00
Darrick J. Wong	e93376c20b	ext4/jbd2: add metadata checksumming to the list of supported features Activate the metadata checksumming feature by adding it to ext4 and jbd2's lists of supported features. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:12:42 -04:00
Darrick J. Wong	c390087591	jbd2: checksum data blocks that are stored in the journal Calculate and verify checksums of each data block being stored in the journal. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:12:12 -04:00
Darrick J. Wong	1f56c5890e	jbd2: checksum commit blocks Calculate and verify the checksum of commit blocks. In checksum v2, deprecate most of the checksum v1 commit block checksum fields, since each block has its own checksum. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:10:25 -04:00
Darrick J. Wong	3caa487f53	jbd2: checksum descriptor blocks Calculate and verify a checksum of each descriptor block. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:10:22 -04:00
Darrick J. Wong	42a7106de6	jbd2: checksum revocation blocks Compute and verify checksums of revoke blocks inside the journal. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:08:24 -04:00
Darrick J. Wong	4fd5ea43bc	jbd2: checksum journal superblock Calculate and verify a checksum covering the journal superblock. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 08:08:22 -04:00
Darrick J. Wong	01b5adcebb	jbd2: Grab a reference to the crc32c driver if necessary Obtain a reference to the crc32c driver if needed for the v2 checksum. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 07:50:56 -04:00
Darrick J. Wong	25ed6e8a54	jbd2: enable journal clients to enable v2 checksumming Add in the necessary code so that journal clients can enable the new journal checksumming features. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-27 07:48:56 -04:00
Linus Torvalds	36126f8f2e	word-at-a-time: make the interfaces truly generic This changes the interfaces in <asm/word-at-a-time.h> to be a bit more complicated, but a lot more generic. In particular, it allows us to really do the operations efficiently on both little-endian and big-endian machines, pretty much regardless of machine details. For example, if you can rely on a fast population count instruction on your architecture, this will allow you to make your optimized <asm/word-at-a-time.h> file with that. NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is not truly generic, it actually only works on big-endian. Why? Because on little-endian the generic algorithms are wasteful, since you can inevitably do better. The x86 implementation is an example of that. (The only truly non-generic part of the asm-generic implementation is the "find_zero()" function, and you could make a little-endian version of it. And if the Kbuild infrastructure allowed us to pick a particular header file, that would be lovely) The <asm/word-at-a-time.h> functions are as follows: - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm uses. - has_zero(): take a word, and determine if it has a zero byte in it. It gets the word, the pointer to the constant pool, and a pointer to an intermediate "data" field it can set. This is the "quick-and-dirty" zero tester: it's what is run inside the hot loops. - "prep_zero_mask()": take the word, the data that has_zero() produced, and the constant pool, and generate an exact mask of which byte had the first zero. This is run directly outside the loop, and allows the "has_zero()" function to answer the "is there a zero byte" question without necessarily getting exactly which byte is the first one to contain a zero. 
If you do multiple byte lookups concurrently (eg "hash_name()", which looks for both NUL and '/' bytes), after you've done the prep_zero_mask() phase, the result of those can be or'ed together to get the "either or" case. - The result from "prep_zero_mask()" can then be fed into "find_zero()" (to find the byte offset of the first byte that was zero) or into "zero_bytemask()" (to find the bytemask of the bytes preceding the zero byte). The existence of zero_bytemask() is optional, and is not necessary for the normal string routines. But dentry name hashing needs it, so if you enable DENTRY_WORD_AT_A_TIME you need to expose it. This changes the generic strncpy_from_user() function and the dentry hashing functions to use these modified word-at-a-time interfaces. This gets us back to the optimized state of the x86 strncpy that we lost in the previous commit when moving over to the generic version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-26 11:33:40 -07:00
Trond Myklebust	32b0131069	NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory supposed to already know the correct value of the seqid, in which case RFC5661 states that it should ignore the value returned. Also ensure that if the sanity check in nfs4_check_cl_exchange_flags fails, then we must not change the nfs_client fields. Finally, clean up the code: we don't need to retest the value of 'status' unless it can change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-26 14:17:31 -04:00
Trond Myklebust	6624553910	NFSv4.1: Add DESTROY_CLIENTID Ensure that we destroy our lease on last unmount Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-26 14:17:30 -04:00
Jan Schmidt	f29021b29a	Btrfs: add tree mod log to fs_info Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:54 +02:00
Jan Schmidt	815a51c74a	Btrfs: dummy extent buffers for tree mod log The tree modification log needs two ways to create dummy extent buffers, once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications to it). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:54 +02:00
Jan Schmidt	64947ec0d1	Btrfs: move struct seq_list to ctree.h Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:53 +02:00
Jan Schmidt	5581a51a59	Btrfs: don't set for_cow parameter for tree block functions Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:53 +02:00
Jan Schmidt	976b1908d9	Btrfs: look into the extent during find_all_leafs Before this patch we called find_all_leafs for a data extent, then called find_all_roots and then looked into the extent to grab the information we were seeking. This was done without holding the leaves locked to avoid deadlocks. However, this can obviously race with concurrent tree modifications. Instead, we now look into the extent while we're holding the lock during find_all_leafs and store this information together with the leaf list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:52 +02:00
Jan Schmidt	d5c88b735f	Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs The key we store with a tree block backref is only a hint. It is set when the ref is created and can remain correct for a long time. As the tree is rebalanced, however, eventually the key no longer points to the correct destination. With this patch, we change find_parent_nodes to no longer add keys unless it knows for sure they're correct (e.g. because they're for an extent data backref). Then when we later encounter a backref ref with no parent and no key set, we grab the block and take the first key from the block itself. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:51 +02:00
Jan Schmidt	dadcaf78b5	Btrfs: bugfix in btrfs_find_parent_nodes That one has been around since the addition of backref.c. Due to the way we calculate our slot numbers, after adding inline refs we're missing one keyed ref unless it's located at the beginning of a new leaf. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:51 +02:00
Jan Schmidt	cd1b413c5c	Btrfs: ulist realloc bugfix ulist_next gets the pointer to the previously returned element to find the next element from there. However, when we call ulist_add while iteration with ulist_next is in progress (ulist explicitly supports this), we can realloc the ulist internal memory, which makes the pointer to the previous element useless. Instead, we now use an iterator parameter that's independent from the internal pointers. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-05-26 12:17:49 +02:00
Trond Myklebust	2cf047c994	NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com>	2012-05-25 18:02:10 -04:00
Trond Myklebust	848f5bda54	NFSv4.1: Ensure we use the correct credentials for session create/destroy Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-25 18:02:09 -04:00
Trond Myklebust	ad24ecfbcd	NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations For backward compatibility with nfs-utils. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com>	2012-05-25 18:02:09 -04:00
Trond Myklebust	89a217360e	NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier" is tickling a Linux nfs server bug, and causing a regression: the server can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED to our CREATE_SESSION request even when we are sending the correct sequence ID. Fix this by purging the lease and then retrying. Reported-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-25 16:17:42 -04:00
Trond Myklebust	be0bfed002	NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM Otherwise we can end up not sending a new exchange-id/setclientid Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-25 16:17:13 -04:00
Trond Myklebust	2a6ee6aa2f	NFSv4: Clean up the error handling for nfs4_reclaim_lease Try to consolidate the error handling for nfs4_reclaim_lease into a single function instead of doing a bit here, and a bit there... Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-25 16:17:13 -04:00
Linus Torvalds	ece78b7df7	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, ext3 and quota fixes from Jan Kara: "Interesting bits are: - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since quota code does not need i_mutex anymore in any unusual way. - backport (from ext4) of a fix of a checkpointing bug (missing cache flush) that could lead to fs corruption on power failure The rest are just random small fixes & cleanups." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: trivial fix to comment for ext2_free_blocks ext2: remove the redundant comment for ext2_export_ops ext3: return 32/64-bit dir name hash according to usage type quota: Get rid of nested I_MUTEX_QUOTA locking subclass quota: Use precomputed value of sb_dqopt in dquot_quota_sync ext2: Remove i_mutex use from ext2_quota_write() reiserfs: Remove i_mutex use from reiserfs_quota_write() ext4: Remove i_mutex use from ext4_quota_write() ext3: Remove i_mutex use from ext3_quota_write() quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG jbd: Write journal superblock with WRITE_FUA after checkpointing jbd: protect all log tail updates with j_checkpoint_mutex jbd: Split updating of journal superblock and marking journal empty ext2: do not register write_super within VFS ext2: Remove s_dirt handling ext2: write superblock only once on unmount ext3: update documentation with barrier=1 default ext3: remove max_debt in find_group_orlov() jbd: Refine commit writeout logic	2012-05-25 08:14:59 -07:00
Linus Torvalds	b5f4035adf	Features: * Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly. * Fix self-ballooning code, and balloon logic when booting as initial domain. * Move array printing code to generic debugfs * Support XenBus domains. * Lazily free grants when a domain is dead/non-existent. * In M2P code use batching calls Bug-fixes: * Fix NULL dereference in allocation failure path (hvc_xen) * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one. Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: * Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly. * Fix self-ballooning code, and balloon logic when booting as initial domain. * Move array printing code to generic debugfs * Support XenBus domains. * Lazily free grants when a domain is dead/non-existent. * In M2P code use batching calls Bug-fixes: * Fix NULL dereference in allocation failure path (hvc_xen) * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one." Fix up add-add conflicts in arch/x86/xen/enlighten.c due to addition of apic ipi interface next to the new apic_id functions. 
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: do not map the same GSI twice in PVHVM guests. hvc_xen: NULL dereference on allocation failure xen: Add selfballoning memory reservation tunable. xenbus: Add support for xenbus backend in stub domain xen/smp: unbind irqworkX when unplugging vCPUs. xen: enter/exit lazy_mmu_mode around m2p_override calls xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep xen: implement IRQ_WORK_VECTOR handler xen: implement apic ipi interface xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/gnttab: add deferred freeing logic debugfs: Add support to print u32 array in debugfs xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.	2012-05-24 16:02:08 -07:00
Linus Torvalds	ce004178be	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull sparc changes from David S. Miller: "This has the generic strncpy_from_user() implementation architectures can now use, which we've been developing on linux-arch over the past few days. For good measure I ran both a 32-bit and a 64-bit glibc testsuite run, and the latter of which pointed out an adjustment I needed to make to sparc's user_addr_max() definition. Linus, you were right, STACK_TOP was not the right thing to use, even on sparc itself :-) From Sam Ravnborg, we have a conversion of sparc32 over to the common alloc_thread_info_node(), since the aspect which originally blocked our doing so (sun4c) has been removed." Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Fix user_addr_max() definition. lib: Sparc's strncpy_from_user is generic enough, move under lib/ kernel: Move REPEAT_BYTE definition into linux/kernel.h sparc: Increase portability of strncpy_from_user() implementation. sparc: Optimize strncpy_from_user() zero byte search. sparc: Add full proper error handling to strncpy_from_user(). sparc32: use the common implementation of alloc_thread_info_node()	2012-05-24 15:10:28 -07:00
Linus Torvalds	9978306e31	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs Pull XFS update from Ben Myers: - Removal of xfsbufd - Background CIL flushes have been moved to a workqueue. - Fix to xfs_check_page_type applicable to filesystems where blocksize < page size - Fix for stale data exposure when extsize hints are used. - A series of xfs_buf cache cleanups. - Fix for XFS_IOC_ALLOCSP - Cleanups for includes and removal of xfs_lrw.[ch]. - Moved all busy extent handling to its own file so that it is easier to merge with userspace. - Fix for log mount failure. - Fix to enable inode reclaim during quotacheck at mount time. - Fix for delalloc quota accounting. - Fix for memory reclaim deadlock on agi buffer. - Fixes for failed writes and to clean up stale delalloc blocks. - Fix to use GFP_NOFS in blkdev_issue_flush - SEEK_DATA/SEEK_HOLE support * 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits) xfs: add trace points for log forces xfs: fix memory reclaim deadlock on agi buffer xfs: fix delalloc quota accounting on failure xfs: protect xfs_sync_worker with s_umount semaphore xfs: introduce SEEK_DATA/SEEK_HOLE support xfs: make xfs_extent_busy_trim not static xfs: make XBF_MAPPED the default behaviour xfs: flush outstanding buffers on log mount failure xfs: Properly exclude IO type flags from buffer flags xfs: clean up xfs_bit.h includes xfs: move xfs_do_force_shutdown() and kill xfs_rw.c xfs: move xfs_get_extsz_hint() and kill xfs_rw.h xfs: move xfs_fsb_to_db to xfs_bmap.h xfs: clean up busy extent naming xfs: move busy extent handling to it's own file xfs: move xfsagino_t to xfs_types.h xfs: use iolock on XFS_IOC_ALLOCSP calls xfs: kill XBF_DONTBLOCK xfs: kill xfs_read_buf() xfs: kill XBF_LOCK ...	2012-05-24 14:14:46 -07:00
Trond Myklebust	bbafffd293	NFSv4.1: Exchange ID must use GFP_NOFS allocation mode Exchange ID can be called in a lease reclaim situation, so it will deadlock if it then tries to write out dirty NFS pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:31:39 -04:00
Weston Andros Adamson	a9e64442f1	nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN* The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a BIND_CONN_TO_SESSION instead of destroying the session and creating a new one. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:26:21 -04:00
Weston Andros Adamson	7c44f1ae4a	nfs4.1: add BIND_CONN_TO_SESSION operation This patch adds the BIND_CONN_TO_SESSION operation which is needed for upcoming SP4_MACH_CRED work and useful for recovering from broken connections without destroying the session. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:22:19 -04:00
Andy Adamson	d23d61c8d3	NFSv4.1 test the mdsthreshold hint parameters Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:15:49 -04:00
Andy Adamson	2701d086db	NFSv4.1 add nfs_inode book keeping for mdsthreshold Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:15:48 -04:00
Andy Adamson	82be417aa3	NFSv4.1 cache mdsthreshold values on OPEN Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:15:48 -04:00
Andy Adamson	88034c3d88	NFSv4.1 mdsthreshold attribute xdr We only support one layout type per file system, so one threshold_item4 per mdsthreshold4. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-24 16:15:47 -04:00
David S. Miller	446969084d	kernel: Move REPEAT_BYTE definition into linux/kernel.h And make sure that everything using it explicitly includes that header file. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-24 13:10:05 -07:00
Linus Torvalds	28f3d71761	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull more networking updates from David Miller: "Ok, everything from here on out will be bug fixes." 1) One final sync of wireless and bluetooth stuff from John Linville. These changes have all been in his tree for more than a week, and therefore have had the necessary -next exposure. John was just away on a trip and didn't have a chance to send the pull request until a day or two ago. 2) Put back some defines in user exposed header file areas that were removed during the tokenring purge. From Stephen Hemminger and Paul Gortmaker. 3) A bug fix for UDP hash table allocation got lost in the pile due to one of those "you got it.. no I've got it.." situations. :-) From Tim Bird. 4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll try to coalesce overlapping frags and crash. Fix from Eric Dumazet. 5) RCU routing table lookups can race with free_fib_info(), causing crashes when we deref the device pointers in the route. Fix by releasing the net device in the RCU callback. From Yanmin Zhang. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits) tcp: take care of overlaps in tcp_try_coalesce() ipv4: fix the rcu race between free_fib_info and ip_route_output_slow mm: add a low limit to alloc_large_system_hash ipx: restore token ring define to include/linux/ipx.h if: restore token ring ARP type to header xen: do not disable netfront in dom0 phy/micrel: Fix ID of KSZ9021 mISDN: Add X-Tensions USB ISDN TA XC-525 gianfar:don't add FCB length to hard_header_len Bluetooth: Report proper error number in disconnection Bluetooth: Create flags for bt_sk() Bluetooth: report the right security level in getsockopt Bluetooth: Lock the L2CAP channel when sending Bluetooth: Restore locking semantics when looking up L2CAP channels Bluetooth: Fix a redundant and problematic incoming MTU check Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C Bluetooth: Fix EIR data generation for mgmt_device_found Bluetooth: Fix Inquiry with RSSI event mask Bluetooth: improve readability of l2cap_seq_list code Bluetooth: Fix skb length calculation ...	2012-05-24 11:54:29 -07:00
Tim Bird	31fe62b958	mm: add a low limit to alloc_large_system_hash UDP stack needs a minimum hash size value for proper operation and also uses alloc_large_system_hash() for proper NUMA distribution of its hash tables and automatic sizing depending on available system memory. In some low-memory situations, udp_table_init() must ignore the alloc_large_system_hash() result and realloc a bigger memory area. As we cannot easily free the old hash table, we leak it and kmemleak can issue a warning. This patch adds a low limit parameter to alloc_large_system_hash() to solve this problem. We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table allocation. Reported-by: Mark Asselstine <mark.asselstine@windriver.com> Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-24 00:28:21 -04:00
Linus Torvalds	644473e9c6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures that if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these values specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. 
The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...	2012-05-23 17:42:39 -07:00
Linus Torvalds	468f4d1a85	Power management updates for 3.5 * Implementation of opportunistic suspend (autosleep) and user space interface for manipulating wakeup sources. * Hibernate updates from Bojan Smojver and Minho Ban. * Updates of the runtime PM core and generic PM domains framework related to PM QoS. * Assorted fixes. Merge tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: - Implementation of opportunistic suspend (autosleep) and user space interface for manipulating wakeup sources. - Hibernate updates from Bojan Smojver and Minho Ban. - Updates of the runtime PM core and generic PM domains framework related to PM QoS. - Assorted fixes.
* tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) epoll: Fix user space breakage related to EPOLLWAKEUP PM / Domains: Make it possible to add devices to inactive domains PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format PM / Domains: Fix computation of maximum domain off time PM / Domains: Fix link checking when add subdomain PM / Sleep: User space wakeup sources garbage collector Kconfig option PM / Sleep: Make the limit of user space wakeup sources configurable PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo PM / Domains: Cache device stop and domain power off governor results, v3 PM / Domains: Make device removal more straightforward PM / Sleep: Fix a mistake in a conditional in autosleep_store() epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready PM / QoS: Create device constraints objects on notifier registration PM / Runtime: Remove device fields related to suspend time, v2 PM / Domains: Rework default domain power off governor function, v2 PM / Domains: Rework default device stop governor function, v2 PM / Sleep: Add user space interface for manipulating wakeup sources, v3 PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources PM / Sleep: Implement opportunistic sleep, v2 PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints ...	2012-05-23 14:07:06 -07:00
Trond Myklebust	54ac471c83	NFS: Add memory barriers to the nfs_client->cl_cons_state initialisation Ensure that a process that uses the nfs_client->cl_cons_state test for whether the initialisation process is finished does not read stale data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-23 15:24:59 -04:00
Trond Myklebust	4697bd5e94	NFSv4: Fix a race in the net namespace mount notification Since the struct nfs_client gets added to the global nfs_client_list before it is initialised, it is possible that rpc_pipefs_event can end up trying to create idmapper entries on such a thing. The solution is to have the mount notification wait for the initialisation of each nfs_client to complete, and then to skip any entries for which it failed. Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>	2012-05-23 15:21:13 -04:00
Trond Myklebust	7b38c3682c	NFSv4.1: Fix session initialisation races Session initialisation is not complete until the lease manager has run. We need to ensure that both nfs4_init_session and nfs4_init_ds_session do so, and that they check for any resulting errors in clp->cl_cons_state. Only after this is done, can nfs4_ds_connect check the contents of clp->cl_exchange_flags. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com>	2012-05-23 15:20:57 -04:00
Linus Torvalds	ec0d7f18ab	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull fpu state cleanups from Ingo Molnar: "This tree streamlines further aspects of FPU handling by eliminating the prepare_to_copy() complication and moving that logic to arch_dup_task_struct(). It also fixes the FPU dumps in threaded core dumps, removes an old (and now invalid) assumption plus micro-optimizes the exit path by avoiding an FPU save for dead tasks." Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came in because we now do the FPU handling in arch_dup_task_struct() rather than the legacy (and now gone) prepare_to_copy(). * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: drop the fpu state during thread exit x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() coredump: ensure the fpu state is flushed for proper multi-threaded core dump fork: move the real prepare_to_copy() users to arch_dup_task_struct()	2012-05-23 10:59:07 -07:00
Shirish Pargaonkar	2c0c2a08be	cifs: fix oops while traversing open file list (try #4) While traversing the linked list of open file handles, if the identified file handle is invalid, a reopen is attempted and if it fails, we resume traversing where we stopped and cifs can oops while accessing an invalid next element, for the list might have changed. So mark the invalid file handle and attempt reopen if no valid file handle is found in the rest of the list. If reopen fails, move the invalid file handle to the end of the list and start traversing the list again from the beginning. Repeat this four times before giving up and returning an error if file reopen keeps failing. Cc: <stable@vger.kernel.org> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:18 +04:00
Sedat Dilek	ea4b574028	cifs: Fix comment as d_alloc_root() is replaced by d_make_root() For more details see <file: Documentation/filesystems/porting>. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:16 +04:00
Steve French	1080ef758f	CIFS: Introduce SMB2 mounts as vers=2.1 As with the Linux nfs client, which uses "nfsvers=" or "vers=" to indicate which protocol to use for mount, specifying "vers=2.1" will force an SMB2 mount. When vers is not specified, CIFS is used ("vers=1"). We can eventually autonegotiate down from SMB2 to CIFS when SMB2 is stable enough to make it the default, but this is for the future. At that time we could also implement a "maxprotocol" mount option as smbclient and Samba have today, but that would be premature until SMB2 is stable. Initially the SMB2 Kconfig option will depend on "BROKEN" until the merge is complete, and then be "EXPERIMENTAL". When it is no longer experimental we can consider changing the default protocol to attempt first. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:15 +04:00
Steve French	675f36fb1d	CIFS: Introduce SMB2 Kconfig option SMB2 is the followon to the CIFS (and SMB) protocols and the default for Windows since Windows Vista, and also now implemented by various non-Windows servers. SMB2 is more secure, has various performance advantages, including larger i/o sizes, flow control, better caching model and more. SMB2 also resolves some scalability limits in the CIFS protocol and adds many new features while being much simpler (only a few dozen commands instead of hundreds) and since the protocol is clearer it is also more consistently implemented across servers and thus easier to optimize. After much discussion with Jeff Layton, Jeremy Allison and others at Connectathon, we decided to move the SMB2 code from a distinct .ko and fstype into distinct C files that optionally build in cifs.ko. As a result the Kconfig gets simpler. To avoid destabilizing CIFS, the SMB2 code is going to be moved into its own experimental CONFIG_CIFS_SMB2 ifdef as it is merged and rereviewed. The changes to stable CIFS (builds with the SMB2 ifdef off) are expected to be fairly small. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:14 +04:00
Pavel Shilovsky	452757897a	CIFS: Move add/set_credits and get_credits_field to ops structure Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:12 +04:00
Pavel Shilovsky	8aa26f3ed8	CIFS: Move protocol specific demultiplex thread calls to ops struct Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:11 +04:00
Pavel Shilovsky	eb37871118	CIFS: Move protocol specific part from cifs_readv_receive to ops struct Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:09 +04:00
Pavel Shilovsky	1887f60103	CIFS: Move header_size/max_header_size to ops structure Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:33:08 +04:00
Pavel Shilovsky	082d0642c6	CIFS: Move protocol specific part from SendReceive2 to ops struct Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-23 12:32:57 +04:00
Darrick J. Wong	8f888ef846	jbd2: change disk layout for metadata checksumming Define flags and allocate space in on-disk journal structures to support checksumming of journal metadata. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-22 22:43:41 -04:00
Linus Torvalds	6101167727	dlm for 3.5 This set includes some minor fixes and improvements. The one large patch addresses the special "nodir" mode, which has been a long neglected proof of concept, but with these fixes seems to be quite usable. It allows the resource master to be assigned statically instead of dynamically, which can improve performance if there is little locality and most resources are shared. Merge tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes some minor fixes and improvements. The one large patch addresses the special "nodir" mode, which has been a long neglected proof of concept, but with these fixes seems to be quite usable. It allows the resource master to be assigned statically instead of dynamically, which can improve performance if there is little locality and most resources are shared."
* tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: NULL dereference on failure in kmem_cache_create() gfs2: fix recovery during unmount dlm: fixes for nodir mode dlm: improve error and debug messages dlm: avoid unnecessary search in search_rsb dlm: limit rcom debug messages dlm: fix waiter recovery dlm: prevent connections during shutdown	2012-05-22 19:31:38 -07:00
Linus Torvalds	6133308ad1	UBIFS: * Always support xattrs (remove the Kconfig option) * Always support debugging (remove the Kconfig option) * A fix for a memory leak on error path * A number of clean-ups UBI: * Always support debugging (remove the Kconfig option) * Remove "data type" hint support * Huge amount of renames to prepare for the fastmap work * A lot of clean-ups Merge tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs Pull UBI and UBIFS updates from Artem Bityutskiy: UBIFS: * Always support xattrs (remove the Kconfig option) * Always support debugging (remove the Kconfig option) * A fix for a memory leak on error path * A number of clean-ups UBI: * Always support debugging (remove the Kconfig option) * Remove "data type" hint support * Huge amount of renames to prepare for the fastmap work * A lot of clean-ups * tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs: (54 commits) UBI: modify ubi_wl_flush function to clear work queue for a lnum UBI: introduce UBI_ALL constant UBI: add lnum and vol_id to struct ubi_work UBI: add volume id struct ubi_ainf_peb UBI: add in hex the value for UBI_INTERNAL_VOL_START to comment
UBI: rename scan.c to attach.c UBI: remove scan.h UBI: rename UBI_SCAN_UNKNOWN_EC UBI: move and rename attach_by_scanning UBI: rename _init_scan functions UBI: amend comments after all the renamings UBI: rename ubi_scan_leb_slab UBI: rename ubi_scan_move_to_list UBI: rename ubi_scan_destroy_ai UBI: rename ubi_scan_get_free_peb UBI: rename ubi_scan_rm_volume UBI: rename ubi_scan_find_av UBI: rename ubi_scan_add_used UBI: remove unused function UBI: make ubi_scan_erase_peb static and rename ...	2012-05-22 19:30:27 -07:00
Linus Torvalds	e8650a0823	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "As usual, it's mostly typo fixes, redundant code elimination and some documentation updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits) edac, mips: don't change code that has been removed in edac/mips tree xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer lib: Change mail address of Oskar Schirmer net: Change mail address of Oskar Schirmer arm/m68k: Change mail address of Sebastian Hess i2c: Change mail address of Oskar Schirmer net: Fix tcp_build_and_update_options comment in struct tcp_sock atomic64_32.h: fix parameter naming mismatch Kconfig: replace "--- help ---" with "---help---" c2port: fix bogus Kconfig "default no" edac: Fix spelling errors. qla1280: Remove redundant NULL check before release_firmware() call remoteproc: remove redundant NULL check before release_firmware() qla2xxx: Remove redundant NULL check before release_firmware() call. aic94xx: Get rid of redundant NULL check before release_firmware() call tehuti: delete redundant NULL check before release_firmware() qlogic: get rid of a redundant test for NULL before call to release_firmware() bna: remove redundant NULL test before release_firmware() tg3: remove redundant NULL test before release_firmware() call typhoon: get rid of redundant conditional before all to release_firmware() ...	2012-05-22 19:22:50 -07:00
Linus Torvalds	fb09bafda6	Staging tree pull request for 3.5-rc1 Here is the big staging tree pull request for the 3.5-rc1 merge window. Loads of changes here, and we just narrowly added more lines than we removed: 622 files changed, 28356 insertions(+), 26059 deletions(-) But, good news is that there is a number of subsystems that moved out of the staging tree, to their respective "real" portions of the kernel. Code that moved out was: - iio core code - mei driver - vme core and bridge drivers There was one broken network driver that moved into staging as a step before it is removed from the tree (pc300), and there was a few new drivers added to the tree: - new iio drivers - gdm72xx wimax USB driver - ipack subsystem and 2 drivers All of the movements around have acks from the various subsystem maintainers, and all of this has been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging tree changes from Greg Kroah-Hartman: "Here is the big staging tree pull request for the 3.5-rc1 merge window. Loads of changes here, and we just narrowly added more lines than we removed: 622 files changed, 28356 insertions(+), 26059 deletions(-) But, good news is that there is a number of subsystems that moved out of the staging tree, to their respective "real" portions of the kernel.
Code that moved out was: - iio core code - mei driver - vme core and bridge drivers There was one broken network driver that moved into staging as a step before it is removed from the tree (pc300), and there was a few new drivers added to the tree: - new iio drivers - gdm72xx wimax USB driver - ipack subsystem and 2 drivers All of the movements around have acks from the various subsystem maintainers, and all of this has been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up various trivial conflicts, along with a non-trivial one found in -next and pointed out by Olof Johanssen: a clean - but incorrect - merge of the arch/arm/boot/dts/at91sam9g20.dtsi file. Fix up manually as per Stephen Rothwell. * tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (536 commits) Staging: bcm: Remove two unused variables from Adapter.h Staging: bcm: Removes the volatile type definition from Adapter.h Staging: bcm: Rename all "INT" to "int" in Adapter.h Staging: bcm: Fix warning: __packed vs. __attribute__((packed)) in Adapter.h Staging: bcm: Correctly format all comments in Adapter.h Staging: bcm: Fix all whitespace issues in Adapter.h Staging: bcm: Properly format braces in Adapter.h Staging: ipack/bridges/tpci200: remove unneeded casts Staging: ipack/bridges/tpci200: remove TPCI200_SHORTNAME constant Staging: ipack: remove board_name and bus_name fields from struct ipack_device Staging: ipack: improve the register of a bus and a device in the bus. staging: comedi: cleanup all the comedi_driver 'detach' functions staging: comedi: remove all 'default N' in Kconfig staging: line6/config.h: Delete unused header staging: gdm72xx depends on NET staging: gdm72xx: Set up parent link in sysfs for gdm72xx devices staging: drm/omap: initial dmabuf/prime import support staging: drm/omap: dmabuf/prime mmap support pstore/ram: Add ECC support pstore/ram: Switch to persistent_ram routines ...	2012-05-22 16:34:21 -07:00
Linus Torvalds	5d4e2d08e7	Driver core pull for 3.5-rc1 Here's the driver core, and other driver subsystems, pull request for the 3.5-rc1 merge window. Outside of a few minor driver core changes, we ended up with the following different subsystem and core changes as well, due to interdependencies on the driver core: - hyperv driver updates - drivers/memory being created and some drivers moved into it - extcon driver subsystem created out of the old Android staging switch driver code - dynamic debug updates - printk rework, and /dev/kmsg changes All of this has been tested in the linux-next releases for a few weeks with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg Kroah-Hartman: "Here's the driver core, and other driver subsystems, pull request for the 3.5-rc1 merge window. Outside of a few minor driver core changes, we ended up with the following different subsystem and core changes as well, due to interdependencies on the driver core: - hyperv driver updates - drivers/memory being created and some drivers moved into it - extcon driver subsystem created out of the old Android staging switch driver code - dynamic debug updates - printk rework, and /dev/kmsg changes All of this has been tested in the linux-next releases for a few weeks with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fix up conflicts in drivers/extcon/extcon-max8997.c where git noticed that a patch to the deleted drivers/misc/max8997-muic.c driver needs to be applied to this one.
* tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (90 commits) uio_pdrv_genirq: get irq through platform resource if not set otherwise memory: tegra{20,30}-mc: Remove empty _remove() printk() - isolate KERN_CONT users from ordinary complete lines sysfs: get rid of some lockdep false positives Drivers: hv: util: Properly handle version negotiations. Drivers: hv: Get rid of an unnecessary check in vmbus_prep_negotiate_resp() memory: tegra{20,30}-mc: Use dev_err_ratelimited() driver core: Add dev__ratelimited() family Driver Core: don't oops with unregistered driver in driver_find_device() printk() - restore prefix/timestamp printing for multi-newline strings printk: add stub for prepend_timestamp() ARM: tegra30: Make MC optional in Kconfig ARM: tegra20: Make MC optional in Kconfig ARM: tegra30: MC: Remove unnecessary BUG() ARM: tegra20: MC: Remove unnecessary BUG() printk: correctly align __log_buf ARM: tegra30: Add Tegra Memory Controller(MC) driver ARM: tegra20: Add Tegra Memory Controller(MC) driver printk() - restore timestamp printing at console output printk() - do not merge continuation lines of different threads ...	2012-05-22 16:02:13 -07:00
Chuck Lever	acdeb69d9c	NFS: EXCHANGE_ID should save the server major and minor ID Save the server major and minor ID results from EXCHANGE_ID, as they are needed for detecting server trunking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:48 -04:00
Chuck Lever	4bf590e08f	NFS: Add nfs_client behavior flags "noresvport" and "discrtry" can be passed to nfs_create_rpc_client() by setting flags in the passed-in nfs_client. This change makes it easy to add new flags. Note that these settings are now "sticky" over the lifetime of a struct nfs_client, and may even be copied when an nfs_client is cloned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:47 -04:00
Chuck Lever	8cab4c390b	NFS: Refactor nfs_get_client(): initialize nfs_client Clean up: Continue to rationalize the locking in nfs_get_client() by moving the logic that handles the case where a matching server IP address is not found. When we support server trunking detection, client initialization may return a different nfs_client struct than was passed to it. Change the synopsis of the init_client methods to return an nfs_client. The client initialization logic in nfs_get_client() is not much more than a wrapper around ->init_client. It's simpler to keep the little bits of error handling in the version-specific init_client methods. No behavior change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:47 -04:00
Chuck Lever	f411703adc	NFS: Refactor nfs_get_client(): add nfs_found_client() Clean up: Code that takes and releases nfs_client_lock remains in nfs_get_client(). Logic that handles a pre-existing nfs_client is moved to a separate function. No behavior change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:46 -04:00
Chuck Lever	f092075dd3	NFS: Always use the same SETCLIENTID boot verifier Currently our NFS client assigns a unique SETCLIENTID boot verifier for each server IP address it knows about. It's set to CURRENT_TIME when the struct nfs_client for that server IP is created. During the SETCLIENTID operation, our client also presents an nfs_client_id4 string to servers, as an identifier on which the server can hang all of this client's NFSv4 state. Our client's nfs_client_id4 string is unique for each server IP address. An NFSv4 server is obligated to wipe all NFSv4 state associated with an nfs_client_id4 string when the client presents the same nfs_client_id4 string along with a changed SETCLIENTID boot verifier. When our client unmounts the last of a server's shares, it destroys that server's struct nfs_client. The next time the client mounts that NFS server, it creates a fresh struct nfs_client with a fresh boot verifier. On seeing the fresh verifier, the server wipes any previous NFSv4 state associated with that nfs_client_id4. However, NFSv4.1 clients are supposed to present the same nfs_client_id4 string to all servers. And, to support Transparent State Migration, the same nfs_client_id4 string should be presented to all NFSv4.0 servers so they recognize that migrated state for this client belongs with state a server may already have for this client. (This is known as the Uniform Client String model). If the nfs_client_id4 string is the same but the boot verifier changes for each server IP address, SETCLIENTID and EXCHANGE_ID operations from such a client could unintentionally result in a server wiping a client's previously obtained lease. Thus, if our NFS client is going to use a fixed nfs_client_id4 string, either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a boot verifier that does not change depending on server IP address. Replace our current per-nfs_client boot verifier with a per-nfs_net boot verifier.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:46 -04:00
Chuck Lever	2c820d9a97	NFS: Force server to drop NFSv4 state nfs4_reset_all_state() refreshes the boot verifier a server sees to trigger that server to wipe this client's state. This function is invoked when an NFSv4.1 server reports that it has revoked some or all of a client's NFSv4 state. To facilitate server trunking discovery, we will eventually want to move the cl_boot_time field to a more global structure. The Uniform Client String model (and specifically, server trunking detection) requires that all servers see the same boot verifier until the client actually does reboot, and not a fresh verifier every time the client unmounts and remounts the server. Without the cl_boot_time field, however, nfs4_reset_all_state() will have to find some other way to force the server to purge the client's NFSv4 state. Because these verifiers are opaque (ie, the server doesn't know or care that they happen to be timestamps), we can force the server to wipe NFSv4 state by updating the boot verifier as we do now, then immediately afterwards establish a fresh client ID using the old boot verifier again. Hopefully there are no extra paranoid server implementations that keep track of the client's boot verifiers and prevent clients from reusing a previous one. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:45 -04:00
Chuck Lever	ce1c8fc12d	NFS: Remove nfs_unique_id Clean up: this structure is unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:45 -04:00
Chuck Lever	177313f149	NFS: Clean up return code checking in nfs4_proc_exchange_id() Clean up: update to use matching types in "if" expressions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:44 -04:00
Chuck Lever	73ea666c2b	NFS: Use proper naming conventions for the nfs_client.net field Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Introduced by commit `e50a7a1a` "NFS: make NFS client allocated per network namespace context," Tue Jan 10, 2012. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:44 -04:00
Chuck Lever	591555465e	NFS: Use proper naming conventions for nfs_client.impl_id field Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Additionally, for consistency, move the impl_id field into the NFSv4- specific part of the nfs_client, and free that memory in the logic that shuts down NFSv4 nfs_clients. Introduced by commit `7d2ed9ac` "NFSv4: parse and display server implementation ids," Fri Feb 17, 2012. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:43 -04:00
Chuck Lever	79d4e1f0d8	NFS: Use proper naming conventions for NFSv4.1 server scope fields Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Additionally, for consistency, move the scope field into the NFSv4- specific part of the nfs_client, and free that memory in the logic that shuts down NFSv4 nfs_clients. Introduced by commit `99fe60d0` "nfs41: exchange_id operation", April 1 2009. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:43 -04:00
Chuck Lever	e3c0fb7ef5	NFS: Add NFSDBG_STATE fs/nfs/nfs4state.c does not yet have any dprintk() call sites, and I'm about to introduce some. We will need a new flag for enabling them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:42 -04:00
Chuck Lever	c3607282b4	NFS: Don't swap bytes in nfs4_construct_boot_verifier() The SETCLIENTID boot verifier is opaque to NFSv4 servers, thus there is no requirement for byte swapping before the client puts the verifier on the wire. This treatment is similar to other timestamp-based verifiers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:45:42 -04:00
Bryan Schumaker	497826af60	NFS: Fix compiler warnings The "struct inode inode" was only used in a dprintk, so compiling with CONFIG_SUNRPC_DEBUG off triggers a warning. To get around this, I remove the "struct inode inode" variable and instead change the dprintk()s to use hdr->inode instead. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:43:04 -04:00
Andy Adamson	bd4aeffb5b	NFSv4.1 skip rpc_call_done only on disconnected DS slot_table_waitq tasks We reset all I/O on a disconnected data server through the pgio layer indicated by the NFS_IOHDR_REDO flag. Differentiate between on-the-wire tasks returning with an error which must call rpc_call_done and tasks woken from the data server slot_table_waitq waiting for a session slot with a status of zero which call rpc_exit in rpc_prepare and need to skip rpc_call_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:42:42 -04:00
Andy Adamson	996074cb8c	NFSv4.1 Just use nfs_put_client in filelayout release Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:42:37 -04:00
Andy Adamson	d42e78737c	NFSv4.1 fix null state reference in filelayout_async_handle_error Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:42:28 -04:00
Trond Myklebust	53b8ee3464	NFSv4.1: Fix a bad reference count issue in the pNFS commit code filelayout_scan_commit_lists needs to bump the reference count on the struct nfs_page just like nfs_scan_commit_list(). Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-22 16:36:27 -04:00
Rafael J. Wysocki	8714c8d74d	Merge branch 'pm-sleep' * pm-sleep: epoll: Fix user space breakage related to EPOLLWAKEUP	2012-05-22 20:57:19 +02:00
Rafael J. Wysocki	a8159414d7	epoll: Fix user space breakage related to EPOLLWAKEUP Commit `4d7e30d` (epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready) caused some applications to malfunction, because they set the bit corresponding to the new EPOLLWAKEUP flag in their eventpoll flags and they don't have the new CAP_EPOLLWAKEUP capability. To prevent that from happening, change epoll_ctl() to clear EPOLLWAKEUP in epds.events if the caller doesn't have the CAP_EPOLLWAKEUP capability instead of failing and returning an error code, which allows the affected applications to function normally. Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-05-22 20:57:06 +02:00
Linus Torvalds	cb60e3e65c	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "New notable features: - The seccomp work from Will Drewry - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski - Longer security labels for Smack from Casey Schaufler - Additional ptrace restriction modes for Yama by Kees Cook" Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) apparmor: fix long path failure due to disconnected path apparmor: fix profile lookup for unconfined ima: fix filename hint to reflect script interpreter name KEYS: Don't check for NULL key pointer in key_validate() Smack: allow for significantly longer Smack labels v4 gfp flags for security_inode_alloc()? Smack: recursive tramsmute Yama: replace capable() with ns_capable() TOMOYO: Accept manager programs which do not start with / . KEYS: Add invalidation support KEYS: Do LRU discard in full keyrings KEYS: Permit in-place link replacement in keyring list KEYS: Perform RCU synchronisation on keys prior to key destruction KEYS: Announce key type (un)registration KEYS: Reorganise keys Makefile KEYS: Move the key config into security/keys/Kconfig KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat Yama: remove an unused variable samples/seccomp: fix dependencies on arch macros Yama: add additional ptrace scopes ...	2012-05-21 20:27:36 -07:00
Linus Torvalds	62c8d92278	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 changes from Steven Whitehouse. * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits) GFS2: Fix quota adjustment return code GFS2: Add rgrp information to block_alloc trace point GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer GFS2: Update glock doc to add new stats info GFS2: Update main gfs2 doc GFS2: Remove redundant metadata block type check GFS2: Fix sgid propagation when using ACLs GFS2: eliminate log elements and simplify GFS2: Eliminate vestigial sd_log_le_rg GFS2: Eliminate needless parameter from function gfs2_setbit GFS2: Log code fixes GFS2: Remove unused argument from gfs2_internal_read GFS2: Remove bd_list_tr GFS2: Remove duplicate log code GFS2: Clean up log write code path GFS2: Use variable rather than qa to determine if unstuff necessary GFS2: Change variable blk to biblk GFS2: Fix function parameter comments in rgrp.c GFS2: Eliminate offset parameter to gfs2_setbit GFS2: Use slab for block reservation memory ...	2012-05-21 19:21:20 -07:00
Linus Torvalds	2e321806b6	Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu" This reverts commit `8c01a529b8`. It turns out the d_unhashed() check isn't unnecessary after all: while it's true that unhashing will increment the sequence numbers, that does not necessarily invalidate the RCU lookup, because it might have seen the dentry pointer (before it got unhashed), but by the time it loaded the sequence number, it could have seen the new sequence number (after it got unhashed). End result: we might look up an unhashed dentry that is about to be freed, with the sequence number never indicating anything bad about it. So checking that the dentry is still hashed (after reading the sequence number) is indeed the proper fix, and was never unnecessary. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-21 18:48:10 -07:00
James Morris	ff2bb047c4	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next Per pull request, for 3.5.	2012-05-22 11:21:06 +10:00
Linus Torvalds	6326c71fd2	vfs: be even more careful about dentry RCU name lookups Miklos Szeredi points out that we need to also worry about memory ordering when doing the dentry name comparison asynchronously with RCU. In particular, doing a rename can do a memcpy() of one dentry name over another, and we want to make sure that any unlocked reader will always see the proper terminating NUL character, so that it won't ever run off the allocation. Rather than having to be extra careful with the name copy or at lookup time for each character, this resolves the issue by making sure that all names that are inlined in the dentry always have a NUL character at the end of the name allocation. If we do that at dentry allocation time, we know that no future name copy will ever change that final NUL to anything else, so there are no memory ordering issues. So even if a concurrent rename ends up overwriting the NUL character that terminates the original name, we always know that there is one final NUL at the end, and there is no worry about the lockless RCU lookup traversing the name too far. The out-of-line allocations are never copied over, so we can just make sure that we write the name (with terminating NUL) and do a write barrier before we expose the name to anything else by setting it in the dentry. Reported-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-21 16:14:04 -07:00
Linus Torvalds	a70b52ec1a	vfs: make AIO use the proper rw_verify_area() area helpers We had for some reason overlooked the AIO interface, and it didn't use the proper rw_verify_area() helper function that checks (for example) mandatory locking on the file, and that the size of the access doesn't cause us to overflow the provided offset limits etc. Instead, AIO did just the security_file_permission() thing (that rw_verify_area() also does) directly. This fixes it to do all the proper helper functions, which not only means that now mandatory file locking works with AIO too, we can actually remove lines of code. Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-21 16:06:20 -07:00
Linus Torvalds	cd975ae0ce	Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming Pull c6x updates from Mark Salter: "Clean up some c6x Kconfig items and add support for Elf FDPIC loader." * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming: C6X: remove unused config items C6X: add support to build with BINFMT_ELF_FDPIC C6X: change main arch kbuild symbol	2012-05-21 12:46:48 -07:00
Linus Torvalds	cb62ab71fe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: 1) Get rid of the error prone NLA_PUT() macros that used an embedded goto. 2) Kill off the token-ring and MCA networking drivers, from Paul Gortmaker. 3) Reduce high-order allocations made by datagram AF_UNIX sockets, from Eric Dumazet. 4) Add PTP hardware clock support to IGB and IXGBE, from Richard Cochran and Jacob Keller. 5) Allow users to query timestamping capabilities of a card via ethtool, from Richard Cochran. 6) Add loadbalance mode to the teaming driver, from Jiri Pirko. Part of this is that we can now have BPF filters not attached to sockets, and the loadbalancing function is calculated using one. 7) Francois Romieu went through the network drivers removing gratuitous uses of netdev->base_addr, perhaps some day we can remove it completely but it's used for ISA probing still. 8) Add a BPF JIT for sparc. I know, who cares, right? :-) 9) Move networking sysctl registry away from using the compatibility mode interfaces in the sysctl code. From Eric W Biederman. 10) Pavel Emelyanov added a way to save and restore TCP socket state via TCP_REPAIR, TCP_REPAIR_QUEUE, and TCP_QUEUE_SEQ socket options as well as a way to forcefully bind a socket to a port via the sk->sk_reuse value SK_FORCE_REUSE. There is also a TCP_REPAIR_OPTIONS which allows reinstating the TCP options enabled on the connection. 11) Several enhancements from Eric Dumazet that, in particular, can enhance splice performance on TCP sockets significantly. a) Reset the offset of the per-socket sendmsg page when we know we're the only use of the page in linear_to_page(). b) Add facilities such that skb->data can be backed by a page rather than SLAB kmalloc'd memory. In particular devices which were receiving into linear RX buffers can now end up providing paged data. The big result is that code like splice and GRO do not have to copy any more.
12) Allow a pure sender to more gracefully handle ACK backlogs in TCP. What can happen at high rates is that the sender hasn't grown his receive buffer limits at all (he's not receiving data so really doesn't need to), but the non-data ACKs consume receive buffer space. sk_add_backlog() is too aggressive in dropping frames in this case, so relax its requirements by using the receive buffer plus the send buffer limit as the backlog limit instead of just the former. Also from Eric Dumazet. 13) Add ipv6 support to L2TP, from Benjamin LaHaise, James Chapman, and Chris Elston. 14) Implement TCP early retransmit (RFC 5827), from Yuchung Cheng. Basically, we can start fast retransmit before hitting the dupack threshold under certain conditions. 15) New CODEL active queue management packet scheduler, from Eric Dumazet based upon initial work by Dave Taht. Basically, the big feature is that packets are dropped (or ECN bits are set) based upon how long packets live in the queue, rather than the queue length (which is what RED uses). git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1341 commits) drivers/net/stmmac: seq_file fix memory leak ipv6/exthdrs: strict Pad1 and PadN check USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z USB: qmi_wwan: Add ZTE (Vodafone) K3765-Z USB: qmi_wwan: Make forced int 4 whitelist generic net/ipv4: replace simple_strtoul with kstrtoul net/ipv4/ipconfig: neaten __setup placement net: qmi_wwan: Add Vodafone/Huawei K5005 support net: cdc_ether: Add ZTE WWAN matches before generic Ethernet ipv6: use skb coalescing in reassembly ipv4: use skb coalescing in defragmentation net: introduce skb_try_coalesce() net:ipv6:fixed space issues relating to operators. net:ipv6:fixed a trailing white space issue. ipv6: disable GSO on sockets hitting dst_allfrag tg3: use netdev_alloc_frag() API net: napi_frags_skb() is static ppp: avoid false drop_monitor false positives ipv6: bool/const conversions phase2 ipx: Remove spurious NULL checking in ipx_ioctl().
...	2012-05-21 10:03:46 -07:00
Linus Torvalds	31ed8e6f93	Merge branch 'dentry-cleanups' (dcache access cleanups and optimizations) This branch simplifies and clarifies the dcache lookup, and allows us to do certain nice optimizations when comparing dentries. It also cleans up the interface to __d_lookup_rcu(), especially around passing the inode information around. * dentry-cleanups: vfs: make it possible to access the dentry hash/len as one 64-bit entry vfs: move dentry name length comparison from dentry_cmp() into callers vfs: do the careful dentry name access for all dentry_cmp cases vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces	2012-05-21 08:50:57 -07:00
Linus Torvalds	7e5cb5e151	Merge branch 'vfs-cleanups' (random vfs cleanups) This teaches vfs_fstat() to use the appropriate f[get|put]_light functions, allowing it to avoid some unnecessary locking for the common case. More noticeably, it also cleans up and simplifies the "getname_flags()" function, which now relies on the architecture strncpy_from_user() doing all the user access checks properly, instead of hacking around the fact that on x86 it didn't use to do it right (see commit 92ae03f2ef99: "x86: merge 32/64-bit versions of 'strncpy_from_user()' and speed it up"). * vfs-cleanups: VFS: make vfs_fstat() use f[get|put]_light() VFS: clean up and simplify getname_flags() x86: make word-at-a-time strncpy_from_user clear bytes at the end	2012-05-21 08:46:08 -07:00
Dave Chinner	14c26c6a05	xfs: add trace points for log forces To enable easy tracing of the location of log forces and the frequency of them via perf, add a pair of trace points to the log force functions. This will help debug where excessive log forces are being issued from by simple perf commands like: # ~/perf/perf top -e xfs:xfs_log_force -G -U Which gives this sort of output: Events: 141 xfs:xfs_log_force - 100.00% [kernel] [k] xfs_log_force - xfs_log_force 87.04% xfsaild kthread kernel_thread_helper - 12.87% xfs_buf_lock _xfs_buf_find xfs_buf_get xfs_trans_get_buf xfs_da_do_buf xfs_da_get_buf xfs_dir2_data_init xfs_dir2_leaf_addname xfs_dir_createname xfs_create xfs_vn_mknod xfs_vn_create vfs_create do_last.isra.41 path_openat do_filp_open do_sys_open sys_open system_call_fastpath Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sig.com>	2012-05-21 10:45:44 -05:00
Peter Watkins	3ba3160374	xfs: fix memory reclaim deadlock on agi buffer Note xfs_iget can be called while holding a locked agi buffer. If it goes into memory reclaim then inode teardown may try to lock the same buffer. Prevent the deadlock by calling radix_tree_preload with GFP_NOFS. Signed-off-by: Peter Watkins <treestem@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-05-21 10:45:44 -05:00
Dave Chinner	ea562ed6e7	xfs: fix delalloc quota accounting on failure xfstest 270 was causing quota reservations way beyond what was sane (ten to hundreds of TB) for a 4GB filesystem. There's a sign problem in the error handling path of xfs_bmapi_reserve_delalloc() because xfs_trans_unreserve_quota_nblks() simply negates the value passed - which doesn't work for an unsigned variable. This causes reservations of close to 2^32 blocks instead of removing a reservation of a handful of blocks. Fix the same problem in the other xfs_trans_unreserve_quota_nblks() callers where unsigned integer variables are used, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-05-21 10:45:43 -05:00
Theodore Ts'o	f32aaf2d2b	ext4: enable the 64-bit jbd2 feature based on the 64-bit ext4 feature Previously we were only enabling the 64-bit jbd2 feature if the number of blocks in the file system was greater than 2^32-1. The problem with this is that it makes it harder to test the 64-bit journal code paths with small file systems, since a small test file system with the 64-bit ext4 feature enabled would use 64-bit file system on-disk data structures, but use a 32-bit journal. This would also cause problems when trying to do an online resize to grow the filesystem above the 2^32-1 boundary. Fortunately the patch to support online resize for 64-bit file systems hasn't been merged yet, so this problem hasn't arisen in practice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-21 11:42:02 -04:00
Linus Torvalds	8c12fec90c	Merge branch 'stat-cleanups' (clean up copying of stat info to user space) This makes cp_new_stat() a bit more readable, and avoids having to memset() the whole structure just to fill in a couple of padding fields. This is another result of me looking at code generation of functions that show up high on certain kernel profiles, and just going "Oh, let's just clean that up". Architectures that don't supply the #define to fill just the padding fields will still fall back to memset(). * stat-cleanups: vfs: don't force a big memset of stat data just to clear padding fields vfs: de-crapify "cp_new_stat()" function	2012-05-21 08:41:38 -07:00
Trond Myklebust	b3f87b98aa	Merge branch 'bugfixes' into nfs-for-next	2012-05-21 10:12:39 -04:00
Sachin Bhamare	8b56a30caa	exofs: Add SYSFS info for autologin/pNFS export Introduce sysfs infrastructure for exofs cluster filesystem. Each OSD target shows up as below in the sysfs hierarchy: /sys/fs/exofs/<osdname>_<partition_id>/devX Where <osdname>_<partition_id> is the unique identification of a Superblock. Where devX: 0 <= X < device_table_size. They are ordered in device-table order as specified to the mkfs.exofs command. Each OSD device devX has the following attributes: osdname - ReadOnly systemid - ReadOnly uri - Read/Write It is up to user-mode to update devX/uri for support of autologin. This sysfs information is used both for autologin and for exporting exofs via a pNFSD server in user-mode (e.g. NFS-Ganesha). Signed-off-by: Sachin Bhamare <sbhamare@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-05-21 12:24:01 +03:00
David S. Miller	17eea0df5f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-20 21:53:04 -04:00
Artem Bityutskiy	4415626732	UBI: amend commentaries WRT dtype Richard removed the "dtype" hint, but few commentaries were left and this patch removes them. I've also added a better description about the "dtype" field in the ubi-user.h for people who may ever wonder what was that dtype thing about. This patch also adds an important note that it is better to use value "3" for the "dtype" field. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-20 20:25:59 +03:00
Richard Weinberger	b36a261e8c	UBI: Kill data type hint We do not need this feature and to our shame it even was not working and there was a bug found very recently. -- Artem Bityutskiy Without the data type hint UBI2 (fastmap) will be easier to implement. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-20 20:25:59 +03:00
Sidney Amani	56b04e3e8b	UBIFS: fix memory leak on error path UBIFS leaks memory on error path in 'mount_ubifs()'. In case of failure in 'ubifs_fixup_free_space()', it does not call 'ubifs_lpt_free()' whereas LPT data structures can potentially be allocated. The amount of memory leaked can be quite high -- see 'ubifs_lpt_init()'. The bug was introduced when moving the LPT initialisation earlier in the mount process (commit '781c5717a95a74b294beb38b8276943b0f8b5bb4'). Signed-off-by: Sidney Amani <seed95@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-20 20:19:08 +03:00
Artem Bityutskiy	4994297606	UBIFS: make ubifs_lpt_init clean-up in case of failure Most functions in UBIFS follow the following design pattern: if the function allocates multiple resources, and fails at some point, it frees what it has allocated and returns an error. So the caller can rely on the fact that the callee has cleaned up everything after its own failure. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Sidney Amani <seed95@gmail.com>	2012-05-20 20:19:01 +03:00
Boaz Harrosh	6abe4a87f7	exofs: Fix CRASH on very early IO errors. If at exofs_fill_super() we had an early termination due to any error, like an IO error while reading the super-block, we would crash inside exofs_free_sbi(). This is because sbi->oc.numdevs was set to 1, before we actually have a device table at all. Fix it by moving the sbi->oc.numdevs = 1 to after the allocation of the device table. Reported-by: Johannes Schild <JSchild@gmx.de> Stable: This is a bug since v3.2.0 CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-05-20 19:42:41 +03:00
Andy Adamson	041245c88a	NFSv4.1 resend LAYOUTGET on data server invalid layout errors The "invalid layout" class of errors is handled by destroying the layout and getting a new layout from the server. Currently, the layout must be destroyed before a new layout can be obtained. This means that all references (e.g.lsegs) to the "to be destroyed" layout header must be dropped before it can be destroyed. This in turn means waiting for all in flight RPC's using the old layout as well as draining the data server session slot table wait queue. Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for the old layout to be destroyed. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:33 -04:00
Andy Adamson	b4a2967e52	NFSv4.1 dereference a disconnected data server client record When the last DS io is processed, the data server client record will be freed. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:32 -04:00
Andy Adamson	3a7936c3fc	NFSv4.1 ref count nfs_client across filelayout data server io Prepare to put a dis-connected DS client record. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:32 -04:00
Andy Adamson	0a57cdac3f	NFSv4.1 send layoutreturn to fence disconnected data server Let the MDS know that you are redirecting I/O from pNFS to MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:31 -04:00
Andy Adamson	671fb89695	NFSv4.1 wake up all tasks on un-connected DS slot table waitq The DS has a connection error (invalid deviceid). Drain the fore channel slot table waitq. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:31 -04:00
Andy Adamson	0ad2f378e1	NFSv4.1 Check invalid deviceid upon slot table waitq wakeup Tasks sleeping on the slot table waitq wake to the rpc_prepare_task state. Reset the task for io through the MDS if the deviceid is invalid. The reset functions put the io pages through the pageio layer which has the advantage of re-coalescing which allows for the MDS and DS having different r/wsizes. Exit the awakened task without executing the rpc_call_done routine. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:55:31 -04:00
Andy Adamson	a033a09189	NFSv4.1 remove nfs4_reset_write and nfs4_reset_read Replaced by filelayout_reset_write and filelayout_reset_read Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:59 -04:00
Andy Adamson	e7dd79af01	NFSv4.1: mark deviceid invalid on filelayout DS connection errors This prevents the use of any layout for i/o that references the deviceid. I/O is redirected through the MDS. Redirect the unhandled failed I/O to the MDS without marking either the layout or the deviceid invalid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:20 -04:00
Andy Adamson	98fc685ae2	NFSv4.1 data server timeo and retrans module parameters Set the recovery parameters for data servers. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:20 -04:00
Andy Adamson	9f0ec176b3	NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS file layout to quickly try the MDS or perhaps another DS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:19 -04:00
Andy Adamson	90fecfcb34	NFSv4.1 cleanup filelayout invalid layout handling The invalid layout bits should only be used to block LAYOUTGETs. Do not invalidate a layout on deviceid invalidation. Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:19 -04:00
Andy Adamson	554d458d79	NFSv4.1: cleanup filelayout invalid deviceid handling Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY is no longer needed. Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING. An invalid device prevents pNFS io. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:54:18 -04:00
Matthew Treinish	e73e6c9e85	Fixed goto readability in nfs_update_inode. Simplified error gotos to make it slightly easier to read, it doesn't affect the functionality of the routine. Signed-off-by: Matthew Treinish <treinish@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-19 17:10:10 -04:00
Linus Torvalds	14e931a264	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overflow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()	2012-05-19 10:12:17 -07:00
Linus Torvalds	73f1f5dd3e	Merge branch 'akpm' (Andrew's patch-bomb) Merge misc fixes from Andrew Morton. * emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches) frv: delete incorrect task prototypes causing compile fail slub: missing test for partial pages flush work in flush_all() fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01	2012-05-18 15:56:25 -07:00
Linus Torvalds	30a08bf2d3	proc: move fd symlink i_mode calculations into tid_fd_revalidate() Instead of doing the i_mode calculations at proc_fd_instantiate() time, move them into tid_fd_revalidate(), which is where the other inode state (notably uid/gid information) is updated too. Otherwise we'll end up with stale i_mode information if an fd is re-used while the dentry still hangs around. Not that anything really cares (symlink permissions don't really matter), but Tetsuo Handa noticed that the owner read/write bits don't always match the state of the readability of the file descriptor, and we _used_ to get this right a long time ago in a galaxy far, far away. Besides, aside from fixing an ugly detail (that has apparently been this way since commit 61a28784028e: "proc: Remove the hard coded inode numbers" in 2006), this removes more lines of code than it adds. And it just makes sense to update i_mode in the same place we update i_uid/gid. Al Viro correctly points out that we could just do the inode fill in the inode iops ->getattr() function instead. However, that does require somewhat slightly more invasive changes, and adds yet another lookup of the file descriptor. We need to do the revalidate() for other reasons anyway, and have the file descriptor handy, so we might as well fill in the information at this point. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-18 14:06:17 -07:00
Wang Sheng-Hui	0324876628	ext2: trivial fix to comment for ext2_free_blocks The function is ext2_free_blocks(), not ext2_free_blocks_sb(). Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-18 15:23:08 +02:00
Cyrill Gorcunov	eb94cd96e0	fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries map_files/ entries are never supposed to be executed, still curious minds might try to run them, which leads to the following deadlock ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc4-24406-g841e6a6 #121 Not tainted ------------------------------------------------------- bash/1556 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1 but task is already holding lock: (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sig->cred_guard_mutex){+.+.+.}: validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_killable_nested+0x40/0x45 lock_trace+0x24/0x59 proc_map_files_lookup+0x5a/0x165 __lookup_hash+0x52/0x73 do_lookup+0x276/0x2b1 walk_component+0x3d/0x114 do_last+0xfc/0x540 path_openat+0xd3/0x306 do_filp_open+0x3d/0x89 do_sys_open+0x74/0x106 sys_open+0x21/0x23 tracesys+0xdd/0xe2 -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: check_prev_add+0x6a/0x1ef validate_chain+0x444/0x4f4 __lock_acquire+0x387/0x3f8 lock_acquire+0x12b/0x158 __mutex_lock_common+0x56/0x3a9 mutex_lock_nested+0x40/0x45 do_lookup+0x267/0x2b1 walk_component+0x3d/0x114 link_path_walk+0x1f9/0x48f path_openat+0xb6/0x306 do_filp_open+0x3d/0x89 open_exec+0x25/0xa0 do_execve_common+0xea/0x2f9 do_execve+0x43/0x45 sys_execve+0x43/0x5a stub_execve+0x6c/0xc0 This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex and when do_lookup happens we try to grab task->signal->cred_guard_mutex again in lock_trace. Fix it using plain ptrace_may_access() helper in proc_map_files_lookup() and in proc_map_files_readdir() instead of lock_trace(), the caller must be CAP_SYS_ADMIN granted anyway. 
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Jones <davej@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-17 18:00:51 -07:00
Anton Vorontsov	39eb7e9791	pstore/ram: Add ECC support This is now straightforward: just introduce a module parameter and pass the needed value to persistent_ram_new(). Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Marco Stornelli <marco.stornelli@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-17 08:51:59 -07:00
Anton Vorontsov	896fc1f0c4	pstore/ram: Switch to persistent_ram routines The patch switches pstore RAM backend to use persistent_ram routines, one step closer to the ECC support. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-17 08:51:41 -07:00
Anton Vorontsov	cddb8751c8	staging: android: persistent_ram: Move to fs/pstore/ram_core.c This is a first step for adding ECC support for pstore RAM backend: we will use the persistent_ram routines, kindly provided by Google. Basically, persistent_ram is a set of helper routines to deal with the [optionally] ECC-protected persistent ram regions. A bit of Makefile, Kconfig and header files adjustments were needed because of the move. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-17 08:50:00 -07:00
Alex Elder	8f43fb5389	ceph: use info returned by get_authorizer Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-05-17 08:18:13 -05:00
Alex Elder	a3530df33e	ceph: have get_authorizer methods return pointers Have the get_authorizer auth_client method return a ceph_auth pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-05-17 08:18:13 -05:00
Alex Elder	a255651d4c	ceph: ensure auth ops are defined before use In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined. Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-05-17 08:18:13 -05:00
Alex Elder	74f1869f76	ceph: messenger: reduce args to create_authorizer Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizer method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-05-17 08:18:12 -05:00
Alex Elder	6c4a19158b	ceph: define ceph_auth_handshake type The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches. Fix the #includes in "linux/ceph/osd_client.h" to lay out their more complete canonical path. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-05-17 08:18:12 -05:00
Shirish Pargaonkar	2608bee744	cifs: Include backup intent search flags during searches (try #2) As observed and suggested by Tushar Gosavi... --------- readdir calls these functions to send the TRANS2_FIND_FIRST and TRANS2_FIND_NEXT commands to the server. The current cifs module does not specify the CIFS_SEARCH_BACKUP_SEARCH flag when sending these commands while backupuid/backupgid is specified. This can be resolved by specifying the CIFS_SEARCH_BACKUP_SEARCH flag. --------- Cc: <stable@kernel.org> Reported-and-Tested-by: Tushar Gosavi <tugosavi@in.ibm.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-17 13:07:49 +04:00
Pavel Shilovsky	7f92447aa7	CIFS: Separate protocol specific part from setlk Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-17 13:07:48 +04:00
Pavel Shilovsky	55157dfbb5	CIFS: Separate protocol specific part from getlk Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-17 13:07:41 +04:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Pavel Shilovsky	106dc538ab	CIFS: Separate protocol specific lock type handling Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-16 20:13:36 -05:00
Pavel Shilovsky	04a6aa8acf	CIFS: Convert lock type to 32 bit variable to handle SMB2 lock type field further. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-16 20:13:35 -05:00
Pavel Shilovsky	fbd35acadd	CIFS: Move locks to cifsFileInfo structure CIFS brlock cache can be used by several file handles if we have a write-caching lease on the file that is supported by SMB2 protocol. Prepare the code to handle this situation correctly by sorting brlocks by a fid to easily push them in portions when lease break comes. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-16 20:13:35 -05:00
Jeff Layton	121b046af5	cifs: convert send_nt_cancel into a version specific op For SMB2, this should be a no-op. Obviously if we wanted to do something for the SMB2 case, we could also define an operation here for it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-16 20:13:34 -05:00
Jeff Layton	23db65f511	cifs: add a smb_version_operations/values structures and a smb_version enum We need a way to dispatch different operations for different versions. Behold the smb_version_operations/values structures. For now, those structures just hold the version enum value and nothing uses them. Eventually, we'll expand them to cover other operations/values as we change the callers to dispatch from here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>	2012-05-16 20:13:34 -05:00
Jeff Layton	5249af32da	cifs: remove the vers= and version= synonyms for ver= We want these to mean something different entirely, and the mount.cifs helper only ever passed in ver= automatically. Also, don't allow ver=cifs anymore since that was never passed in by the mount helper. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:34 -05:00
Jeff Layton	296838b182	cifs: add warning about change in default cache semantics in 3.7 Add a warning that will be displayed when there is no cache= option specified. We want to ensure that users are aware of the change in defaults coming in 3.7. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:33 -05:00
Jeff Layton	d06b5056ae	cifs: display cache= option in /proc/mounts ...and deprecate the display of strictcache, forcedirectio, and fsc as separate options. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:33 -05:00
Jeff Layton	09983b2fab	cifs: add deprecation warnings to strictcache and forcedirectio Leave them in for 2 releases and remove for 3.7. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:32 -05:00
Jeff Layton	15b6a47322	cifs: add a cache= option to better describe the different cache flavors Currently, we have several mount options that control cifs' cache behavior, but those options aren't considered to be mutually exclusive. The result is poorly-defined when someone specifies more than one of these options at mount time. Fix this by adding a new cache= mount option that will supersede "strictcache" and "forcedirectio". That will help make it clear that these options are mutually exclusive. Also, change the legacy options to be mutually exclusive too, to ensure that users don't get surprises. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:32 -05:00
Jeff Layton	4d61cd6ec7	cifs: add a deprecation warning to CIFS_IOC_CHECKUMOUNT ioctl This was used by an ancient version of umount.cifs and in nowhere else that I'm aware of. Let's add a warning now and dump it for 3.7. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:32 -05:00
Jeff Layton	5e500ed125	cifs: remove legacy MultiuserMount option We've now warned about this for two releases. Remove it for 3.5. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:31 -05:00
Jeff Layton	1c89254926	cifs: convert cifs_iovec_read to use async reads Convert cifs_iovec_read to use async I/O. This also raises the limit on the rsize for uncached reads. We first allocate a set of pages to hold the replies, then issue the reads in parallel and then collect the replies and copy the results into the iovec. A possible future optimization would be to kmap and inline the iovec buffers and read the data directly from the socket into that. That would require some rather complex conversion of the iovec into a kvec however. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:31 -05:00
Jeff Layton	2a1bb13853	cifs: add wrapper for cifs_async_readv to retry opening file We'll need this same bit of code for the uncached case. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:30 -05:00
Jeff Layton	6993f74a5b	cifs: add refcounting to cifs_readdata structures This isn't strictly necessary for the async readpages code, but the uncached version will need to be able to collect the replies after issuing the calls. Add a kref to cifs_readdata and change the code to take and put references appropriately. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:30 -05:00
Jeff Layton	8d5ce4d23c	cifs: abstract out function to marshal the iovec for readv receives Cached and uncached reads will need to do different things here to handle the difference when the pages are in pagecache and not. Abstract out the function that marshals the page list into a kvec array. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:29 -05:00
Jeff Layton	0471ca3fe4	cifs: make cifs_readdata_alloc take a work_func_t arg We'll need different completion routines for an uncached read. Allow the caller to set the one he needs at allocation time. Also, move most of these functions to file.c so we can make more of them static. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2012-05-16 20:13:29 -05:00
Suresh Siddha	11aeca0b3a	coredump: ensure the fpu state is flushed for proper multi-threaded core dump Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump of a multi-threaded application. A look at the exit sequence shows that other threads can still be on the runqueue potentially at the below shown exit_mm() code snippet: if (atomic_dec_and_test(&core_state->nr_threads)) complete(&core_state->startup); ===> other threads can still be active here, but we notify the thread ===> dumping core to wakeup from the coredump_wait() after the last thread ===> joins this point. Core dumping thread will continue dumping ===> all the threads state to the core file. for (;;) { set_task_state(tsk, TASK_UNINTERRUPTIBLE); if (!self.task) /* see coredump_finish() */ break; schedule(); } As some of those threads are on the runqueue and didn't call schedule() yet, their fpu state is still active in the live registers and the thread proceeding with the coredump will hit the above mentioned BUG_ON while trying to dump other threads fpustate to the coredump file. BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is in the code paths for processors supporting xsaveopt. With or without xsaveopt, multi-threaded coredump is broken and may not contain the correct fpustate at the time of exit. In coredump_wait(), wait for all the threads to become inactive, so that we are sure all the extended register state is flushed to the memory, so that it can be reliably copied to the core file. Reported-by: Suresh Nalluru <suresh@aristanetworks.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1336692811-30576-2-git-send-email-suresh.b.siddha@intel.com Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-05-16 15:16:48 -07:00
Linus Torvalds	dfae359f08	Merge git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fix from Jeff Layton * git://git.samba.org/sfrench/cifs-2.6: cifs: fix misspelling of "forcedirectio"	2012-05-16 14:22:38 -07:00
Sage Weil	c047be0934	ceph: ignore preferred_osd field Old users may not expect EINVAL, and there is no clear user-visible behavior change now that we ignore it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-05-16 14:28:28 -05:00
Sage Weil	702aeb1f88	ceph: fully initialize new layout When we are setting a new layout, fully initialize the structure: - zero it out - always set preferred_osd to -1 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-05-16 14:28:27 -05:00
Benny Halevy	5f23eff381	NFS: fix unsigned comparison in nfs4_create_sec_client fs/nfs/nfs4namespace.c: In function ‘nfs4_create_sec_client’: fs/nfs/nfs4namespace.c:171:2: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits] Introduced by commit `72de53ec4b` "NFS: Do secinfo as part of lookup" Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-16 10:36:50 -07:00
Trond Myklebust	39ffb9218e	NFS: Fix a compile issue when CONFIG_NFS_FSCACHE was undefined Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-16 10:24:20 -07:00
Artem Bityutskiy	a6aae4dd0f	UBIFS: get rid of dbg_err This patch removes the 'dbg_err()' macro and we now use 'ubifs_err()' instead. The idea of 'dbg_err()' was to compile out some error message to make the binary a bit smaller - but I think it was a bad idea. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-16 20:11:23 +03:00
Artem Bityutskiy	f70b7e52aa	UBIFS: remove Kconfig debugging option Have the debugging stuff always compiled-in instead. It simplifies maintenance a lot. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-16 19:53:46 +03:00
Artem Bityutskiy	1baafd28dc	UBIFS: remove a couple of unused macros Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-16 19:36:04 +03:00
Jeff Layton	531c8ff0d4	cifs: fix misspelling of "forcedirectio" ...and add a "directio" synonym since that's what the manpage has always advertised. Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2012-05-16 11:26:25 -05:00
Artem Bityutskiy	edf6be245f	UBIFS: rename dumping functions This commit re-names all functions which dump something from "dbg_dump_()" to "ubifs_dump_()". This is done for consistency with UBI and because this way it will be more logical once we remove the debugging compilation option. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-16 19:15:56 +03:00
Artem Bityutskiy	7c46d0ae29	UBIFS: get rid of dbg_dump_stack In case of errors we almost always need the stack dump - it makes no sense to compile it out. Remove the 'dbg_dump_stack()' function completely. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-05-16 19:04:54 +03:00
Anton Vorontsov	1894a253db	ramoops: Move to fs/pstore/ram.c Since ramoops was converted to pstore, it has nothing to do with character devices nowadays. Instead, today it is just a RAM backend for pstore. The patch just moves things around. A few changes were needed because of the move: 1. Kconfig and Makefiles fixups, of course. 2. In pstore/ram.c we have to play a bit with MODULE_PARAM_PREFIX, this is needed to keep user experience the same as with ramoops driver (i.e. so that ramoops.foo kernel command line arguments would still work). Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Marco Stornelli <marco.stornelli@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-05-16 08:06:37 -07:00
Bob Peterson	500242ac61	GFS2: Fix quota adjustment return code This patch changes function gfs2_adjust_quota so that it properly returns a good (zero) return code on the normal path through the code. Without this, mounting GFS2 with -o quota=account periodically gave this error message: GFS2: fsid=cluster:fs: gfs2_quotad: sync error -5 Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-05-16 12:22:38 +01:00
Eric W. Biederman	ab27b91b9f	userns: Convert sysfs to use kgid/kuid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:29 -07:00
Eric W. Biederman	091bd3ea4e	userns: Convert sysctl permission checks to use kuid and kgids. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:28 -07:00
Eric W. Biederman	dcb0f22282	userns: Convert proc to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:28 -07:00
Eric W. Biederman	08cefc7ab8	userns: Convert ext4 to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:27 -07:00
Eric W. Biederman	1523299d58	userns: Convert ext3 to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:27 -07:00
Eric W. Biederman	b8a9f9e183	userns: Convert ext2 to use kuid/kgid where appropriate. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:26 -07:00
Eric W. Biederman	f04c6ce2cf	userns: Convert devpts to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:26 -07:00
Eric W. Biederman	ebc887b278	userns: Convert binary formats to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:25 -07:00
Eric W. Biederman	9e4a36ece6	userns: Fail exec for suid and sgid binaries with ids outside our user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:23 -07:00
Wang Sheng-Hui	aa9e939d52	ext2: remove the redundant comment for ext2_export_ops The comment is outdated and isn't particularly informative anyway - NULL meaning the default behavior is very common in kernel. And we really set about half of entries. So remove the whole comment for ext2_export_ops. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:39 +02:00
Eric Sandeen	d7dab39b6e	ext3: return 32/64-bit dir name hash according to usage type This is based on commit `d1f5273e9a` ext4: return 32/64-bit dir name hash according to usage type by Fan Yong <yong.fan@whamcloud.com> Traditionally ext2/3/4 has returned a 32-bit hash value from llseek() to appease NFSv2, which can only handle a 32-bit cookie for seekdir() and telldir(). However, this causes problems if there are 32-bit hash collisions, since the NFSv2 server can get stuck resending the same entries from the directory repeatedly. Allow ext3 to return a full 64-bit hash (both major and minor) for telldir to decrease the chance of hash collisions. This patch does implement a new ext3_dir_llseek op, because with 64-bit hashes, nfs will attempt to seek to a hash "offset" which is much larger than ext3's s_maxbytes. So for dx dirs, we call generic_file_llseek_size() with the appropriate max hash value as the maximum seekable size. Otherwise we just pass through to generic_file_llseek(). Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Patch-updated-by: Eric Sandeen <sandeen@redhat.com> (blame us if something is not correct) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:39 +02:00
Jan Kara	a80b12c3d0	quota: Get rid of nested I_MUTEX_QUOTA locking subclass So far i_mutex was ranking above dqonoff_mutex and i_mutex on quota files was special and ranking below dqonoff_mutex (and several other locks). However there's no real need for i_mutex on quota files to be special. IO on quota files is serialized by dqio_mutex anyway so we don't need to take i_mutex when writing to quota files. Other places where we take i_mutex on quota file can accommodate standard i_mutex lock ranking, we only need to change the lock ranking to be dqonoff_mutex > i_mutex which is a matter of changing documentation because there's no place which would enforce ordering in the other direction. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:39 +02:00
Jan Kara	f9ef178412	quota: Use precomputed value of sb_dqopt in dquot_quota_sync Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:39 +02:00
Jan Kara	e2a3fde750	ext2: Remove i_mutex use from ext2_quota_write() We don't need i_mutex in ext2_quota_write() because writes to quota file are serialized by dqio_mutex anyway. Changes to quota files outside of quota code are forbidden and enforced by NOATIME and IMMUTABLE bits. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:38 +02:00
Jan Kara	67f1648d21	reiserfs: Remove i_mutex use from reiserfs_quota_write() We don't need i_mutex in reiserfs_quota_write() because writes to quota file are serialized by dqio_mutex anyway. Changes to quota files outside of quota code are forbidden and enforced by NOATIME and IMMUTABLE bits. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:38 +02:00
Jan Kara	0b7f7cefae	ext4: Remove i_mutex use from ext4_quota_write() We don't need i_mutex in ext4_quota_write() because writes to quota file are serialized by dqio_mutex anyway. Changes to quota files outside of quota code are forbidden and enforced by NOATIME and IMMUTABLE bits. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:38 +02:00
Jan Kara	905c393762	ext3: Remove i_mutex use from ext3_quota_write() We don't need i_mutex in ext3_quota_write() because writes to quota file are serialized by dqio_mutex anyway. Changes to quota files outside of quota code are forbidden and enforced by NOATIME and IMMUTABLE bits. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:37 +02:00
Jan Kara	d7e9711760	quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG When CONFIG_QUOTA_DEBUG is enabled we call inode_get_rsv_space() from add_dquot_ref() while holding i_lock. But inode_get_rsv_space() is trying to get i_lock as well resulting in double lock. Fix the problem by moving inode_get_rsv_space() call out of i_lock. Reported-and-analyzed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:37 +02:00
Jan Kara	fd2cbd4dfa	jbd: Write journal superblock with WRITE_FUA after checkpointing If journal superblock is written only in disk's caches and other transaction starts reusing space of the transaction cleaned from the log, it can happen that blocks of a new transaction reach the disk before journal superblock. When power failure happens in such case, subsequent journal replay would still try to replay the old transaction but some of its blocks may be already overwritten by the new transaction. For this reason we must use WRITE_FUA when updating log tail and we must first write new log tail to disk and update in-memory information only after that. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:37 +02:00
Jan Kara	1ce8486dcc	jbd: protect all log tail updates with j_checkpoint_mutex There are some log tail updates that are not protected by j_checkpoint_mutex. Some of these are harmless because they happen during startup or shutdown but updates in journal_commit_transaction() and journal_flush() can really race with other log tail updates (e.g. someone doing journal_flush() with someone running cleanup_journal_tail()). So protect all log tail updates with j_checkpoint_mutex. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:36 +02:00
Jan Kara	9754e39c7b	jbd: Split updating of journal superblock and marking journal empty There are three cases of updating journal superblock. In the first case, we want to mark journal as empty (setting s_sequence to 0), in the second case we want to update log tail, in the third case we want to update s_errno. Split these cases into separate functions. It makes the code slightly more straightforward and later patches will make the distinction even more important. Signed-off-by: Jan Kara <jack@suse.cz>	2012-05-15 23:34:36 +02:00
Eric W. Biederman	a7c1938e22	userns: Convert stat to return values mapped from kuids and kgids - Store uids and gids with kuid_t and kgid_t in struct kstat - Convert uid and gids to userspace usable values with from_kuid and from_kgid Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:08:35 -07:00
Ben Myers	1307bbd2af	xfs: protect xfs_sync_worker with s_umount semaphore xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing work during mount and unmount. This flag can be cleared by unmount after the xfs_sync_worker checks it but before the work is completed. This has caused crashes in the completion handler for the dummy transaction committed by xfs_sync_worker: PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0" #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee #7 [ffff88016fdffc60] page_fault at ffffffff813ac635 [exception RIP: xlog_get_lowest_lsn+0x30] RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246 RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980 RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400 R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs] #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs] Protect xfs_sync_worker by using the s_umount semaphore at the read level to provide exclusion with unmount while work is progressing. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-05-15 14:35:43 -05:00
Dan Carpenter	75af271ed5	dlm: NULL dereference on failure in kmem_cache_create() We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if both allocations fail, it leads to a NULL dereference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Teigland <teigland@redhat.com>	2012-05-15 10:39:28 -05:00
Mark Salter	fce2447627	C6X: add support to build with BINFMT_ELF_FDPIC C6x userspace supports a shared library mechanism called DSBT for systems with no MMU. DSBT is similar to FDPIC in allowing shared text segments and private copies of data segments without an MMU. Both methods access data using a base register and offset. With FDPIC, the caller of an external function sets up the base register for the callee. With DSBT, the called function sets up its own base register. Other details differ but both userspaces need the same thing from the kernel loader: a map of where each ELF segment was loaded. The FDPIC loader already provides this, so DSBT just uses it. This patch enables BINFMT_ELF_FDPIC by default for C6X and provides the necessary architecture hooks for the generic loader. Signed-off-by: Mark Salter <msalter@redhat.com>	2012-05-15 09:17:34 -04:00
Dan Carpenter	5abc03cd91	NFS: kmalloc() doesn't return an ERR_PTR() Obviously we should check for NULL here instead of IS_ERR(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org [3.4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:44:01 -07:00
Bryan Schumaker	981f9face8	NFS: Turn v3 on by default Most users will use NFS v3 or possibly v4 so this makes it easier for them. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:42:22 -07:00
Bryan Schumaker	2ba68002a7	NFS: Make v2 configurable With this patch NFS v2 can be disabled during Kconfig. I default the option to "y" to match the current behavior. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:42:22 -07:00
Bryan Schumaker	5e7e5a0da2	NFS: Create an NFS v3 stat_to_errno() In theory, NFS v3 can have different error versions than NFS v2. v4 is already using its own nfs4_stat_to_errno() to map error codes, so rather than create something in the generic client for v2 and v3 to share I instead give v3 its own function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:42:21 -07:00
Bryan Schumaker	87c7083dc3	NFS: Pass mntfh as part of the nfs_mount_info structure This allows me to use the filehandle allocated in nfs_fs_mount() for nfs v4 mounts instead of allocating a new one. Rather than change nfs4_mount() to look almost exactly like nfs_fs_mount(), I instead remove the function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:31 -07:00
Bryan Schumaker	46058d46d3	NFS: Allocate parsed mount data directly to the nfs_mount_info structure Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:31 -07:00
Bryan Schumaker	d72c727cd9	NFS: Create a single nfs_validate_mount_data() function This new function chooses between the v2/3 parser and the v4 parser by filesystem type. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:30 -07:00
Bryan Schumaker	b72e4f42a3	NFS: Create a single function for text mount data The v2/3 and v4 cases were very similar, with just a few parameters changed. This makes it easy to share code. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:30 -07:00
Bryan Schumaker	486aa699ff	NFS: Create a new nfs_try_mount() This function returns the same return type as nfs4_try_mount() so the two can be more easily substituted. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:29 -07:00
Bryan Schumaker	db83335191	NFS: Let mount data parsing set the NFS version This field is unconditionally set while parsing mount data, so there is no need to fill it in here. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:29 -07:00
Bryan Schumaker	21e4b82e13	NFS: Use nfs_fs_mount_common() for remote referral mounts Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:28 -07:00
Bryan Schumaker	3d176e3fe4	NFS: Use nfs_fs_mount_common() for xdev mounts At this point, there are only a few small differences between these two functions. I can set a few function pointers in the nfs_mount_info struct to get around these differences. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:28 -07:00
Bryan Schumaker	8c958e0c4c	NFS: Create a common xdev_mount() function The only difference between nfs_xdev_mount() and nfs4_xdev_mount() is the clone_super() function called to clone the super block. I can combine these two functions by using the fill_super field in the mount_info structure. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:28 -07:00
Bryan Schumaker	c40f8d1d35	NFS: Create a common fs_mount() function The nfs4_remote_mount() function was only slightly different from the nfs_fs_mount() function used by the generic client. I created a new nfs_mount_info structure to set different parameters to help combine these functions. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:27 -07:00
Bryan Schumaker	586f95cd4f	NFS: Remove NFS4_MOUNT_UNSHARED This flag is numerically equivalent to NFS_MOUNT_UNSHARED, so I can remove it to make collapsing functions more straightforward. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:27 -07:00
Bryan Schumaker	2311b9439c	NFS: Don't pass mount data to nfs_fscache_get_super_cookie() I intend on creating a single nfs_fs_mount() function used by all our mount paths. To avoid checking between new mounts and clone mounts, I instead pass both structures to a new function in super.c that finds the cache key and then looks up the super cookie. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:26 -07:00
Bryan Schumaker	bae36241be	NFS: Create a single nfs_get_root() This patch splits out the NFS v4 specific functionality of nfs4_get_root() into its own rpc_op called by the generic client, and leaves nfs4_proc_get_rootfh() as its own stand alone function. This also allows me to change nfs4_remote_mount(), nfs4_xdev_mount() and nfs4_remote_referral_mount() to use the generic client's nfs_get_root() function. Later patches in this series will collapse these functions into one common function, so using the same get_root() function everywhere simplifies future changes. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:26 -07:00
Bryan Schumaker	3028eb2b32	NFS: Rename nfs4_proc_get_root() This function is really getting the root filehandle and not the root dentry of the filesystem. I also removed the rpc_ops lookup from nfs4_get_rootfh() under the assumption that if we reach this function then we already know we are using NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-05-14 17:30:25 -07:00
Jeff Liu	3fe3e6b182	xfs: introduce SEEK_DATA/SEEK_HOLE support This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-05-14 16:21:05 -05:00
Ben Myers	e700a06c71	xfs: make xfs_extent_busy_trim not static Commit e459df5, 'xfs: move busy extent handling to it's own file' moved some code from xfs_alloc.c into xfs_extent_busy.c for convenience in userspace code merges. One of the functions moved is xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined STATIC. Unfortunately this function is still used in xfs_alloc.c, and this results in an undefined symbol in xfs.ko. Make xfs_extent_busy_trim not static and add its prototype to xfs_extent_busy.h. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>	2012-05-14 16:21:04 -05:00

27593 Commits