linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Aristeu Rozanski	38f3865744	xattr: extract simple_xattr code from tmpfs Extract in-memory xattr APIs from tmpfs. Will be used by cgroup. $ size vmlinux.o text data bss dec hex filename 4658782 880729 5195032 10734543 a3cbcf vmlinux.o $ size vmlinux.o text data bss dec hex filename 4658957 880729 5195032 10734718 a3cc7e vmlinux.o v7: - checkpatch warnings fixed - Implement the changes requested by Hugh Dickins: - make simple_xattrs_init and simple_xattrs_free inline - get rid of locking and list reinitialization in simple_xattrs_free, they're not needed v6: - no changes v5: - no changes v4: - move simple_xattrs_free() to fs/xattr.c v3: - in kmem_xattrs_free(), reinitialize the list - use simple_xattr_* prefix - introduce simple_xattr_add() to prevent direct list usage Original-patch-by: Li Zefan <lizefan@huawei.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lennart Poettering <lpoetter@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-08-24 15:55:33 -07:00
David S. Miller	e6acb38480	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace This is an initial merge in of Eric Biederman's work to start adding user namespace support to the networking. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 18:54:37 -04:00
Jeff Liu	b686d1f79a	xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents xfs_seek_hole() refinement with hole searching from page cache for unwritten extent. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-24 13:57:10 -05:00
Jeff Liu	52f1acc8b5	xfs: xfs_seek_data() refinement with unwritten extents check up from page cache xfs_seek_data() refinement with unwritten extents check up from page cache. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-24 13:56:29 -05:00
Jeff Liu	d126d43f63	xfs: Introduce a helper routine to probe data or hole offset from page cache Introduce helpers to probe data or hole offset from page cache. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-24 13:55:09 -05:00
Jeff Liu	834ab12228	xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole() The type is already indicated by the function naming explicitly, so this argument can be omitted from those calls. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-24 13:48:05 -05:00
Carlos Maiolino	e599b3253c	xfs: fix race while discarding buffers [V4] While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with buffers from lru list), there is a possibility to have xfs_buf_stale() racing with it, and removing buffers from dispose list before xfs_buftarg_shrink() does it. This happens because xfs_buftarg_shrink() handle the dispose list without locking and the test condition in xfs_buf_stale() checks for the buffer being in any list: if (!list_empty(&bp->b_lru)) If the buffer happens to be on dispose list, this causes the buffer counter of lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink() and another in xfs_buf_stale()) causing a wrong account usage of the lru list. This may cause xfs_buftarg_shrink() to return a wrong value to the memory shrinker shrink_slab(), and such account error may also cause an underflowed value to be returned; since the counter is lower than the current number of items in the lru list, a decrement may happen when the counter is 0, causing an underflow on the counter. The fix uses a new flag field (and a new buffer flag) to serialize buffer handling during the shrink process. The new flag field has been designed to use btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism. dchinner, sandeen, aquini and aris also deserve credits for this. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-24 13:46:10 -05:00
Linus Torvalds	62688e5b64	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF, ext3 & reiserfs fixes from Jan Kara: "A couple of fixes (udf, reiserfs, ext3) that accumulated over my vacation." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: fix retun value on error path in udf_load_logicalvol jbd: don't write superblock when unmounting an ro filesystem reiserfs: fix deadlocks with quotas quota: Move down dqptr_sem read after initializing default warn[] type at __dquot_alloc_space(). UDF: During mount free lvid_bh before rescanning with different blocksize udf: fix udf_setsize() for file data in ICB	2012-08-23 21:56:22 -07:00
Linus Torvalds	f4673d6f18	UBIFS fixes for 3.6: 1. Fix crash on error which prevents emulated power-cut testing. 2. Fix log reply regression introduced in 3.6-rc1. 3. Fix UBIFS complaints about too small debug buffer size which. 4. Fix error message spelling, and remove incorrect commentary. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQNdHYAAoJECmIfjd9wqK0t+cQANEkF1Xgs8cd/5U++tQWCGyu EInLm+416ytTy4Vg2bLwlJ4OnUhgtTSTeKovtp3yoSKytxB4vf50qI5P/gInXSxo mI1KkvHZLraFzFPGtnbXGCg0YS8xZLIhw2p8VDuNFcxZIEiuAGZ2qMC9tPzzzle6 em/yDIumR//oioFHOqVNu6wKE3do9QY5fo0meqE6QTliKppSDhWcO9MCfodkFOCp WtlD2dfIQv3SdaZW+X8skbqMFQ9KJRAzk59hl1Cm2jlXEHEHYmvGwqTCSOk8W/0c 3ZSgUVW1YKBYq25yA3Xdz6aDCgRT6LYsi/EH0VNNMlt/H9RfcCNTGe96ioTcgsOT FroLZZUK9PiQYmqUqnk8eqmUyq888vKZY3Sbxt5/bbZhSoK6aYzNgfLlkZNfnHDL unzpyPqYq9VahuTgY5ravDclPKCc2p7xeWJYeOE9JtUY6/fVKJh9qERrtVoNP+nD LjS89G/tNBn9ZPk8KyXgikJ4JZ7sw30vySmBdGtx+ckzXqKTsDtLznvdSU3/XiJM zdqYV3gulgoJtIPXFVuI3i3GeKyXuJlPpjNimuFn+9mNElZfrvs6RcGbQ3Df/3Mc omfTrRxS+vRadGC9MZoIMCV27p7+cEynbLLVXtvPebwqiO1tV1fCspdJGjJmkbvL 14hi7hTYvYCtMR+m8i2+ =VvYb -----END PGP SIGNATURE----- Merge tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifs Pull UBIFS fixes from Artem Bityutskiy: - Fix crash on error which prevents emulated power-cut testing. - Fix log reply regression introduced in 3.6-rc1. - Fix UBIFS complaints about too small debug buffer size which. - Fix error message spelling, and remove incorrect commentary. * tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifs: UBIFS: fix error messages spelling UBIFS: fix complaints about too small debug buffer size UBIFS: fix replay regression UBIFS: fix crash on error path UBIFS: remove stale commentary	2012-08-23 21:50:40 -07:00
Tomas Racek	a672e1be30	xfs: check for possible overflow in xfs_ioc_trim If range.start or range.minlen is bigger than filesystem size, return invalid value error. This fixes possible overflow in BTOBB macro when passed value was nearly ULLONG_MAX. Signed-off-by: Tomas Racek <tracek@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-23 14:48:44 -05:00
Christoph Hellwig	7612903099	xfs: unlock the AGI buffer when looping in xfs_dialloc Also update some commens in the area to make the code easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-23 14:48:32 -05:00
Dave Chinner	0b9e3f6d84	xfs: fix uninitialised variable in xfs_rtbuf_get() Results in this assert failure in generic/090: XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363 ..... Call Trace: [<ffffffff814680db>] xfs_bmapi_read+0x6b/0x370 [<ffffffff814b64b2>] xfs_rtbuf_get+0x42/0x130 [<ffffffff814b6f09>] xfs_rtget_summary+0x89/0x120 [<ffffffff814b7bfe>] xfs_rtallocate_extent_size+0xce/0x340 [<ffffffff814b89f0>] xfs_rtallocate_extent+0x240/0x290 [<ffffffff81462c1a>] xfs_bmap_rtalloc+0x1ba/0x340 [<ffffffff81463a65>] xfs_bmap_alloc+0x35/0x40 [<ffffffff8146f111>] xfs_bmapi_allocate+0xf1/0x350 [<ffffffff8146f9de>] xfs_bmapi_write+0x66e/0xa60 [<ffffffff8144538a>] xfs_iomap_write_direct+0x22a/0x3f0 [<ffffffff8143707b>] __xfs_get_blocks+0x38b/0x5d0 [<ffffffff814372d4>] xfs_get_blocks_direct+0x14/0x20 [<ffffffff811b0081>] do_blockdev_direct_IO+0xf71/0x1eb0 [<ffffffff811b1015>] __blockdev_direct_IO+0x55/0x60 [<ffffffff814355ca>] xfs_vm_direct_IO+0x11a/0x1e0 [<ffffffff8112d617>] generic_file_direct_write+0xd7/0x1b0 [<ffffffff8143e16c>] xfs_file_dio_aio_write+0x13c/0x320 [<ffffffff8143e6f2>] xfs_file_aio_write+0x1c2/0x1d0 [<ffffffff81174a07>] do_sync_write+0xa7/0xe0 [<ffffffff81175288>] vfs_write+0xa8/0x160 [<ffffffff81175702>] sys_pwrite64+0x92/0xb0 [<ffffffff81b68f69>] system_call_fastpath+0x16/0x1b Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-23 14:48:16 -05:00
Artem Bityutskiy	4b3e58a6d6	UBIFS: improve scanning debug output Include LEB number and offset in scanning debugging output. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-23 17:29:35 +03:00
Hugh Dickins	676ce6d5ca	block: replace __getblk_slow misfix by grow_dev_page fix Commit `91f68c89d8` ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert `91f68c89d8`, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-23 12:17:36 +02:00
Linus Torvalds	f753c4ec15	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "Jim's fix closes a narrow race introduced with the msgr changes. One fix resolves problems with debugfs initialization that Yan found when multiple client instances are created (e.g., two clusters mounted, or rbd + cephfs), another one fixes problems with mounting a nonexistent server subdirectory, and the last one fixes a divide by zero error from unsanitized ioctl input that Dan Carpenter found." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: avoid divide by zero in __validate_layout() libceph: avoid truncation due to racing banners ceph: tolerate (and warn on) extraneous dentry from mds libceph: delay debugfs initialization until we learn global_id	2012-08-22 09:58:05 -07:00
Linus Torvalds	ad746be969	NFS client bugfixes for Linux 3.6 - NFSv3 mounts need to fail if the FSINFO rpc call fails - Ensure that the NFS commit cache gets torn down when we unload the NFS module. - Fix memory scribble issues when interrupting a LAYOUTGET rpc call - Fix NFSv4 legacy idmapper regressions - Fix issues with the NFSv4 getacl command - Fix a regression when using the legacy "mount -t nfs4" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQNPyKAAoJEGcL54qWCgDyQlAP/AmnLaJmKMPj5P4GCiA/uxMV sr0XSSVAv+OgUtRxLNO5PDyunVIIEjwA5nGa2RZEG14+eYwNEeQehaGoaGqoKLhh ATOCMvHHEs5qSKLckWsZWflf6U/MLimYS3D/hHVUtP0QWbBMtol3ImABs2ht/VUc HW0wQoEG+5mhbhC87d4ku5E+OHQzYeNNmdX2VkKOmT0CwRd/bfsHxlGwMmcHldP2 TSge+MRhoKNycpy0j7nugHj2pZ12UiX+O8371OlAketnPk0OA+6RNGMyNo49IQAn VEN9MFWc4ypH1+hJ0OvhzmUA6Zn2/lyQXUtwM1DcXeBmLI4aClb3bT3G3xKNh09O yLEvkh2aUUlmoTSgP7vDK/Ui3QwVnsaJULvg+gywQJG6qTSFgUQfErNKgJmabHQb d0EuumlSGnJ7qdC5nmrUp+M/CpbfGh15ax/JnTRGVK7C3jeCm18vc2np+z4EUDp/ USzrvuBu1B8eI0nQNoht0BeNf7sEJGgFIn8BJzxDakP8buXPMqEvFwA8c1JmMeBX E6pbG2jHqPdrTJtxQTpMRqhA0MxbFlVNi5cCs6U7ifiBqbRWEXyNVgAHTwoIxIrn ASbL0s8gZH+EMEXNOmPmFiO2NUjBCeSzAjrsTmSpWa13YmGfiroZN8N/iPzLnFf+ FR3SFUnoJHq7x4uLZoAX =lqfh -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - NFSv3 mounts need to fail if the FSINFO rpc call fails - Ensure that the NFS commit cache gets torn down when we unload the NFS module. - Fix memory scribble issues when interrupting a LAYOUTGET rpc call - Fix NFSv4 legacy idmapper regressions - Fix issues with the NFSv4 getacl command - Fix a regression when using the legacy "mount -t nfs4" * tag 'nfs-for-3.6-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: Ensure that do_proc_get_root() reports errors correctly NFSv4: Ensure that nfs4_alloc_client cleans up on error. NFS: return -ENOKEY when the upcall fails to map the name NFS: Clear key construction data if the idmap upcall fails NFSv4: Don't use private xdr_stream fields in decode_getacl NFSv4: Fix the acl cache size calculation NFSv4: Fix pointer arithmetic in decode_getacl NFS: Alias the nfs module to nfs4 NFS: Fix a regression when loading the NFS v4 module NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done pnfs-obj: Better IO pattern in case of unaligned offset NFS41: add pg_layout_private to nfs_pageio_descriptor pnfs: nfs4_proc_layoutget returns void pnfs: defer release of pages in layoutget nfs: tear down caches in nfs_init_writepagecache when allocation fails	2012-08-22 09:57:25 -07:00
Linus Torvalds	467e9e51d0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull assorted fixes - mostly vfs - from Al Viro: "Assorted fixes, with an unexpected detour into vfio refcounting logics (fell out when digging in an analog of eventpoll race in there)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: task_work: add a scheduling point in task_work_run() fs: fix fs/namei.c kernel-doc warnings eventpoll: use-after-possible-free in epoll_create1() vfio: grab vfio_device reference before exposing the sucker via fd_install() vfio: get rid of vfio_device_put()/vfio_group_get_device* races vfio: get rid of open-coding kref_put_mutex introduce kref_put_mutex() vfio: don't dereference after kfree... mqueue: lift mnt_want_write() outside ->i_mutex, clean up a bit	2012-08-22 09:56:06 -07:00
Artem Bityutskiy	db0f896906	UBIFS: always print full error reports Even when we are emulating power cuts, otherwise it is difficult to investigate failures during emulated power cuts testing. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-22 17:41:44 +03:00
Artem Bityutskiy	c45640d87b	UBIFS: print PID in debug messages Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-22 17:41:44 +03:00
Artem Bityutskiy	69f9025894	UBIFS: fix error messages spelling Corruptio -> corruption. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-22 17:41:09 +03:00
Randy Dunlap	55852635a8	fs: fix fs/namei.c kernel-doc warnings Fix kernel-doc warnings in fs/namei.c: Warning(fs/namei.c:360): No description found for parameter 'inode' Warning(fs/namei.c:672): No description found for parameter 'nd' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:30:10 -04:00
Al Viro	98022748f6	eventpoll: use-after-possible-free in epoll_create1() As soon as we'd installed the file into descriptor table, it can get closed by another thread. Freeing ep in process... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-22 10:26:55 -04:00
Artem Bityutskiy	65b455b123	UBIFS: fix complaints about too small debug buffer size When debugging is enabled, we use a temporary on-stack buffer for formatting the key strings like "(11368871, direntry, 0xcd0750)". The buffer size is 32 bytes and sometimes it is not enough to fit the key string - e.g., when inode numbers are high. This is not fatal, but the key strings are incomplete and UBIFS complains like this: UBIFS assert failed in dbg_snprintf_key at 137 (pid 1) This is a regression caused by "515315a UBIFS: fix key printing". Fix the issue by increasing the buffer to 48 bytes. Reported-by: Michael Hench <michaelhench@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Tested-by: Michael Hench <michaelhench@gmail.com> Cc: stable@vger.kernel.org [v3.3+]	2012-08-22 11:51:33 +03:00
Sage Weil	45f2e081f5	ceph: avoid divide by zero in __validate_layout() If "l->stripe_unit" is zero the the mod on the next line will cause a divide by zero bug. This comes from the copy_from_user() in ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means "do not change") so avoid the % check in that case. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-08-21 15:55:28 -07:00
Sage Weil	6c5e50fa61	ceph: tolerate (and warn on) extraneous dentry from mds If the MDS gives us a dentry and we weren't prepared to handle it, WARN_ON_ONCE instead of crashing. Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-08-21 15:55:25 -07:00
Artem Bityutskiy	c212f4020d	UBIFS: fix replay regression Commit "d51f17e UBIFS: simplify reply code a bit" introduces a bug with the following symptoms: UBIFS error (pid 1): replay_log_leb: first CS node at LEB 3:0 has wrong commit number 0 expected 1 The issue is that we start replaying the log from UBIFS_LOG_LNUM instead of c->lhead_lnum. This patch fixes that. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-21 15:25:25 +03:00
Artem Bityutskiy	11e3be0be2	UBIFS: fix crash on error path This patch fixes a regression introduced by "4994297 UBIFS: make ubifs_lpt_init clean-up in case of failure" which I've hit while running the 'integck -p' test. When remount the file-system from R/O mode to R/W mode and 'lpt_init_wr()' fails, we free _all_ LPT resources by calling 'ubifs_lpt_free(c, 0)', even those needed for R/O mode. This leads to subsequent crashes, e.g., if we try to unmount the file-system. Cc: stable@vger.kernel.org [v3.5+] Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-21 15:25:24 +03:00
Artem Bityutskiy	73e8712aa0	UBIFS: remove stale commentary Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-08-21 15:25:24 +03:00
J. Bruce Fields	39307655a1	nfsd4: fix security flavor of NFSv4.0 callback Commit `d5497fc693` "nfsd4: move rq_flavor into svc_cred" forgot to remove cl_flavor from the client, leaving two places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored. After that patch, the latter was the one that was updated, but the former was the one that the callback used. Symptoms were a long delay on utime(). This is because the utime() generated a setattr which recalled a delegation, but the cb_recall was ignored by the client because it had the wrong security flavor. Cc: stable@vger.kernel.org Tested-by: Jamie Heilman <jamie@audible.transient.net> Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-08-20 18:38:36 -04:00
Tejun Heo	43829731dd	workqueue: deprecate flush[_delayed]_work_sync() flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>	2012-08-20 14:51:24 -07:00
Al Viro	0e665d5d11	vfs: missed source of ->f_pos races compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-20 10:11:47 -07:00
Sage Weil	d1c338a509	libceph: delay debugfs initialization until we learn global_id The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-08-20 10:03:15 -07:00
Trond Myklebust	0866004304	NFSv3: Ensure that do_proc_get_root() reports errors correctly If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-20 12:52:42 -04:00
Trond Myklebust	7653f6ff4e	NFSv4: Ensure that nfs4_alloc_client cleans up on error. Any pointer that was allocated through nfs_alloc_client() needs to be freed via a call to nfs_free_client(). Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-20 12:12:29 -04:00
Pavel Shilovsky	ea7b4887e7	CIFS: Fix cifs_do_create error hadnling Commit `d2c127197d` caused a regression in cifs_do_create error handling. Fix this by closing a file handle in the case of a get_inode_info(_unix) error. Also remove unnecessary checks for newinode being NULL. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-08-19 22:30:18 -05:00
Steve French	985e4ff016	cifs: print error code if smb signature verification fails While trying to debug a SMB signature related issue with Windows Servers figured out it might be easier to debug if we print the error code from cifs_verify_signature(). Also, fix indendation while at it. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-08-19 22:30:13 -05:00
Pavel Shilovsky	7411286088	CIFS: Fix log messages in packet checking for SMB2 Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-08-19 22:30:07 -05:00
Steve French	b7ca692896	CIFS: Protect i_nlink from being negative that can cause warning messages. Pavel had initially suggested a smaller patch around drop_nlink, after a similar problem was discovered NFS. Protecting additional places where nlink is touched was suggested by Jeff Layton and is included in this. Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-08-19 22:30:00 -05:00
Linus Torvalds	20fb1936de	Merge branch 'vfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull vfs fixes from Miklos Szeredi. This mainly fixes some confusion about whether the open 'mode' variable passed around should contain the full file type (S_IFREG etc) information or just the permission mode. In particular, the lack of proper file type information had confused fuse. * 'vfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: fix propagation of atomic_open create error on negative dentry fuse: check create mode in atomic open vfs: pass right create mode to may_o_create() vfs: atomic_open(): fix create mode usage vfs: canonicalize create mode in build_open_flags()	2012-08-18 10:02:17 -07:00
Linus Torvalds	ef824bfba2	The following are all bug fixes and regressions. The most notable are the ones which cause problems for ext4 on RAID --- a performance problem when mounting very large filesystems, and a kernel OOPS when doing an rm -rf on large directory hierarchies on fast devices. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQLlVSAAoJENNvdpvBGATwhOYQAKx22YF9+SHnnVv2GtCQrsE6 N3acE/FoDYiO1/LRa5M6NDO3ZIL4Vqi6409LYFQky/SQL8ziM5CBeLOBD9qPTE1L AGrzWn4vzZcjEf90ZEBN99fS5Uj1A9Axz2denxy7Hfb2HcSXfcuuVQXpdS42cFo8 gwtlX4jxmPkbjRlAdeqYVBNuWpTZ0a+UyLc4A6v3aLcbzPSJvPYmI7mKCksiSySU yt0atjMDb56blAQJ2TITdAZN6rQShNzyok2pPfxaLusl5g0Gtjq8sSEPof1PQ1zq gDFc+kpZvUyPdwQzV3IL8+TodFFZ0x/2OhqoYCTKajROLHGjQFsdkb8EJbnNeDff EDxIjeVJR0kzSuSNWu+n5028lmEd9Sk7Ykr37cHeUxG8/0SADUqSDQYNhvbOPQsj iq5dwF79tKjuMqjJrABuWA1ZNgHBISXgyBmHLXgEk3LrgucT9UIg8Zlkhq480SYO JXhmkO2Ka4UwkVUShoWgRtEzRgUxhINBShs6g67zwm6slS4s2CWHnqhUn6EQe6+r DY/hoUA8KbdG3Cf5iJBFM2kUO68CDIXeJjocA7JvlouRgQSxmkOceIuk7DusAitM nHJKAtSNgC/z7yMoNi7YN0S5YYcCebmO1MEPzYSpPH07YwLJVNmh9Fk6BIfb7vi3 vJSQMBrgGrbaXrnhBA7z =Zulv -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "The following are all bug fixes and regressions. The most notable are the ones which cause problems for ext4 on RAID --- a performance problem when mounting very large filesystems, and a kernel OOPS when doing an rm -rf on large directory hierarchies on fast devices." * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix kernel BUG on large-scale rm -rf commands ext4: fix long mount times on very big file systems ext4: don't call ext4_error while block group is locked ext4: avoid kmemcheck complaint from reading uninitialized memory ext4: make sure the journal sb is written in ext4_clear_journal_err()	2012-08-17 08:04:47 -07:00
Ian Kent	d807ff838f	autofs4 - fix expire check In some cases when an autofs indirect mount is contained in a file system that is marked as shared (such as when systemd does the equivalent of "mount --make-rshared /" early in the boot), mounts stop expiring. When this happens the first expiry check on a mountpoint dentry in autofs_expire_indirect() sees a mountpoint dentry with a higher than minimal reference count. Consequently the dentry is condidered busy and the actual expiry check is never done. This particular check was originally meant as an optimisation to detect a path walk in progress but with the addition of rcu-walk it can be ineffective anyway. Removing the test allows automounts to expire again since the actual expire check doesn't rely on the dentry reference count. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-17 06:56:39 -07:00
Theodore Ts'o	89a4e48f84	ext4: fix kernel BUG on large-scale rm -rf commands Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit `968dee7722` was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-08-17 09:42:17 -04:00
Theodore Ts'o	0548bbb853	ext4: fix long mount times on very big file systems Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-08-17 09:23:00 -04:00
Theodore Ts'o	7a4c5de27e	ext4: don't call ext4_error while block group is locked While in ext4_validate_block_bitmap(), if an block allocation bitmap is found to be invalid, we call ext4_error() while the block group is still locked. This causes ext4_commit_super() to call a function which might sleep while in an atomic context. There's no need to keep the block group locked at this point, so hoist the ext4_error() call up to ext4_validate_block_bitmap() and release the block group spinlock before calling ext4_error(). The reported stack trace can be found at: http://article.gmane.org/gmane.comp.file-systems.ext4/33731 Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-08-17 09:06:06 -04:00
Tomas Racek	643bfc061c	xfs: check for possible overflow in xfs_ioc_trim If range.start or range.minlen is bigger than filesystem size, return invalid value error. This fixes possible overflow in BTOBB macro when passed value was nearly ULLONG_MAX. Signed-off-by: Tomas Racek <tracek@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-16 16:42:52 -05:00
Christoph Hellwig	c4982110ae	xfs: unlock the AGI buffer when looping in xfs_dialloc Also update some commens in the area to make the code easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-16 16:23:59 -05:00
Bryan Schumaker	12dfd08055	NFS: return -ENOKEY when the upcall fails to map the name This allows the normal error-paths to handle the error, rather than making a special call to complete_request_key() just for this instance. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Cc: stable@vger.kernel.org [>= 3.4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 17:20:06 -04:00
Bryan Schumaker	c5066945b7	NFS: Clear key construction data if the idmap upcall fails idmap_pipe_downcall already clears this field if the upcall succeeds, but if it fails (rpc.idmapd isn't running) the field will still be set on the next call triggering a BUG_ON(). This patch tries to handle all possible ways that the upcall could fail and clear the idmap key data for each one. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Cc: stable@vger.kernel.org [>= 3.4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 17:20:02 -04:00
Trond Myklebust	cff298c721	NFSv4: Don't use private xdr_stream fields in decode_getacl Instead of using the private field xdr->p from struct xdr_stream, use the public xdr_stream_pos(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 16:15:51 -04:00
Trond Myklebust	b291f1b1c8	NFSv4: Fix the acl cache size calculation Currently, we do not take into account the size of the 16 byte struct nfs4_cached_acl header, when deciding whether or not we should cache the acl data. Consequently, we will end up allocating an 8k buffer in order to fit a maximum size 4k acl. This patch adjusts the calculation so that we limit the cache size to 4k for the acl header+data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 16:15:50 -04:00
Trond Myklebust	519d3959e3	NFSv4: Fix pointer arithmetic in decode_getacl Resetting the cursor xdr->p to a previous value is not a safe practice: if the xdr_stream has crossed out of the initial iovec, then a bunch of other fields would need to be reset too. Fix this issue by using xdr_enter_page() so that the buffer gets page aligned at the bitmap _before_ we decode it. Also fix the confusion of the ACL length with the page buffer length by not adding the base offset to the ACL length... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-08-16 16:15:50 -04:00
bjschuma@gmail.com	425e776d93	NFS: Alias the nfs module to nfs4 This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 16:15:49 -04:00
bjschuma@gmail.com	1ae811ee27	NFS: Fix a regression when loading the NFS v4 module Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias pointing to nfs.ko, rather than nfs4.ko. This can prevent the v4 module from loading on mount, since the kernel sees that something named "nfs4" has already been loaded. To work around this, I've renamed the modules to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko". I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount -t nfs4` still works. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-16 16:15:49 -04:00
Ian Kent	a45440f05e	autofs4 - fix get_next_positive_subdir() Following a report of a crash during an automount expire I found that the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong. Not only is the locking wrong but the function is more complex than it needs to be. The function is meant to calculate (and dget) the next entry in the list of directories contained in the root of an autofs mount point (an autofs indirect mount to be precise). The main problem was that the d_lock of the owner of the list was not being taken when walking the list, which lead to list corruption under load. The only other lock that needs to be taken is against the next dentry candidate so it can be checked for usability. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-16 11:58:28 -07:00
Linus Torvalds	2eac9eb8a2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: verify all ioctl retry iov elements fuse: add missing INIT flag descriptions fuse: add missing INIT flags fuse: update attributes on aio_read fuse: invalidate inode mapping if mtime changes fuse: add FUSE_AUTO_INVAL_DATA init flag	2012-08-16 11:46:31 -07:00
Chris Wright	3cd52ab68b	debugfs: make __create_file static It's only used locally, no need to pollute global namespace. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-08-16 11:41:51 -07:00
Alex Elder	1ed845df60	xfs: kill struct declarations in xfs_mount.h I noticed that "struct xfs_mount_args" was still declared in "fs/xfs/xfs_mount.h". That struct doesn't even exist any more (and is obviously not referenced elsewhere in that header file). While in there, delete four other unneeded struct declarations in that file. Doing so highlights that "fs/xfs/xfs_trace.h" was relying indirectly on "xfs_mount.h" to be #included in order to declare "struct xfs_bmbt_irec", so add that declaration to resolve that issue. Signed-off-by: Alex Elder <elder@inktank.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-16 13:29:35 -05:00
Dave Chinner	a76cccbeef	xfs: fix uninitialised variable in xfs_rtbuf_get() Results in this assert failure in generic/090: XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363 ..... Call Trace: [<ffffffff814680db>] xfs_bmapi_read+0x6b/0x370 [<ffffffff814b64b2>] xfs_rtbuf_get+0x42/0x130 [<ffffffff814b6f09>] xfs_rtget_summary+0x89/0x120 [<ffffffff814b7bfe>] xfs_rtallocate_extent_size+0xce/0x340 [<ffffffff814b89f0>] xfs_rtallocate_extent+0x240/0x290 [<ffffffff81462c1a>] xfs_bmap_rtalloc+0x1ba/0x340 [<ffffffff81463a65>] xfs_bmap_alloc+0x35/0x40 [<ffffffff8146f111>] xfs_bmapi_allocate+0xf1/0x350 [<ffffffff8146f9de>] xfs_bmapi_write+0x66e/0xa60 [<ffffffff8144538a>] xfs_iomap_write_direct+0x22a/0x3f0 [<ffffffff8143707b>] __xfs_get_blocks+0x38b/0x5d0 [<ffffffff814372d4>] xfs_get_blocks_direct+0x14/0x20 [<ffffffff811b0081>] do_blockdev_direct_IO+0xf71/0x1eb0 [<ffffffff811b1015>] __blockdev_direct_IO+0x55/0x60 [<ffffffff814355ca>] xfs_vm_direct_IO+0x11a/0x1e0 [<ffffffff8112d617>] generic_file_direct_write+0xd7/0x1b0 [<ffffffff8143e16c>] xfs_file_dio_aio_write+0x13c/0x320 [<ffffffff8143e6f2>] xfs_file_aio_write+0x1c2/0x1d0 [<ffffffff81174a07>] do_sync_write+0xa7/0xe0 [<ffffffff81175288>] vfs_write+0xa8/0x160 [<ffffffff81175702>] sys_pwrite64+0x92/0xb0 [<ffffffff81b68f69>] system_call_fastpath+0x16/0x1b Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-08-16 12:53:12 -05:00
Sage Weil	62b2ce964b	vfs: fix propagation of atomic_open create error on negative dentry If ->atomic_open() returns -ENOENT, we take care to return the create error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1 and provides a negative dentry. This fixes a regression where an unprivileged open O_CREAT fails with ENOENT instead of EACCES, introduced with the new atomic_open code. It is tested by the open/08.t test in the pjd posix test suite, and was observed on top of fuse (backed by ceph-fuse). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-08-16 19:29:09 +02:00
Nikola Pajkovsky	68766a2edc	udf: fix retun value on error path in udf_load_logicalvol In case we detect a problem and bail out, we fail to set "ret" to a nonzero value, and udf_load_logicalvol will mistakenly report success. Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 14:23:23 +02:00
Jan Kara	2e84f2641e	jbd: don't write superblock when unmounting an ro filesystem This sequence: results in an IO error when unmounting the RO filesystem. The bug was introduced by: commit `9754e39c7b` Author: Jan Kara <jack@suse.cz> Date: Sat Apr 7 12:33:03 2012 +0200 jbd: Split updating of journal superblock and marking journal empty which lost some of the magic in journal_update_superblock() which used to test for a journal with no outstanding transactions. This is a port of a jbd2 fix by Eric Sandeen. CC: <stable@vger.kernel.org> # 3.4.x Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 13:53:30 +02:00
Miklos Szeredi	af109bca94	fuse: check create mode in atomic open Verify that the VFS is passing us a complete create mode with the S_IFREG to atomic open. Reported-by: Steve <steveamigauk@yahoo.co.uk> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2012-08-15 13:01:24 +02:00
Miklos Szeredi	38227f78a5	vfs: pass right create mode to may_o_create() Pass the umask-ed create mode to may_o_create() instead of the original one. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2012-08-15 13:01:24 +02:00
Miklos Szeredi	62b259d8b3	vfs: atomic_open(): fix create mode usage Don't mask S_ISREG off the create mode before passing to ->atomic_open(). Other methods (->create, ->mknod) also get the complete file mode and filesystems expect it. Reported-by: Steve <steveamigauk@yahoo.co.uk> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2012-08-15 13:01:24 +02:00
Miklos Szeredi	e68726ff72	vfs: canonicalize create mode in build_open_flags() Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) \| S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> CC: stable@vger.kernel.org	2012-08-15 13:01:24 +02:00
Eric W. Biederman	adb37c4c67	userns: Make seq_file's user namespace accessible struct file already has a user namespace associated with it in file->f_cred->user_ns, unfortunately because struct seq_file has no struct file backpointer associated with it, it is difficult to get at the user namespace in seq_file context. Therefore add a helper function seq_user_ns to return the associated user namespace and a user_ns field to struct seq_file to be used in implementing seq_user_ns. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:47:55 -07:00
Jeff Mahoney	48d1788493	reiserfs: fix deadlocks with quotas The BKL push-down for reiserfs made lock recursion a special case that needs to be handled explicitly. One of the cases that was unhandled is dropping the quota during inode eviction. Both reiserfs_evict_inode and reiserfs_write_dquot take the write lock, but when the journal lock is taken it only drops one the references. The locking rules are that the journal lock be acquired before the write lock so leaving the reference open leads to a ABBA deadlock. This patch pushes the unlock up before clear_inode and avoids the recursive locking. Another ABBA situation can occur when the write lock is dropped while reading the bitmap buffer while in the quota code. When the lock is reacquired, it will deadlock against dquot->dq_lock and dqopt->dqio_mutex in the dquot_acquire path. It's safe to retain the lock across the read and should be cached under write load. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 00:22:57 +02:00
Jeff Liu	6ea2eea1fa	quota: Move down dqptr_sem read after initializing default warn[] type at __dquot_alloc_space(). sb->s_dqopt->dqptr_sem is used to serialize ops using pointers from inode to dquots. But for __dquot_alloc_space(), it could be safely moved down after the default warn[] array got initialized. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 00:22:57 +02:00
Ashish Sangwan	dc141a402b	UDF: During mount free lvid_bh before rescanning with different blocksize If s_lvid_bh is not freed and set to NULL before re-scanning partition with default block size, we might end up using wrong lvid in case s_lvid_bh is not updated in udf_load_logicalvolint during rescan. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 00:22:56 +02:00
Ian Abbott	bb2b6d19ec	udf: fix udf_setsize() for file data in ICB If the new size is larger than the old size and the old file data was stored in the ICB (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) and the new size still fits in the ICB, skip the call to udf_extend_file() as it does not handle this i_alloc_type value (it calls BUG()). Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2012-08-15 00:21:58 +02:00
Tejun Heo	41f63c5359	workqueue: use mod_delayed_work() instead of cancel + queue Convert delayed_work users doing cancel_delayed_work() followed by queue_delayed_work() to mod_delayed_work(). Most conversions are straight-forward. Ones worth mentioning are, * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always use mod_delayed_work() and cancel loop in edac_mc_reset_delay_period() is dropped. * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether watchdog is active or not. @fan_watchdog_active and related code dropped. * drivers/power/charger-manager.c: Seemingly a lot of delayed_work_pending() abuse going on here. [delayed_]work_pending() are unsynchronized and racy when used like this. I converted one instance in fullbatt_handler(). Please conver the rest so that it invokes workqueue APIs for the intended target state rather than trying to game work item pending state transitions. e.g. if timer should be modified - call mod_delayed_work(), canceled - call cancel_delayed_work[_sync](). * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling() simplified. Note that round_jiffies() calls in this function are meaningless. round_jiffies() work on absolute jiffies not delta delay used by delayed_work. v2: Tomi pointed out that __cancel_delayed_work() users can't be safely converted to mod_delayed_work(). They could be calling it from irq context and if that happens while delayed_work_timer_fn() is running, it could deadlock. __cancel_delayed_work() users are dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Doug Thompson <dougthompson@xmission.com> Cc: David Airlie <airlied@linux.ie> Cc: Roland Dreier <roland@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Johannes Berg <johannes@sipsolutions.net>	2012-08-13 16:27:37 -07:00
Ying Xue	9c5bef5849	dlm: cleanup send_to_sock routine Remove unnecessary code form send_to_sock routine. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-13 10:03:18 -05:00
Linus Torvalds	15fc5deb1f	Merge branch 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs merge fix from Chris Mason: "This fixes a merge error in rc1. The calls to mnt_want_write should have been removed." * 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove mnt_want_write call in btrfs_mksubvol	2012-08-12 21:28:41 +03:00
Ying Xue	4dd40f0cd9	dlm: convert add_sock routine return value type to void Since add_sock() always returns a success code - 0, its return value type should be changed from integer to void. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-10 09:10:10 -05:00
Xue Ying	b4c798cf69	dlm: remove redundant variable assignments Once the tcp_create_listen_sock() is returned successfully, we will invoke add_sock() immediately. In add_sock(), the 'con' variable is assigned to 'sk_user_data', meanwhile, the 'sock' is also set to 'con->sock'. So it's unnecessary to do the same thing in tcp_create_listen_sock(). Signed-off-by: Xue Ying <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-10 09:10:10 -05:00
Alexander Block	e00da2067b	Btrfs: remove mnt_want_write call in btrfs_mksubvol We got a recursive lock in mksubvol because the caller already held a lock. I think we got into this due to a merge error. Commit `a874a63` removed the mnt_want_write call from btrfs_mksubvol and added a replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid. Commit `e7848683` however tried to move all calls to mnt_want_write above i_mutex. So somewhere while merging this, it got mixed up. The solution is to remove the mnt_want_write call completely from mksubvol. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-08-09 11:01:54 -04:00
Fengguang Wu	647d1e4c52	block: move down direct IO plugging Move unplugging for direct I/O from around ->direct_IO() down to do_blockdev_direct_IO(). This implicitly adds plugging for direct writes. CC: Li Shaohua <shli@fusionio.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-09 15:23:09 +02:00
Alexey Khoroshilov	389d7b26d9	bio: Fix potential memory leak in bio_find_or_create_slab() Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-09 15:19:25 +02:00
Trond Myklebust	47fbf7976e	NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done Ever since commit `0a57cdac3f` (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>=3.5]	2012-08-08 16:03:13 -04:00
David Teigland	475f230c60	dlm: fix unlock balance warnings The in_recovery rw_semaphore has always been acquired and released by different threads by design. To work around the "BUG: bad unlock balance detected!" messages, adjust things so the dlm_recoverd thread always does both down_write and up_write. Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-08 11:33:49 -05:00
David Teigland	6ad2291624	dlm: fix uninitialized spinlock Use DEFINE_SPINLOCK for global dlm_cb_seq_spin. Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-08 11:33:43 -05:00
David Teigland	36b71a8bfb	dlm: fix deadlock between dlm_send and dlm_controld A deadlock sometimes occurs between dlm_controld closing a lowcomms connection through configfs and dlm_send looking up the address for a new connection in configfs. dlm_controld does a configfs rmdir which calls dlm_lowcomms_close which waits for dlm_send to cancel work on the workqueues. The dlm_send workqueue thread has called tcp_connect_to_sock which calls dlm_nodeid_to_addr which does a configfs lookup and blocks on a lock held by dlm_controld in the rmdir path. The solution here is to save the node addresses within the lowcomms code so that the lowcomms workqueue does not need to step through configfs to get a node address. dlm_controld: wait_for_completion+0x1d/0x20 __cancel_work_timer+0x1b3/0x1e0 cancel_work_sync+0x10/0x20 dlm_lowcomms_close+0x4c/0xb0 [dlm] drop_comm+0x22/0x60 [dlm] client_drop_item+0x26/0x50 [configfs] configfs_rmdir+0x180/0x230 [configfs] vfs_rmdir+0xbd/0xf0 do_rmdir+0x103/0x120 sys_rmdir+0x16/0x20 dlm_send: mutex_lock+0x2b/0x50 get_comm+0x34/0x140 [dlm] dlm_nodeid_to_addr+0x18/0xd0 [dlm] tcp_connect_to_sock+0xf4/0x2d0 [dlm] process_send_sockets+0x1d2/0x260 [dlm] worker_thread+0x170/0x2a0 Signed-off-by: David Teigland <teigland@redhat.com>	2012-08-08 11:33:35 -05:00
Zach Brown	fb6ccff667	fuse: verify all ioctl retry iov elements Commit `7572777eef` attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: <stable@vger.kernel.org> [2.6.37+]	2012-08-06 18:19:24 +02:00
Theodore Ts'o	7e731bc9a1	ext4: avoid kmemcheck complaint from reading uninitialized memory Commit `03179fe923` introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-08-05 23:28:16 -04:00
Theodore Ts'o	d796c52ef0	ext4: make sure the journal sb is written in ext4_clear_journal_err() After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-08-05 19:04:57 -04:00
Al Viro	fe7c80518e	missed mnt_drop_write() in do_dentry_open() This one ought to be __mnt_drop_write(), to match __mnt_want_write() in the beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:41 +04:00
Artem Bityutskiy	5c57f20b82	UBIFS: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from UBIFS comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:41 +04:00
Artem Bityutskiy	e76e0ec984	gfs2: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from gfs comments. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:40 +04:00
Artem Bityutskiy	166ac34b74	nilfs2: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ntfs. Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:38 +04:00
Artem Bityutskiy	50640bcc0a	hfs: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from hfs. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:38 +04:00
Artem Bityutskiy	0d5c3eba2e	vfs: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from vfs comments. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:37 +04:00
Artem Bityutskiy	12810ad708	jbd/jbd2: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from various jbd and jbd2. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:36 +04:00
Artem Bityutskiy	b257031408	btrfs: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from btrfs comments. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:35 +04:00
Artem Bityutskiy	34eaadaf22	btrfs: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from btrfs. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:35 +04:00
Artem Bityutskiy	f6463b0da6	ext4: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from ext4 comments. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:34 +04:00
Artem Bityutskiy	7652bdfcb5	ext4: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:33 +04:00
Artem Bityutskiy	d3009c6cff	ext3: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ext3. Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:32 +04:00
Eric W. Biederman	81abe27b10	userns: Fix link restrictions to use uid_eq Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-08-03 19:23:45 -07:00
Artem Bityutskiy	f0cd2dbb6c	vfs: kill write_super and sync_supers Finally we can kill the 'sync_supers' kernel thread along with the '->write_super()' superblock operation because all the users are gone. Now every file-system is supposed to self-manage own superblock and its dirty state. The nice thing about killing this thread is that it improves power management. Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up every 5 seconds no matter what - even if there were no dirty superblocks and even if there were no file-systems using this service (e.g., btrfs and journalled ext4 do not need it). So it was wasting power most of the time. And because the thread was in the core of the kernel, all systems had to have it. So I am quite happy to make it go away. Interestingly, this thread is a left-over from the pdflush kernel thread which was a self-forking kernel thread responsible for all the write-back in old Linux kernels. It was turned into per-block device BDI threads, and 'sync_supers' was a left-over. Thus, R.I.P, pdflush as well. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 01:24:44 +04:00
Linus Torvalds	d42d1dabf3	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd Pull exofs update from Boaz Harrosh: "They are all mostly fixes, except the most important patch by Artem Bityutskiy which removes the use of s_dirt. After this patch s_dirt can be completely removed from the tree." * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Fix out-of-bounds access in _ios_obj() exofs: Use proper max_IO calculations from ore exofs: Fix __r4w_get_page when offset is beyond i_size exofs: stop using s_dirt exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page)	2012-08-03 13:24:07 -07:00
Boaz Harrosh	7de6e28417	pnfs-obj: Better IO pattern in case of unaligned offset Depending on layout and ARCH, ORE has some limits on max IO sizes which is communicated on (what else) ore_layout->max_io_length, which is always stripe aligned. This was considered as the pg_test boundary for splitting and starting a new IO. But in the case of a long IO where the start offset is not aligned what would happen is that both end of IO[N] and start of IO[N+1] would be unaligned, causing each IO boundary parity unit to be calculated and written twice. So what we do in this patch is split the very start of an unaligned IO, up to a stripe boundary, and then next IO's can continue fully aligned til the end. We might be sacrificing the case where the full unaligned IO would fit within a single max_io_length, but the sacrifice is well worth the elimination of double calculation and parity units IO. Actually the sacrificing is marginal and is almost unmeasurable. TODO: If we know the total expected linear segment that will be received, at pg_init, we could use that information in many places: 1. blocks-layout get_layout write segment size 2. Better mds-threshold 3. In above situation for a better clean split I will do this in future submission. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:42:51 -04:00
Peng Tao	f616638409	NFS41: add pg_layout_private to nfs_pageio_descriptor To allow layout driver to pass private information around pg_init/pg_doio. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:41:18 -04:00
Idan Kedar	21d1f58aed	pnfs: nfs4_proc_layoutget returns void since the only user of nfs4_proc_layoutget is send_layoutget, which ignores its return value, there is no reason to return any value. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:39:06 -04:00
Idan Kedar	8554116e17	pnfs: defer release of pages in layoutget we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:38:54 -04:00
Jeff Layton	3dd4765fce	nfs: tear down caches in nfs_init_writepagecache when allocation fails ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-08-02 17:36:07 -04:00
Linus Torvalds	c8924234bd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull two ceph fixes from Sage Weil: "The first patch fixes up the old crufty open intent code to use the atomic_open stuff properly, and the second fixes a possible null deref and memory leak with the crypto keys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix crypto key null deref, memory leak ceph: simplify+fix atomic_open	2012-08-02 10:57:31 -07:00
Linus Torvalds	410fc4ce8a	- Fixes a bug when the lower filesystem mount options include 'acl', but the eCryptfs mount options do not - Cleanups in the messaging code - Better handling of empty files in the lower filesystem to improve usability. Failed file creations are now cleaned up and empty lower files are converted into eCryptfs during open(). - The write-through cache changes are being reverted due to bugs that are not easy to fix. Stability outweighs the performance enhancements here. - Improvement to the mount code to catch unsupported ciphers specified in the mount options -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCgAGBQJQGhVWAAoJENaSAD2qAscKjvcQAMgF66sl8KjwzFCgojh30a4u 1hCXltxUCLjXxjXxyygUHb/pH0xdvC+Ss3FTtsPXAjgYm2lXjjoOmVG8WAvwHHx1 2ZjDo+8fQ4XA8Rl9kYuvt/abF0IssNRK3csTWplR7lpoQ8AWbkpkag1I4WZhibey cgs/zECl8ACTJ5zQ+AyRGnrssq4jI7xZAKWLK0+KKk7S9yIRI7K/xdz1xK39jGK6 N09Dw3VWY/bMcMq77ZXBtyHdP7KR7wKUtCeQttmCvdf20Ocy6AXzr1FRKvUxionF Sf31tJim0u9OO8hmy8cjCyWEy9LHnXnSd/5vn+Qd9ok9GvuiYmKw07rbXi/gjhBX ai5PKtl05WiQgp80BybUYfIY1Hq71MsppNi6h9Zgiid5rEvWWvCBWBWP95G8DTmC 6TwLaCG0rh8uuZyeiVrs3xZQ2IG5Zmu0CX3XGyfsaLvqmQWhtT5ZQVMeMQEEBxyQ ur9SSU2O/nC8ceLB7fzGmZPTLZUWOuYQnd24NJNK+7j0P+Km7pqmDYpCdwmFpx3C CQ0gGaJGHeycvBF327bwxPmsPdO4fy+nmEL8vrEXPTyQ3ZVAPvxZK9t8Jpk4UFOl JSTWFiK0mvgE+5dX2kB0nzN7iD7hcMgmozht50qQP3OUkI1kkBrEHgHVO3KzgUIA +aRAljeLdelq44JBxjiB =4vDd -----END PGP SIGNATURE----- Merge tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: - Fixes a bug when the lower filesystem mount options include 'acl', but the eCryptfs mount options do not - Cleanups in the messaging code - Better handling of empty files in the lower filesystem to improve usability. Failed file creations are now cleaned up and empty lower files are converted into eCryptfs during open(). - The write-through cache changes are being reverted due to bugs that are not easy to fix. Stability outweighs the performance enhancements here. - Improvement to the mount code to catch unsupported ciphers specified in the mount options * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: check for eCryptfs cipher support at mount eCryptfs: Revert to a writethrough cache model eCryptfs: Initialize empty lower files when opening them eCryptfs: Unlink lower inode when ecryptfs_create() fails eCryptfs: Make all miscdev functions use daemon ptr in file private_data eCryptfs: Remove unused messaging declarations and function eCryptfs: Copy up POSIX ACL and read-only flags from lower mount	2012-08-02 10:56:34 -07:00
Linus Torvalds	630103ea2c	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS update from Steve French: "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs. I am holding up a few more days on merging the remainder of the SMB2/SMB2.1 enablement although it is nearing review completion, in order to address some review comments from Jeff Layton on a few of the subsequent SMB2 patches, and also to debug an unrelated cifs problem that Pavel discovered." * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Add SMB2 support for rmdir CIFS: Move rmdir code to ops struct CIFS: Add SMB2 support for mkdir operation CIFS: Separate protocol specific part from mkdir CIFS: Simplify cifs_mkdir call	2012-08-02 10:54:11 -07:00
Sage Weil	5ef50c3bec	ceph: simplify+fix atomic_open The initial ->atomic_open op was carried over from the old intent code, which was incomplete and didn't really work. Replace it with a fresh method. In particular: * always attempt to do an atomic open+lookup, both for the create case and for lookups of existing files. * fix symlink handling by returning 1 to the VFS so that we can follow the link to its destination. This fixes a longstanding ceph bug (#2392). Signed-off-by: Sage Weil <sage@inktank.com>	2012-08-02 09:11:19 -07:00
Boaz Harrosh	9e62bb4458	ore: Fix out-of-bounds access in _ios_obj() _ios_obj() is accessed by group_index not device_table index. The oc->comps array is only a group_full of devices at a time it is not like ore_comp_dev() which is indexed by a global device_table index. This did not BUG until now because exofs only uses a single COMP for all devices. But with other FSs like PanFS this is not true. This bug was only in the write_path, all other users were using it correctly [This is a bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 16:41:56 +03:00
Boaz Harrosh	be388f3d9a	exofs: Use proper max_IO calculations from ore exofs_max_io_pages should just use the ORE's calculated layout->max_io_length, And avoid unnecessary BUGs, calculations made here were also a layering violation. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 16:39:17 +03:00
Boaz Harrosh	4b74f6ea84	exofs: Fix __r4w_get_page when offset is beyond i_size It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global ZERO_PAGE, if offset is beyond i_size. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:58:22 +03:00
Artem Bityutskiy	66153f6e0f	exofs: stop using s_dirt Exofs has the '->write_super()' handler and makes some use of the '->s_dirt' superblock flag, but it really needs neither of them because it never sets 's_dirt' to one which means the VFS never calls its '->write_super()' handler. Thus, remove both. Note, I am trying to remove both 's_dirt' and 'write_super()' from VFS altogether once all users are gone. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:52:13 +03:00
Kautuk Consul	0e8d96dd2c	exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page) readpage_strip can be called from several code paths all of which require that the page be locked before any operations are carried out. Since we export the exofs_readpage callback to the VFS, add a BUG_ON to check for PageLocked(page) to make sure that this understanding is never compromised. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-08-02 14:52:12 +03:00
Jianpeng Ma	53362a05ae	fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices For regular file, write operaion used blk_plug function.But for block file,write operation did not use blk_plug. This patch is also for write-cache mode for block-device. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-08-02 09:50:39 +02:00
Linus Torvalds	a0e881b7c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is not dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...	2012-08-01 10:26:23 -07:00
J. Bruce Fields	068535f1fe	locks: remove unused lm_release_private In commit `3b6e2723f3` ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-01 09:01:46 -07:00
Linus Torvalds	ac694dbdbc	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...	2012-07-31 19:25:39 -07:00
Linus Torvalds	6dbb35b0a7	NFS client updates for Linux 3.6 Features include: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM cV9Ex8zh2Y4aiUXCy3uA =KSix -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO	2012-07-31 18:45:44 -07:00
Mel Gorman	192e501b04	nfs: prevent page allocator recursions with swap over NFS. GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO, just not of any filesystem data. The problem is that previously NOFS was correct because that avoids recursion into the NFS code. With swap-over-NFS, it is no longer correct as swap IO can lead to this recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:48 -07:00
Mel Gorman	a564b8f039	nfs: enable swap on NFS Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:48 -07:00
Mel Gorman	29418aa4bd	nfs: disable data cache revalidation for swapfiles The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
Mel Gorman	d56b4ddf77	nfs: teach the NFS client how to treat PG_swapcache pages Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
Minchan Kim	8e125cd855	vmscan: remove obsolete shrink_control comment `09f363c7` ("vmscan: fix shrinker callback bug in fs/super.c") fixed a shrinker callback which was returning -1 when nr_to_scan is zero, which caused excessive slab scanning. But `635697c6` ("vmscan: fix initial shrinker size handling") fixed the problem, again so we can freely return -1 although nr_to_scan is zero. So let's revert `09f363c7` because the comment added in `09f363c7` made an unnecessary rule. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:43 -07:00
Aneesh Kumar K.V	24669e5847	hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages Use a mmu_gather instead of a temporary linked list for accumulating pages when we unmap a hugepage range Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:40 -07:00
Wanpeng Li	3965c9ae47	mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads Since per-BDI flusher threads were introduced in 2.6, the pdflush mechanism is not used any more. But the old interface exported through /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless. For back-compatibility, printk warning information and return 2 to notify the users that the interface is removed. Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:40 -07:00
Linus Torvalds	08843b79fb	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...	2012-07-31 14:42:28 -07:00
Linus Torvalds	cc8362b1f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...	2012-07-31 14:35:28 -07:00
Jeff Layton	ad0fcd4eb6	nfs: explicitly reject LOCK_MAND flock() requests We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly return -EINVAL if someone requests it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-31 14:42:20 -04:00
NeilBrown	b042414feb	nfs: increase number of permitted callback connections. By default a sunrpc service is limited to (N+3)*20 connections where N is the number of threads. This is 80 when N==1. If this number is exceeded a warning is printed suggesting that the number of threads be increased. However with services which run a single thread, this is impossible. For such services there is a ->sv_maxconn setting that can be used to forcibly increase the limit, and silence the message. This is used by lockd. The nfs client uses a sunrpc service to handle callbacks and it too is single-threaded, so to avoid the useless messages, and to allow a reasonable number of concurrent connections, we need to set ->sv_maxconn. 1024 seems like a good number. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-31 12:33:29 -04:00
Jan Kara	d9c95bdd53	fs: Remove old freezing mechanism Now that all users are converted, we can remove functions, variables, and constants defined by the old freezing mechanism. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:54 +04:00
Jan Kara	1e8b212fe5	ext2: Implement freezing The only missing piece to make freezing work reliably with ext2 is to stop iput() of unlinked inode from deleting the inode on frozen filesystem. So add a necessary protection to ext2_evict_inode(). We also provide appropriate ->freeze_fs and ->unfreeze_fs functions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:53 +04:00
Jan Kara	b2b5ef5c8e	btrfs: Convert to new freezing mechanism We convert btrfs_file_aio_write() to use new freeze check. We also add proper freeze protection to btrfs_page_mkwrite(). We also add freeze protection to the transaction mechanism to avoid starting transactions on frozen filesystem. At minimum this is necessary to stop iput() of unlinked file to change frozen filesystem during truncation. Checks in cleaner_kthread() and transaction_kthread() can be safely removed since btrfs_freeze() will lock the mutexes and thus block the threads (and they shouldn't have anything to do anyway). CC: linux-btrfs@vger.kernel.org CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:52 +04:00
Jan Kara	2c22b337b5	nilfs2: Convert to new freezing mechanism We change nilfs_page_mkwrite() to provide proper freeze protection for writeable page faults (we must wait for frozen filesystem even if the page is fully mapped). We remove all vfs_check_frozen() checks since they are now handled by the generic code. CC: linux-nilfs@vger.kernel.org CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:52 +04:00
Jan Kara	fbf8fb7650	ntfs: Convert to new freezing mechanism Move check in ntfs_file_aio_write_nolock() to ntfs_file_aio_write() and use new freeze protection. CC: linux-ntfs-dev@lists.sourceforge.net CC: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:51 +04:00
Jan Kara	58ef6a75c3	fuse: Convert to new freezing mechanism Convert check in fuse_file_aio_write() to using new freeze protection. CC: fuse-devel@lists.sourceforge.net CC: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:50 +04:00
Jan Kara	39263d5e71	gfs2: Convert to new freezing mechanism We update gfs2_page_mkwrite() to use new freeze protection and the transaction code to use freeze protection while the transaction is running. That is needed to stop iput() of unlinked file from modifying the filesystem. The rest is handled by the generic code. CC: cluster-devel@redhat.com CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:50 +04:00
Jan Kara	fef6925cd4	ocfs2: Convert to new freezing mechanism Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze protection. We also protect several ioctl entry points which were missing the protection. Finally, we add freeze protection to the journaling mechanism so that iput() of unlinked inode cannot modify a frozen filesystem. CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:49 +04:00
Jan Kara	d9457dc056	xfs: Convert to new freezing code Generic code now blocks all writers from standard write paths. So we add blocking of all writers coming from ioctl (we get a protection of ioctl against racing remount read-only as a bonus) and convert xfs_file_aio_write() to a non-racy freeze protection. We also keep freeze protection on transaction start to block internal filesystem writes such as removal of preallocated blocks. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> CC: xfs@oss.sgi.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:48 +04:00
Jan Kara	8e8ad8a57c	ext4: Convert to new freezing mechanism We remove most of frozen checks since upper layer takes care of blocking all writes. We have to handle protection in ext4_page_mkwrite() in a special way because we cannot use generic block_page_mkwrite(). Also we add a freeze protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify a frozen filesystem (we cannot easily instrument ext4_journal_start() / ext4_journal_stop() with freeze protection because we are missing the superblock pointer in ext4_journal_stop() in nojournal mode). CC: linux-ext4@vger.kernel.org CC: "Theodore Ts'o" <tytso@mit.edu> BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:48 +04:00
Jan Kara	14da920014	fs: Protect write paths by sb_start_write - sb_end_write There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:47 +04:00
Jan Kara	5d37e9e6de	fs: Skip atime update on frozen filesystem It is unexpected to block reading of frozen filesystem because of atime update. Also handling blocking on frozen filesystem because of atime update would make locking more complex than it already is. So just skip atime update when filesystem is frozen like we skip it when filesystem is remounted read-only. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:45:38 +04:00
Jan Kara	eb04c28288	fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:40:38 +04:00
Jan Kara	5accdf82ba	fs: Improve filesystem freezing handling vfs_check_frozen() tests are racy since the filesystem can be frozen just after the test is performed. Thus in write paths we can end up marking some pages or inodes dirty even though the file system is already frozen. This creates problems with flusher thread hanging on frozen filesystem. Another problem is that exclusion between ->page_mkwrite() and filesystem freezing has been handled by setting page dirty and then verifying s_frozen. This guaranteed that either the freezing code sees the faulted page, writes it, and writeprotects it again or we see s_frozen set and bail out of page fault. This works to protect from page being marked writeable while filesystem freezing is running but has an unpleasant artefact of leaving dirty (although unmodified and writeprotected) pages on frozen filesystem resulting in similar problems with flusher thread as the first problem. This patch aims at providing exclusion between write paths and filesystem freezing. We implement a writer-freeze read-write semaphore in the superblock. Actually, there are three such semaphores because of lock ranking reasons - one for page fault handlers (->page_mkwrite), one for all other writers, and one of internal filesystem purposes (used e.g. to track running transactions). Write paths which should block freezing (e.g. directory operations, ->aio_write(), ->page_mkwrite) hold reader side of the semaphore. Code freezing the filesystem takes the writer side. Only that we don't really want to bounce cachelines of the semaphores between CPUs for each write happening. So we implement the reader side of the semaphore as a per-cpu counter and the writer side is implemented using s_writers.frozen superblock field. [AV: microoptimize sb_start_write(); we want it fast in normal case] BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:30:13 +04:00
Linus Torvalds	2e3ee61348	Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQF0/wAAoJECvKgwp+S8JaszsP/16EO5F5mUCOFgncVRp+8R9U BxuKJ61j2R9ckHA+ngMEg72W5vJQds64cjywZnz6HMr0/+3tXUf4QBbU4/4sCeai 0lpK8MCKgp5KHHCxgO8zyoSaboankUgoDcbSmGJREV1WXoR8VWXsO9gXqiiH9XOe e8ADjds/YdxkQbOYDRgZKvLzwWS61K9Kwq5/56GASh2uflw7rkJZ38xqvGbo3YiQ IJJwOUYfJjFadIewYARQmkZZyWeAmtY0ADh15Z8pJt+iY4PgcDDlaWagwUH2Oaoi vhTFO4KnCjhSpc872et21g/jN/VrcQqzuUF/LUE9rW5irXeDZVCDrCQOuHQ+3Uo5 YuV3rpNABW/LU8AvtIwt9hUunaKUnrXaSluoL9LzH2VpH++JljQeR4yZ62Q5rpRs z4Bow25p7tlbcIJWzueMPdOFUr5s3P6XfQHoLVRWqN94eJ+Z1DPOgBlOain6cCUN oivPh4FxZfUscNmX8/7cHaTNEpGzJ0FzLPMNFnGQ2zwG0gk41fnMdWb19aVIE5GL /Q96TB24k/v+o1lbxGmyPi0L+Aq+NknvNm+p/YJHAAQrIpu5t2hPvka8/m7Nj7tu c3rM75RKiZkEI+3U6Ws1DhhQPtVcfNIVlNYQMGzlIndtK6T0ByueQ4eN5Z7ltjSE pS89rv2hyBw7yCaTU6ui =gZrM -----END PGP SIGNATURE----- Merge tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback updates from Wu Fengguang: "Use time based periods to age the writeback proportions, which can adapt equally well to fast/slow devices." Fix up trivial conflict in comment in fs/sync.c * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Fix some comment errors block: Convert BDI proportion calculations to flexible proportions lib: Fix possible deadlock in flexible proportion code lib: Proportions with flexible period	2012-07-30 22:14:04 -07:00
Linus Torvalds	1fad1e9a74	NFS client updates for Linux 3.6 Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg 0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4 QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0 NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC 4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL YNVVGhvNxccMU5EU9YR4 =Lc59 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...	2012-07-30 19:16:57 -07:00
Alex Elder	aa711ee340	ceph: define snap counts as u32 everywhere There are two structures in which a count of snapshots are maintained: struct ceph_snap_context { ... u32 num_snaps; ... } and struct ceph_snap_realm { ... u32 num_prior_parent_snaps; /* had prior to parent_since */ ... u32 num_snaps; ... } These fields never take on negative values (e.g., to hold special meaning), and so are really inherently unsigned. Furthermore they take their value from over-the-wire or on-disk formatted 32-bit values. So change their definition to have type u32, and change some spots elsewhere in the code to account for this change. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:47 -07:00
Alan Cox	21ec6ffa46	ceph: fix potential double free We re-run the loop but we don't re-set the attrs pointer back to NULL. Signed-off-by: Alan Cox <alan@linux.intel.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:35 -07:00
Sage Weil	a53aab645c	ceph: close old con before reopening on mds reconnect When we detect a mds session reset, close the old ceph_connection before reopening it. This ensures we clean up the old socket properly and keep the ceph_connection state correct. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:32 -07:00
Linus Torvalds	27c1ee3f92	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's first set of patches: "Non-MM patches: - lots of misc bits - tree-wide have_clk() cleanups - quite a lot of printk tweaks. I draw your attention to "printk: convert the format for KERN_<LEVEL> to a 2 byte pattern" which looks a bit scary. But afaict it's solid. - backlight updates - lib/ feature work (notably the addition and use of memweight()) - checkpatch updates - rtc updates - nilfs updates - fatfs updates (partial, still waiting for acks) - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc - new fault-injection feature work" * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) drivers/misc/lkdtm.c: fix missing allocation failure check lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table() fault-injection: add tool to run command with failslab or fail_page_alloc fault-injection: add selftests for cpu and memory hotplug powerpc: pSeries reconfig notifier error injection module memory: memory notifier error injection module PM: PM notifier error injection module cpu: rewrite cpu-notifier-error-inject module fault-injection: notifier error injection c/r: fcntl: add F_GETOWNER_UIDS option resource: make sure requested range is included in the root range include/linux/aio.h: cpp->C conversions fs: cachefiles: add support for large files in filesystem caching pps: return PTR_ERR on error in device_create taskstats: check nla_reserve() return sysctl: suppress kmemleak messages ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION ipc: compat: use signed size_t types for msgsnd and msgrcv ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC ipc: add COMPAT_SHMLBA support ...	2012-07-30 17:25:34 -07:00
Cyrill Gorcunov	1d151c337d	c/r: fcntl: add F_GETOWNER_UIDS option When we restore file descriptors we would like them to look exactly as they were at dumping time. With help of fcntl it's almost possible, the missing snippet is file owners UIDs. To be able to read their values the F_GETOWNER_UIDS is introduced. This option is valid iif CONFIG_CHECKPOINT_RESTORE is turned on, otherwise returning -EINVAL. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:21 -07:00
Justin Lecher	98c350cda2	fs: cachefiles: add support for large files in filesystem caching Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:21 -07:00
Djalal Harouni	bc452b4b65	proc: do not allow negative offsets on /proc/<pid>/environ __mem_open() which is called by both /proc/<pid>/environ and /proc/<pid>/mem ->open() handlers will allow the use of negative offsets. /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ. Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open() to allow negative offsets only on /proc/<pid>/mem. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Djalal Harouni	e8905ec27e	proc: environ_read() make sure offset points to environment address range Currently the following offset and environment address range check in environ_read() of /proc/<pid>/environ is buggy: int this_len = mm->env_end - (mm->env_start + src); if (this_len <= 0) break; Large or negative offsets on /proc/<pid>/environ converted to 'unsigned long' may pass this check since '(mm->env_start + src)' can overflow and 'this_len' will be positive. This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since (mm->env_start + src) will point and read from another VMA. There are two fixes here plus some code cleaning: 1) Fix the overflow by checking if the offset that was converted to unsigned long will always point to the [mm->env_start, mm->env_end] address range. 2) Remove the truncation that was made to the result of the check, storing the result in 'int this_len' will alter its value and we can not depend on it. For kernels that have commit `b409e578d` ("proc: clean up /proc/<pid>/environ handling") which adds the appropriate ptrace check and saves the 'mm' at ->open() time, this is not a security issue. This patch is taken from the grsecurity patch since it was just made available. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Jovi Zhang	108ceeb020	coredump: fix wrong comments on core limits of pipe coredump case In commit `898b374af6` ("exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit"), the core limits recursive check value was changed from 0 to 1, but the corresponding comments were not updated. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Steven J. Magnani	deb8274a0c	fat: refactor shortname parsing Nearly identical shortname parsing is performed in fat_search_long() and __fat_readdir(). Extract this code into a function that may be called by both. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:20 -07:00
Steven J. Magnani	a943ed71c9	fat: accessors for msdos_dir_entry 'start' fields Simplify code by providing accessor functions for the directory entry start cluster fields. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Namjae Jeon	497d48bd27	hfsplus: use -ENOMEM when kzalloc() fails Use -ENOMEM return value instead of -EINVAL when kzalloc() fails. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vyacheslav Dubeyko	f5974c8f8c	nilfs2: add omitted comments for different structures in driver implementation Add omitted comments for different structures in driver implementation. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Ryusuke Konishi	572d8b3945	nilfs2: fix deadlock issue between chcp and thaw ioctls An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Ryusuke Konishi	fe0627e7b3	nilfs2: fix timing issue between rmcp and chcp ioctls The checkpoint deletion ioctl (rmcp ioctl) has potential for breaking snapshot because it is not fully exclusive with checkpoint mode change ioctl (chcp ioctl). The rmcp ioctl first tests if the specified checkpoint is a snapshot or not within nilfs_cpfile_delete_checkpoint function, and then calls nilfs_cpfile_delete_checkpoints function to actually invalidate the checkpoint only if it's not a snapshot. However, the checkpoint can be changed into a snapshot by the chcp ioctl between these two operations. In that case, calling nilfs_cpfile_delete_checkpoints() wrongly invalidates the snapshot, which leads to snapshot list corruption and snapshot count mismatch. This fixes the issue by changing nilfs_cpfile_delete_checkpoints() so that it reconfirms the target checkpoints are snapshot or not. This second check is exclusive with the chcp operation since it is protected by an existing semaphore. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Fernando Luis Vazquez Cao	278038ac53	nilfs2: remove references to long gone super operations ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove references to them in the NTFS code. Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vyacheslav Dubeyko	6b0f3393e3	nilfs2: add omitted comment for ns_mount_state field of the_nilfs structure Add omitted comment for ns_mount_state field of the_nilfs structure. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Vladimir Serbinenko	6ed6a722f9	minixfs: fix block limit check On minix2 and minix3 usually max_size is 7fffffff and the check in question prohibits creation of last block spanning right before 7fffffff, due to downward rounding during the division. Fix it by using multiplication instead. [akpm@linux-foundation.org: fix up code layout, use local `sb'] Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:19 -07:00
Akinobu Mita	6017b485ca	ext4: use memweight() Convert ext4_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT4FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following change. - Remove unnecessary map == NULL check in ext4_count_free() which always takes non-null pointer as the memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	10d470849a	ext3: use memweight() Convert ext3_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT3FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext3_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT3FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	ecd0afa3ce	ext2: use memweight() Convert ext2_count_free() to use memweight() instead of table lookup based counting clear bits implementation. This change only affects the code segments enabled by EXT2FS_DEBUG. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. This also includes the following changes. - Remove unnecessary map == NULL check in ext2_count_free() which always takes non-null pointer as the memory area. - Fix printk format warning that only reveals with EXT2FS_DEBUG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	a75613ec73	ocfs2: use memweight() Use memweight to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	0121ad62c2	affs: use memweight() Use memweight() to count the total number of bits set in memory area. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Akinobu Mita	9b58f6d4aa	qnx4fs: use memweight() Use memweight() to count the total number of bits clear in memory area. Note that this memweight() call can't be replaced with a single bitmap_weight() call, although the pointer to the memory area is aligned to long-word boundary. Because the size of the memory area may not be a multiple of BITS_PER_LONG, then it returns wrong value on big-endian architecture. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:16 -07:00
Joe Perches	533574c6bc	btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout Use the generic printk_get_level() to search a message for a kern_level. Add __printf to verify format and arguments. Fix a few messages that had mismatches in format and arguments. Add #ifdef CONFIG_PRINTK blocks to shrink the object size a bit when not using printk. [akpm@linux-foundation.org: whitespace tweak] Signed-off-by: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:14 -07:00
Kees Cook	54b501992d	coredump: warn about unsafe suid_dumpable / core_pattern combo When suid_dumpable=2, detect unsafe core_pattern settings and warn when they are seen. Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Kees Cook	9520628e8c	fs: make dumpable=2 require fully qualified path When the suid_dumpable sysctl is set to "2", and there is no core dump pipe defined in the core_pattern sysctl, a local user can cause core files to be written to root-writable directories, potentially with user-controlled content. This means an admin can unknowningly reintroduce a variation of CVE-2006-2451, allowing local users to gain root privileges. $ cat /proc/sys/fs/suid_dumpable 2 $ cat /proc/sys/kernel/core_pattern core $ ulimit -c unlimited $ cd / $ ls -l core ls: cannot access core: No such file or directory $ touch core touch: cannot touch `core': Permission denied $ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 & $ pid=$! $ sleep 1 $ kill -SEGV $pid $ ls -l core -rw------- 1 root kees 458752 Jun 21 11:35 core $ sudo strings core \| grep evil OHAI=evil-string-here While cron has been fixed to abort reading a file when there is any parse error, there are still other sensitive directories that will read any file present and skip unparsable lines. Instead of introducing a suid_dumpable=3 mode and breaking all users of mode 2, this only disables the unsafe portion of mode 2 (writing to disk via relative path). Most users of mode 2 (e.g. Chrome OS) already use a core dump pipe handler, so this change will not break them. For the situations where a pipe handler is not defined but mode 2 is still active, crash dumps will only be written to fully qualified paths. If a relative path is defined (e.g. the default "core" pattern), dump attempts will trigger a printk yelling about the lack of a fully qualified path. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Sasha Levin	779302e678	fs/xattr.c:getxattr(): improve handling of allocation failures This allocation can be as large as 64k. - Add __GFP_NOWARN so the falied kmalloc() is silent - Fall back to vmalloc() if the kmalloc() failed Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Fernando Luis Vazquez Cao	32b4560b04	ntfs: remove references to long gone super operations and unimplemented methods ->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove refereces to them in the NTFS code. Remove unnecessary comments about unimplemented methods while at it (suggested by Christoph Hellwig). Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-30 17:25:11 -07:00
Sage Weil	1fe60e51a3	libceph: move feature bits to separate header This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 16:23:22 -07:00
Bryan Schumaker	89d77c8fa8	NFS: Convert v4 into a module This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:52 -04:00
Bryan Schumaker	1c606fb74c	NFS: Convert v3 into a module This patch exports symbols and moves over the final structures needed by the v3 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set. The module (nfs3.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:46 -04:00
Bryan Schumaker	ddda8e0aa8	NFS: Convert v2 into a module The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:41 -04:00
Bryan Schumaker	fac1e8e4ef	NFS: Keep module parameters in the generic NFS client Otherwise we break backwards compatibility when v4 becomes a modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:31 -04:00
Bryan Schumaker	19d87ca362	NFS: Split out remaining NFS v4 inode functions Somehow I missed this in my previous patch series, but these functions are only needed by the v4 code and should be moved to a v4-only file. I wasn't exactly sure where I should put these functions, so I moved them into nfs4super.c where I could make them static. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:20 -04:00
Bryan Schumaker	6a74490dca	NFS: Pass super operations and xattr handlers in the nfs_subversion I can set all variables in the nfs_fill_super() function, allowing me to remove the nfs4_fill_super() function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:06:05 -04:00
Bryan Schumaker	1179acc6a3	NFS: Only initialize the ACL client in the v3 case v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to initialize the ACL client only when we are using v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:05:54 -04:00
Bryan Schumaker	ff9099f266	NFS: Create a try_mount rpc op I'm already looking up the nfs subversion in nfs_fs_mount(), so I have easy access to rpc_ops that used to be difficult to reach. This allows me to set up a different mount path for NFS v2/3 and NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:53 -04:00
Bryan Schumaker	e8f25e6d6d	NFS: Remove the NFS v4 xdev mount function I can now share this code with the v2 and v3 code by using the NFS subversion structure. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:45 -04:00
Bryan Schumaker	ab7017a3a0	NFS: Add version registering framework This patch adds in the code to track multiple versions of the NFS protocol. I created default structures for v2, v3 and v4 so that each version can continue to work while I convert them into kernel modules. I also removed the const parameter from the rpc_version array so that I can change it at runtime. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 19:04:17 -04:00
David Howells	a427b9ec4e	NFS: Fix a number of bugs in the idmapper Fix a number of bugs in the NFS idmapper code: (1) Only registered key types can be passed to the core keys code, so register the legacy idmapper key type. This is a requirement because the unregister function cleans up keys belonging to that key type so that there aren't dangling pointers to the module left behind - including the key->type pointer. (2) Rename the legacy key type. You can't have two key types with the same name, and (1) would otherwise require that. (3) complete_request_key() must be called in the error path of nfs_idmap_legacy_upcall(). (4) There is one idmap struct for each nfs_client struct. This means that idmap->idmap_key_cons is shared without the use of a lock. This is a problem because key_instantiate_and_link() - as called indirectly by idmap_pipe_downcall() - releases anyone waiting for the key to be instantiated. What happens is that idmap_pipe_downcall() running in the rpc.idmapd thread, releases the NFS filesystem in whatever thread that is running in to continue. This may then make another idmapper call, overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance to call complete_request_key(). I think that reading idmap_key_cons only once, before key_instantiate_and_link() is called, and then caching the result in a variable is sufficient. Bug (4) is the cause of: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP CPU 1 Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246 RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80 RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50 RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0 R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80 R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006 FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710) Stack: ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8 Call Trace: [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc] [<ffffffff8117f833>] vfs_write+0xb3/0x180 [<ffffffff8117fb5a>] sys_write+0x4a/0x90 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8801a3645d40> CR2: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.4]	2012-07-30 18:57:39 -04:00
Jeff Layton	5cf02d09b5	nfs: skip commit in releasepage if we're freeing memory for fs-related reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-07-30 18:55:59 -04:00
Peng Tao	159e0561e3	pnfsblock: bail out partial page IO Current block layout driver read/write code assumes page aligned IO in many places. Add a checker to validate the assumption. Otherwise there would be data corruption like when application does open(O_WRONLY) and page unaliged write. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:52:06 -04:00
Jeff Layton	f44106e217	nfs: fix fl_type tests in NFSv4 code fl_type is not a bitmap. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:09:13 -04:00
Fred Isaman	c95908e4c5	NFS: fix pnfs regression with directio writes Commit `57208fa7e5` "NFS: Create an write_pageio_init() function" did not modify the calls in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:09:06 -04:00
Fred Isaman	59948db3be	NFS: fix pnfs regression with directio reads Commit `1abb50886a` "NFS: Create an read_pageio_init() function" did not modify the call in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:08:53 -04:00
Randy Dunlap	0add3e8567	nfs: fix stub return type warnings Fix numerous repeated warnings by making the stub function void instead of non-void: fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl': fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 17:30:24 -04:00
Jan Kara	4a55c1017b	nfsd: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: linux-nfs@vger.kernel.org CC: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:51 +04:00
Jan Kara	e7848683ae	btrfs: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:51 +04:00
Jan Kara	e24f17da35	fat: Push mnt_want_write() outside of i_mutex When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call outside of i_mutex as in other places. CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:50 +04:00
Jan Kara	c30dabfe5d	fs: Push mnt_want_write() outside of i_mutex Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes without it. This isn't really a problem because mnt_want_write() is a non-blocking operation (essentially has a trylock semantics) but when the function starts to handle also frozen filesystems, it will get a full lock semantics and thus proper lock ordering has to be established. So move all mnt_want_write() calls outside of i_mutex. One non-trivial case needing conversion is kern_path_create() / user_path_create() which didn't include mnt_want_write() but now needs to because it acquires i_mutex. Because there are virtual file systems which don't bother with freeze / remount-ro protection we actually provide both versions of the function - one which calls mnt_want_write() and one which does not. [AV: scratch the previous, mnt_want_write() has been moved to kern_path_create() by now] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:49 +04:00
Jan Kara	14ae417c6f	sysfs: Push file_update_time() into bin_page_mkwrite() CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:47 +04:00
Jan Kara	a63e9b2e76	gfs2: Push file_update_time() into gfs2_page_mkwrite() CC: Steven Whitehouse <swhiteho@redhat.com> CC: cluster-devel@redhat.com Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:46 +04:00
Jan Kara	120c2bcad8	9p: Push file_update_time() into v9fs_vm_page_mkwrite() CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:46 +04:00
Jan Kara	3ca9c3bd8a	ceph: Push file_update_time() into ceph_page_mkwrite() CC: Sage Weil <sage@newdream.net> CC: ceph-devel@vger.kernel.org Acked-by: Sage Weil <sage@newdream.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:45 +04:00
Jan Kara	5e8830dc85	fs: Push file_update_time() into __block_page_mkwrite() Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 01:02:44 +04:00
Al Viro	64894cf843	simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early The write ref to vfsmount taken in lookup_open()/atomic_open() is going to be dropped; we take the one to stay in dentry_open(). Just grab the temporary in caller if it looks like we are going to need it (create/truncate/writable open) and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write() inside, check that flag and treat "false" as "mnt_want_write() has just failed". mnt_want_write() is cheap and the things get considerably simpler and more robust that way - we get it and drop it in the same function, to start with, rather than passing a "has something in the guts of really scary functions taken it" back to caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 00:53:35 +04:00
Linus Torvalds	37cd9600a9	xfs: update for 3.6-rc1 Numerous cleanups and several bug fixes. Here are some highlights: * Discontiguous directory buffer support * Inode allocator refactoring * Removal of the IO lock in inode reclaim * Implementation of .update_time * Fix for handling of EOF in xfs_vm_writepage * Fix for races in xfsaild, and idle mode is re-enabled * Fix for a crash in xfs_buf completion handlers on unmount. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQFtCvAAoJENaLyazVq6ZOIuEQAINJXb4SK9oBrdwGmq+Vsqf2 Eh4OmzZmdnSPrxfFGmqvyL9DdUBvGBuidwOcVLMAXGtzbxE9USK9NuKC5zN/hJip 8tIyv/8bqZ0aD4RJlHGN5zKFoQh/9Tag+JsaaqWstO8Ir1tA/5p04hDAz492btfT 49SvnV64sJ1fi7pmaJblMWMMtlWJjD6iOldaHwnKBQ3LKmcgy9sD9DY5HiGOTr1j ecKtucX7B8Q9oFLKHaKEwTYZRRYDNuTbqZmI6hlEcA5hT280jotsGA4q/aXx/gHS lZuBaqVtNFT5WCKm+j/et76tmTfIh0CSbo64ZfgSOESy2BkEVXHg5XJ1gDvPdV+L 6eBlUx3jaiNyFVHxVzFhzwKC/XdaITCd/ixFEogRDmoppDXencTCibLJXHNXxupN BCAyTLCxEJIE9WCeOMmwHA0450bMY4or13NGep57pIvG8GomtdG1WncTRIo84KV5 0W5ocaUTGP7ROsr+KF8U9C7H866OHzVFijA+vvcTy8GtsT/xOCFxuJrqPVb+kgD7 mIKaoK7iH6Kufu433TzsLEcUkF36gq/7NytPKjQhURLpZhxkHG3rq6LC0HXp6uuZ QgX5Y5Gl7SwDovIrndXmQXRnGrzvqHLguZl65+rB1CKggjemkLSdSLhryoNVjLU2 iB7/hvzOUdYFMRRz2mLc =2wkC -----END PGP SIGNATURE----- Merge tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs Pull xfs update from Ben Myers: "Numerous cleanups and several bug fixes. Here are some highlights: - Discontiguous directory buffer support - Inode allocator refactoring - Removal of the IO lock in inode reclaim - Implementation of .update_time - Fix for handling of EOF in xfs_vm_writepage - Fix for races in xfsaild, and idle mode is re-enabled - Fix for a crash in xfs_buf completion handlers on unmount." Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h} due to duplicate patches that had already been merged for 3.5. * tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits) xfs: wait for the write the superblock on unmount xfs: re-enable xfsaild idle mode and fix associated races xfs: remove iolock lock classes xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes xfs: do not take the iolock in xfs_inactive xfs: remove xfs_inactive_attrs xfs: clean up xfs_inactive xfs: do not read the AGI buffer in xfs_dialloc until nessecary xfs: refactor xfs_ialloc_ag_select xfs: add a short cut to xfs_dialloc for the non-NULL agbp case xfs: remove the alloc_done argument to xfs_dialloc xfs: split xfs_dialloc xfs: remove xfs_ialloc_find_free Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. xfs: remove xfs_inotobp xfs: merge xfs_itobp into xfs_imap_to_bp xfs: handle EOF correctly in xfs_vm_writepage xfs: implement ->update_time xfs: fix comment typo of struct xfs_da_blkinfo. xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks ...	2012-07-30 13:37:53 -07:00
Sage Weil	8842b3be96	ceph: clean up useless d_parent checks d_parent is never NULL, and IS_ROOT() is the proper way to check for a (non-self-referential) parent. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:54 -07:00
Al Viro	f8310c5920	fix O_EXCL handling for devices O_EXCL without O_CREAT has different semantics; it's "fail if already opened", not "fail if already exists". commit `71574865` broke that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-30 11:50:30 +04:00
Mark Tinguely	9a57fa8ee7	xfs: wait for the write the superblock on unmount v2: Add the xfs_buf_lock to xfs_quiesce_attr(). Add explaination why xfs_buf_lock() is used to wait for write. xfs_wait_buftarg() does not wait for the completion of the write of the uncached superblock. This write can race with the shutdown of the log and causes a panic if the write does not win the race. During the log write, xfsaild_push() will lock the buffer and set the XBF_ASYNC flag. Because the XBF_FLAG is set, complete() is not performed on the buffer's iowait entry, we cannot call xfs_buf_iowait() to wait for the write to complete. The buffer's lock is held until the write is complete, so we can block on a xfs_buf_lock() request to be notified that the write is complete. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:34:19 -05:00
Brian Foster	8375f922aa	xfs: re-enable xfsaild idle mode and fix associated races xfsaild idle mode logic currently leads to a couple hangs: 1.) If xfsaild is rescheduled in during an incremental scan (i.e., tout != 0) and the target has been updated since the previous run, we can hit the new target and go into idle mode with a still populated ail. 2.) A wake up is only issued when the target is pushed forward. The wake up can race with xfsaild if it is currently in the process of entering idle mode, causing future wake up events to be lost. These hangs have been reproduced and verified as fixed by running xfstests 273 in a loop on a slightly modified upstream kernel. The kernel is modified to re-enable idle mode as previously implemented (when count == 0) and with a revert of commit `670ce93f`, which includes performance improvements that make this harder to reproduce. The solution, the algorithm for which has been outlined by Dave Chinner, is to modify xfsaild to enter idle mode only when the ail is empty and the push target has not been moved forward since the last push. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:27:57 -05:00
Christoph Hellwig	4f59af758f	xfs: remove iolock lock classes Content-Disposition: inline; filename=xfs-remove-iolock-classes Now that we never take the iolock during inode reclaim we don't need to play games with lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:23:51 -05:00
Christoph Hellwig	5a15322da1	xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes Same rational as the last patch - these inodes are not reachable, so don't bother with locking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:22:20 -05:00
Christoph Hellwig	0b56185b0d	xfs: do not take the iolock in xfs_inactive An inode that enters xfs_inactive has been removed from all global lists but the inode hash, and can't be recycled in xfs_iget before it has been marked reclaimable. Thus taking the iolock in here is not nessecary at all, and given the amount of lockdep false positives it has triggered already I'd rather remove the locking. The only change outside of xfs_inactive is relaxing an assert in xfs_itruncate_extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:16:49 -05:00
Christoph Hellwig	fe67be036f	xfs: remove xfs_inactive_attrs Remove this helper as the code flow is a lot more obvious when it gets merged into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:15:33 -05:00
Christoph Hellwig	b373e98daa	xfs: clean up xfs_inactive The code to reserve log space and join the inode to the transaction is common for all cases, so don't duplicate it. Also remove the trivial xfs_inactive_symlink_local helper which can simply be opencode now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:13:09 -05:00
Christoph Hellwig	be60fe54b2	xfs: do not read the AGI buffer in xfs_dialloc until nessecary Refactor the AG selection loop in xfs_dialloc to operate on the in-memory perag data as much as possible. We only read the AGI buffer once we have selected an AG to allocate inodes now instead of for every AG considered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:10:54 -05:00
Christoph Hellwig	55d6af64cb	xfs: refactor xfs_ialloc_ag_select Loop over the in-core perag structures and prefer using pagi_freecount over going out to the AGI buffer where possible. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:08:13 -05:00
Christoph Hellwig	4bb61069d2	xfs: add a short cut to xfs_dialloc for the non-NULL agbp case In this case we already have selected an AG and know it has free space beause the buffer lock never got released. Jump directly into xfs_dialloc_ag and short cut the AG selection loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:03:23 -05:00
Christoph Hellwig	08358906ed	xfs: remove the alloc_done argument to xfs_dialloc We can simplify check the IO_agbp pointer for being non-NULL instead of passing another argument through two layers of function calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 16:00:31 -05:00
Christoph Hellwig	f2ecc5e453	xfs: split xfs_dialloc Move the actual allocation once we have selected an allocation group into a separate helper, and make xfs_dialloc a wrapper around it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-29 15:56:49 -05:00
Al Viro	bf8848918d	lockd: handle lockowner allocation failure in nlmclnt_proc() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 23:17:39 +04:00
Al Viro	446945ab9a	lockd: shift grabbing a reference to nlm_host into nlm_alloc_call() It's used both for client and server hosts; we can't do nlmclnt_release_host() on failure exits, since the host might need nlmsvc_release_host(), with BUG_ON() for calling the wrong one. Makes life simpler for callers, actually... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 23:09:57 +04:00
Kees Cook	a51d9eaa41	fs: add link restriction audit reporting Adds audit messages for unexpected link restriction violations so that system owners will have some sort of potentially actionable information about misbehaving processes. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:43:08 +04:00
Kees Cook	800179c9b8	fs: add link restrictions This adds symlink and hardlink restrictions to the Linux VFS. Symlinks: A long-standing class of security issues is the symlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given symlink (i.e. a root process follows a symlink belonging to another user). For a likely incomplete list of hundreds of examples across the years, please see: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp The solution is to permit symlinks to only be followed when outside a sticky world-writable directory, or when the uid of the symlink and follower match, or when the directory owner matches the symlink's owner. Some pointers to the history of earlier discussion that I could find: 1996 Aug, Zygo Blaxell http://marc.info/?l=bugtraq&m=87602167419830&w=2 1996 Oct, Andrew Tridgell http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html 1997 Dec, Albert D Cahalan http://lkml.org/lkml/1997/12/16/4 2005 Feb, Lorenzo Hernández García-Hierro http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html 2010 May, Kees Cook https://lkml.org/lkml/2010/5/30/144 Past objections and rebuttals could be summarized as: - Violates POSIX. - POSIX didn't consider this situation and it's not useful to follow a broken specification at the cost of security. - Might break unknown applications that use this feature. - Applications that break because of the change are easy to spot and fix. Applications that are vulnerable to symlink ToCToU by not having the change aren't. Additionally, no applications have yet been found that rely on this behavior. - Applications should just use mkstemp() or O_CREATE\|O_EXCL. - True, but applications are not perfect, and new software is written all the time that makes these mistakes; blocking this flaw at the kernel is a single solution to the entire class of vulnerability. - This should live in the core VFS. - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135) - This should live in an LSM. - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188) Hardlinks: On systems that have user-writable directories on the same partition as system files, a long-standing class of security issues is the hardlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given hardlink (i.e. a root process follows a hardlink created by another user). Additionally, an issue exists where users can "pin" a potentially vulnerable setuid/setgid file so that an administrator will not actually upgrade a system fully. The solution is to permit hardlinks to only be created when the user is already the existing file's owner, or if they already have read/write access to the existing file. Many Linux users are surprised when they learn they can link to files they have no access to, so this change appears to follow the doctrine of "least surprise". Additionally, this change does not violate POSIX, which states "the implementation may require that the calling process has permission to access the existing file"[1]. This change is known to break some implementations of the "at" daemon, though the version used by Fedora and Ubuntu has been fixed[2] for a while. Otherwise, the change has been undisruptive while in use in Ubuntu for the last 1.5 years. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279 This patch is based on the patches in Openwall and grsecurity, along with suggestions from Al Viro. I have added a sysctl to enable the protected behavior, and documentation. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:37:58 +04:00
Jeff Layton	3134f37e93	vfs: don't let do_last pass negative dentry to audit_inode I can reliably reproduce the following panic by simply setting an audit rule on a recent 3.5.0+ kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 PGD 7acd9067 PUD 7b8fb067 PMD 0 Oops: 0000 [#86] SMP Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] CPU 0 Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs RIP: 0010:[<ffffffff810d1250>] [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800 FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530) Stack: ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358 ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968 ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000 Call Trace: [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160 [<ffffffff810d4968>] __audit_inode+0x118/0x2d0 [<ffffffff811955a9>] do_last+0x999/0xe80 [<ffffffff81191fe8>] ? inode_permission+0x18/0x50 [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130 [<ffffffff81195b4a>] path_openat+0xba/0x420 [<ffffffff81196111>] do_filp_open+0x41/0xa0 [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120 [<ffffffff811855cd>] do_sys_open+0xed/0x1c0 [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300 [<ffffffff811856c1>] sys_open+0x21/0x30 [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b RSP <ffff88007aebfc38> CR2: 0000000000000040 The problem is that do_last is passing a negative dentry to audit_inode. The comments on lookup_open note that it can pass back a negative dentry if O_CREAT is not set. This patch fixes the oops, but I'm not clear on whether there's a better approach. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:27:03 +04:00
Al Viro	e4fad8e5d2	consolidate pipe file creation Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:19 +04:00
Al Viro	b5bcdda327	take grabbing f->f_path to do_dentry_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:18 +04:00
Al Viro	5c33b183a3	uninline file_free_rcu() What inline? Its only use is passing its address to call_rcu(), for fuck sake! Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:17 +04:00
Al Viro	0b1d90119a	ecryptfs_lookup_interpose(): allocate dentry_info first less work on failure that way Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:17 +04:00
Al Viro	bc65a1215e	sanitize ecryptfs_lookup() * ->lookup() never gets hit with . or .. * dentry it gets is unhashed, so unless we had gone and hashed it ourselves, there's no need to d_drop() the sucker. * wrong name printed in one of the printks (NULL, in fact) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:16 +04:00
Al Viro	a8104a9fcd	pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. One side effect - attempt to create a cross-device link on a read-only fs fails with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:15 +04:00
Al Viro	8e4bfca1d1	mknod: take sanity checks on mode into the very beginning Note that applying umask can't affect their results. While that affects errno in cases like mknod("/no_such_directory/a", 030000) yielding -EINVAL (due to impossible mode_t) instead of -ENOENT (due to inexistent directory), IMO that makes a lot more sense, POSIX allows to return either and any software that relies on getting -ENOENT instead of -EINVAL in that case deserves everything it gets. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:14 +04:00
Al Viro	921a1650de	new helper: done_path_create() releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:13 +04:00
Linus Torvalds	173f865474	The usual collection of bug fixes and optimizations. Perhaps of greatest note is a speed up for parallel, non-allocating DIO writes, since we no longer take the i_mutex lock in that case. For bug fixes, we fix an incorrect overhead calculation which caused slightly incorrect results for df(1) and statfs(2). We also fixed bugs in the metadata checksum feature. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQE1SzAAoJENNvdpvBGATwzOMQAKxVkaTqkMc+cUahLLUkFdGQ xnmigHufVuOvv+B8l1p6vbYx+qOftztqXF1WC41j8mTfEDs19jKXv3LI57TJ+bIW a/Kp1CjMicRs/HGhFPNWp1D+N8J95MTFP6Y9biXSmBBvefg2NSwxpg48yZtjUy1/ Zl0414AqMvTJyqKKOUa++oyl3XlawnzDZ+6a7QPKsrAaDOU5977W4y2tZkNFk84d qRjTfaiX13aVe7XupgQHe/jtk40Sj38pyGVGiAGlHOZhZtYKE6MzB8OreGiMTy8z rmJz/CQUsQ8+sbKYhAcDru+bMrKghbO0u78XRz9CY+YpVFF/Xs2QiXoV0ZOGkIm6 eokq7jEV0K+LtDEmM3PkmUPgXfYw5damTv8qWuBUFd4wtVE5x/DmK8AJVMidCAUN GkVR+rEbbEi7RCwsuac/aKB8baVQCTiJ5tfNTgWh9zll+9GZSk+U71Pp0KdcJGiS nxitAZ+20hZN2CQctlmaGbCPTPYCWU4hQ3IuMdTlQTQAs8S0y1FtylTRsXcC1eVR i1hBS/dVw5PVCaqoX79zYrByUymgX0ZaYY6seRT6+U9xPGDCSNJ9mjXZL3fttnOC d5gsx/pbMIAv52G5Hj6DfsXR2JFmmxsaIzsLtRvKi9q89d84XaZfbUsHYjn4Neym 5lTKaSQHU71cKCxrStHC =VAVB -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The usual collection of bug fixes and optimizations. Perhaps of greatest note is a speed up for parallel, non-allocating DIO writes, since we no longer take the i_mutex lock in that case. For bug fixes, we fix an incorrect overhead calculation which caused slightly incorrect results for df(1) and statfs(2). We also fixed bugs in the metadata checksum feature." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: undo ext4_calc_metadata_amount if we fail to claim space ext4: don't let i_reserved_meta_blocks go negative ext4: fix hole punch failure when depth is greater than 0 ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() ext4: weed out ext4_write_super ext4: remove unnecessary superblock dirtying ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() ext4: remove useless marking of superblock dirty ext4: fix ext4 mismerge back in January ext4: remove dynamic array size in ext4_chksum() ext4: remove unused variable in ext4_update_super() ext4: make quota as first class supported feature ext4: don't take the i_mutex lock when doing DIO overwrites ext4: add a new nolock flag in ext4_map_blocks ext4: split ext4_file_write into buffered IO and direct IO ext4: remove an unused statement in ext4_mb_get_buddy_page_lock() ext4: fix out-of-date comments in extents.c ext4: use s_csum_seed instead of i_csum_seed for xattr block ext4: use proper csum calculation in ext4_rename ext4: fix overhead calculation used by ext4_statfs() ...	2012-07-27 20:52:25 -07:00
Stanislav Kinsbursky	2c142baa7b	NFSd: make boot_time variable per network namespace NFSd's boot_time represents grace period start point in time. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	a51c84ed50	NFSd: make grace end flag per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	5630f7fa97	Lockd: move grace period management from lockd() to per-net functions Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	5ccb0066f2	LockD: pass actual network namespace to grace period management functions Passed network namespace replaced hard-coded init_net Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	db9c455341	LockD: manage grace list per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky	9695c7057f	SUNRPC: service request network namespace helper introduced This is a cleanup patch - makes code looks simplier. It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky	5e1533c788	NFSd: make nfsd4_manager allocated per network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky	08d44a35a9	LockD: make lockd manager allocated per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	66547b0251	LockD: manage grace period per network namespace Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	e2edaa98cb	Lockd: add more debug to host shutdown functions Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	d5850ff9ea	Lockd: host complaining function introduced Just a small cleanup. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:44 -04:00
Stanislav Kinsbursky	caa4e76b6f	LockD: manage used host count per networks namespace This patch introduces moves nrhosts in per-net data. It also adds kernel warning to nlm_shutdown_hosts_net() about remaining hosts in specified network namespace context. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	3cf7fb07e0	LockD: manage garbage collection timeout per networks namespace This patch moves next_gc to per-net data. Note: passed network can be NULL (when Lockd kthread is exiting of Lockd module is removing). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	27adaddc8d	LockD: make garbage collector network namespace aware. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
Stanislav Kinsbursky	b26411f85d	LockD: mark host per network namespace on garbage collect This is required for per-network NLM shutdown and cleanup. This patch passes init_net for a while. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:48:43 -04:00
J. Bruce Fields	99dbb8fe09	nfsd4: fix missing fault_inject.h include Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:30:12 -04:00
J. Bruce Fields	96d6d59cea	locks: move lease-specific code out of locks_delete_lock No point putting something only used by one caller into common code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 16:18:00 -04:00
Pavel Shilovsky	1a500f010f	CIFS: Add SMB2 support for rmdir Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:50 -05:00
Pavel Shilovsky	f958ca5d88	CIFS: Move rmdir code to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:47 -05:00
Pavel Shilovsky	a0e731839d	CIFS: Add SMB2 support for mkdir operation Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:43 -05:00
Pavel Shilovsky	f436720e94	CIFS: Separate protocol specific part from mkdir Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:40 -05:00
Pavel Shilovsky	ff691e9694	CIFS: Simplify cifs_mkdir call Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-27 15:17:16 -05:00
Linus Torvalds	84eda28060	Merge branch 'kmap_atomic' of git://github.com/congwang/linux Pull final kmap_atomic cleanups from Cong Wang: "This should be the final round of cleanup, as the definitions of enum km_type finally get removed from the whole tree. The patches have been in linux-next for a long time." * 'kmap_atomic' of git://github.com/congwang/linux: pipe: remove KM_USER0 from comments vmalloc: remove KM_USER0 from comments feature-removal-schedule.txt: remove kmap_atomic(page, km_type) tile: remove km_type definitions um: remove km_type definitions asm-generic: remove km_type definitions avr32: remove km_type definitions frv: remove km_type definitions powerpc: remove km_type definitions arm: remove km_type definitions highmem: remove the deprecated form of kmap_atomic tile: remove usage of enum km_type frv: remove the second parameter of kmap_atomic_primary() jbd2: remove the second argument of kmap_atomic	2012-07-27 11:26:48 -07:00
Filipe Brandenburger	3b6e2723f3	locks: prevent side-effects of locks_release_private before file_lock is initialized When calling fcntl(fd, F_SETLEASE, lck) [with lck=F_WRLCK or F_RDLCK], the custom signal or owner (if any were previously set using F_SETSIG or F_SETOWN fcntls) would be reset when F_SETLEASE was called for the second time on the same file descriptor. This bug is a regression of 2.6.37 and is described here: https://bugzilla.kernel.org/show_bug.cgi?id=43336 This patch reverts a commit from Oct 2004 (with subject "nfs4 lease: move the f_delown processing") which originally introduced the lm_release_private callback. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-27 09:39:55 -04:00
Stephen Rothwell	a1857ebe75	Btrfs: using vmalloc and friends needs vmalloc.h On powerpc, we don't get the implicit vmalloc.h include, and as a result the build fails noisily: fs/btrfs/send.c: In function 'fs_path_free': fs/btrfs/send.c:185:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] fs/btrfs/send.c: In function 'fs_path_ensure_buf': fs/btrfs/send.c:215:4: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration] fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c: In function 'iterate_dir_item': fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c: In function 'btrfs_ioctl_send': fs/btrfs/send.c:4463:17: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4469:17: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4475:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] fs/btrfs/send.c:4475:20: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/send.c:4483:21: warning: assignment makes pointer from integer without a cast [enabled by default] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-26 18:08:30 -07:00
Linus Torvalds	e2aed8dfa5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull large btrfs update from Chris Mason: "This pull request is very large, and the two main features in here have been under testing/devel for quite a while. We have subvolume quotas from the strato developers. This enables full tracking of how many blocks are allocated to each subvolume (and all snapshots) and you can set limits on a per-subvolume basis. You can also create quota groups and toss multiple subvolumes into a big group. It's everything you need to be a web hosting company and give each user their own subvolume. The userland side of the quotas is being refreshed, they'll send out details on where to grab it soon. Next is the kernel side of btrfs send/receive from Alexander Block. This leverages the same infrastructure as the quota code to figure out relationships between blocks and their owners. It can then compute the difference between two snapshots and sends the diffs in a neutral format into userland. The basic model: create a snapshot send that snapshot as the initial backup make changes create a second snapshot send the incremental as a backup delete the first snapshot (use the second snapshot for the next incremental) The receive portion is all in userland, and in the 'next' branch of my btrfs-progs repo. There's still some work to do in terms of optimizing the send side from kernel to userland. The really important part is figuring out how two snapshots are different, and this is where we are concentrating right now. The initial send of a dataset is a little slower than tar, but the incremental sends are dramatically faster than what rsync can do. On top of all of that, we have a nice queue of fixes, cleanups and optimizations." Fix up trivial modify/del conflict in fs/btrfs/ioctl.c Also fix up semantic conflict in fs/btrfs/send.c: the interface to dentry_open() changed in commit `765927b2d5` ("switch dentry_open() to struct path, make it grab references itself"), and since it now grabs whatever references it needs, we should no longer do the mntget() on the mnt (and we need to dput() the dentry reference we took). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits) Btrfs: uninit variable fixes in send/receive Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive Btrfs: add btrfs_compare_trees function Btrfs: introduce subvol uuids and times Btrfs: make iref_to_path non static Btrfs: add a barrier before a waitqueue_active check Btrfs: call the ordered free operation without any locks held Btrfs: Check INCOMPAT flags on remount and add helper function Btrfs: add helper for tree enumeration btrfs: allow cross-subvolume file clone Btrfs: improve multi-thread buffer read Btrfs: make btrfs's allocation smoothly with preallocation Btrfs: lock the transition from dirty to writeback for an eb Btrfs: fix potential race in extent buffer freeing Btrfs: don't return true in releasepage unless we actually freed the eb Btrfs: suppress printk() if all device I/O stats are zero Btrfs: remove unwanted printk() for btrfs device I/O stats Btrfs: rewrite BTRFS_SETGET_FUNCS Btrfs: zero unused bytes in inode item Btrfs: kill free_space pointer from inode structure ... Conflicts: fs/btrfs/ioctl.c	2012-07-26 14:48:55 -07:00
Linus Torvalds	548ed10228	dlm for 3.6 This set includes a major redesign of recording the master node for resources. The old dir hash table, which just held the master node for each resource, has been removed. The rsb hash table has always duplicated the master node value from the dir, and is now the single record of it. Having two full hash tables of all resources has always been a waste, especially since one just duplicated a single value from the other. Local requests will now often require one instead of two lengthy hash table searches. The other substantial change is made possible by the dirtbl removal, and fixes a long standing race between resource removal and lookup by reworking how removal is done. At the same time it improves the efficiency of removal by avoiding repeated searches through a hash bucket. The other commits include minor fixes and changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQEFuFAAoJEDgbc8f8gGmqklAQAL2gMCp3asEbbUa2FRd5cZQT IfELLBS0kqr6ghEDpV1XTVig8vCwhGoa9UC/jYEtqnmo02ImjCOvxM4wcSAjbEhc L0J/JwfQNPtAbs8NN7vo7p2n5DPCWcKr3YiksvrPpQdXXAVTAfC4SkK028hF6n8B LtMrADqsz+dTqg6UQTCRsMyZlLrFNPZiR5IRy72BQLmg+lcNNtARb0T++W2EfAAV 03RnEUQPxbrKNZpVAAdncX5NUZz/D+DB5DMLF2q+hfquuGvheizmZHipnhOexeKV YZcQA/232BH0kqbKTGoIyqgMlCeKTgL56lBMyax3naEbBu4s80ZPOXjuA4pUpx5p 24mCtXcozoZ9WCJZwSNUqVGfG22HKSMGXTVoXxteamL/8CQ9XRPAvH/UpHi38LBm pzQ+aZGHDXpDlMlQYTnD4n7NEiq8IV5p9sOZnK2EjSshoiWefQomx73wFpwqdRgm TYtb583BjdU1QYLYGiJTy9yqg693Ket3pSipOVoqH/qMGM+d7Z3cwoMvcoHTVkOf Z2XiFvfTjWcA+4givvUOWTd9FuFJ+KQr+mPo867lIJyuT0KDttVDwZyQPqN7m7ma ymS9tVmrmfVmq97XkypVDkZELchqaW41Y8BKQ4c5JxRDLQX6Di7Pq36S6XL7mrOf medHFc9e6jhrS45lvpw+ =1bGM -----END PGP SIGNATURE----- Merge tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updatesfrom David Teigland: "This set includes a major redesign of recording the master node for resources. The old dir hash table, which just held the master node for each resource, has been removed. The rsb hash table has always duplicated the master node value from the dir, and is now the single record of it. Having two full hash tables of all resources has always been a waste, especially since one just duplicated a single value from the other. Local requests will now often require one instead of two lengthy hash table searches. The other substantial change is made possible by the dirtbl removal, and fixes a long standing race between resource removal and lookup by reworking how removal is done. At the same time it improves the efficiency of removal by avoiding repeated searches through a hash bucket. The other commits include minor fixes and changes." * tag 'dlm-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix missing dir remove dlm: fix conversion deadlock from recovery dlm: use wait_event_timeout dlm: fix race between remove and lookup dlm: use idr instead of list for recovered rsbs dlm: use rsbtbl as resource directory	2012-07-26 14:03:42 -07:00
Linus Torvalds	98077a7205	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (40 commits) cifs: ensure that we always do cifsFileInfo_get under the spinlock CIFS: Make CAP_* checks protocol independent CIFS: Allow SMB2 statistics to be tracked CIFS: Move clear/print_stats code to ops struct CIFS: Add echo request support for SMB2 CIFS: Move echo code to osp struct CIFS: Add SMB2 support for async requests CIFS: Setup async request in ops struct CIFS: Add SMB2 support for build_path_to_root CIFS: Move building path to root to ops struct CIFS: Query SMB2 inode info CIFS: Move query inode info code to ops struct CIFS: Add SMB2 support for is_path_accessible CIFS: Move is_path_accessible to ops struct CIFS: Move informational tcon calls to ops struct CIFS: Move getting dfs referalls to ops struct CIFS: Process reconnects for SMB2 shares CIFS: Add tree connect/disconnect capability for SMB2 CIFS: Add session setup/logoff capability for SMB2 CIFS: Add capability to send SMB2 negotiate message ...	2012-07-26 14:00:52 -07:00
Josh Boyer	8ded2bbc18	posix_types.h: Cleanup stale __NFDBITS and related definitions Recently, glibc made a change to suppress sign-conversion warnings in FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the kernel's definition of __NFDBITS if applications #include <linux/types.h> after including <sys/select.h>. A build failure would be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2 flags to gcc. It was suggested that the kernel should either match the glibc definition of __NFDBITS or remove that entirely. The current in-kernel uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no uses of the related __FDELT and __FDMASK defines. Given that, we'll continue the cleanup that was started with commit `8b3d1cda4f` ("posix_types: Remove fd_set macros") and drop the remaining unused macros. Additionally, linux/time.h has similar macros defined that expand to nothing so we'll remove those at the same time. Reported-by: Jeff Law <law@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> CC: <stable@vger.kernel.org> Signed-off-by: Josh Boyer <jwboyer@redhat.com> [ .. and fix up whitespace as per akpm ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-26 13:36:43 -07:00
Linus Torvalds	fa93669a19	Driver core merge for 3.6-rc1 Here's the big driver core pull request for 3.6-rc1. Unlike 3.5, this kernel should be a lot tamer, with the printk changes now settled down. All we have here is some extcon driver updates, w1 driver updates, a few printk cleanups that weren't needed for 3.5, but are good to have now, and some other minor fixes/changes in the driver core. All of these have been in the linux-next releases for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAlARgIUACgkQMUfUDdst+ynDHgCfRNwIB9L+zZvjcKE5e1BhDbUl wVUAn398DFgbJ1+PjGkd1EMR2uVTh7Ou =MIFu -----END PGP SIGNATURE----- Merge tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core changes from Greg Kroah-Hartman: "Here's the big driver core pull request for 3.6-rc1. Unlike 3.5, this kernel should be a lot tamer, with the printk changes now settled down. All we have here is some extcon driver updates, w1 driver updates, a few printk cleanups that weren't needed for 3.5, but are good to have now, and some other minor fixes/changes in the driver core. All of these have been in the linux-next releases for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits) printk: Export struct log size and member offsets through vmcoreinfo Drivers: hv: Change the hex constant to a decimal constant driver core: don't trigger uevent after failure extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device sysfs: fail dentry revalidation after namespace change fix sysfs: fail dentry revalidation after namespace change extcon: spelling of detach in function doc extcon: arizona: Stop microphone detection if we give up on it extcon: arizona: Update cable reporting calls and split headset PM / Runtime: Do not increment device usage counts before probing kmsg - do not flush partial lines when the console is busy kmsg - export "continuation record" flag to /dev/kmsg kmsg - avoid warning for CONFIG_PRINTK=n compilations kmsg - properly print over-long continuation lines driver-core: Use kobj_to_dev instead of re-implementing it driver-core: Move kobj_to_dev from genhd.h to device.h driver core: Move deferred devices to the end of dpm_list before probing driver core: move uevent call to driver_register driver core: fix shutdown races with probe/remove(v3) Extcon: Arizona: Add driver for Wolfson Arizona class devices ...	2012-07-26 11:25:33 -07:00
Linus Torvalds	b13bc8dda8	Staging tree patches for 3.6-rc1 Here's the big staging tree merge for the 3.6-rc1 merge window. There are some patches in here outside of drivers/staging/, notibly the iio code (which is still stradeling the staging / not staging boundry), the pstore code, and the tracing code. All of these have gotten ackes from the various subsystem maintainers to be included in this tree. The pstore and tracing patches are related, and are coming here as they replace one of the android staging drivers. Otherwise, the normal staging mess. Lots of cleanups and a few new drivers (some iio drivers, and the large csr wireless driver abomination.) Note, you will get a merge issue with the following files: drivers/staging/comedi/drivers/s626.h drivers/staging/gdm72xx/netlink_k.c both of which should be trivial for you to handle. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAlAQiD8ACgkQMUfUDdst+ykxhgCeMUjvc+1RTtSprzvkzpejgoUU 6A4AnAleWMnkaCD8vruGnRdGl/Qtz51+ =mN6M -----END PGP SIGNATURE----- Merge tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging tree patches from Greg Kroah-Hartman: "Here's the big staging tree merge for the 3.6-rc1 merge window. There are some patches in here outside of drivers/staging/, notibly the iio code (which is still stradeling the staging / not staging boundry), the pstore code, and the tracing code. All of these have gotten acks from the various subsystem maintainers to be included in this tree. The pstore and tracing patches are related, and are coming here as they replace one of the android staging drivers. Otherwise, the normal staging mess. Lots of cleanups and a few new drivers (some iio drivers, and the large csr wireless driver abomination.) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up trivial conflicts in drivers/staging/comedi/drivers/s626.h and drivers/staging/gdm72xx/netlink_k.c * tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1108 commits) staging: csr: delete a bunch of unused library functions staging: csr: remove csr_utf16.c staging: csr: remove csr_pmem.h staging: csr: remove CsrPmemAlloc staging: csr: remove CsrPmemFree() staging: csr: remove CsrMemAllocDma() staging: csr: remove CsrMemCalloc() staging: csr: remove CsrMemAlloc() staging: csr: remove CsrMemFree() and CsrMemFreeDma() staging: csr: remove csr_util.h staging: csr: remove CsrOffSetOf() stating: csr: remove unneeded #includes in csr_util.c staging: csr: make CsrUInt16ToHex static staging: csr: remove CsrMemCpy() staging: csr: remove CsrStrLen() staging: csr: remove CsrVsnprintf() staging: csr: remove CsrStrDup staging: csr: remove CsrStrChr() staging: csr: remove CsrStrNCmp staging: csr: remove CsrStrCmp ...	2012-07-26 11:14:49 -07:00
Chris Mason	b24baf6917	Btrfs: uninit variable fixes in send/receive Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 19:21:10 -04:00
Chris Mason	113c1cb530	Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 19:19:10 -04:00
Alexander Block	31db9f7c23	Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:30:19 +02:00
Alexander Block	7069830a9e	Btrfs: add btrfs_compare_trees function This function is used to find the differences between two trees. The tree compare skips whole subtrees if it detects shared tree blocks and thus is pretty fast. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:30:14 +02:00
Alexander Block	8ea05e3a42	Btrfs: introduce subvol uuids and times This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>	2012-07-25 23:28:38 +02:00
Alexander Block	91cb916ca2	Btrfs: make iref_to_path non static Make iref_to_path non static (needed in send) and rename it to btrfs_iref_to_path Signed-off-by: Alexander Block <ablock84@googlemail.com>	2012-07-25 23:25:06 +02:00
Chris Mason	cd1cfc4915	Btrfs: add a barrier before a waitqueue_active check We were missing wakeups on the delayed ref waitqueue due to races on waitqueue_active. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:15:08 -04:00
Chris Mason	e9fbcb4220	Btrfs: call the ordered free operation without any locks held Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> cc: stable@kernel.org	2012-07-25 16:15:07 -04:00
Mitch Harder	2b0ce2c290	Btrfs: Check INCOMPAT flags on remount and add helper function In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:14:31 -04:00
Chris Mason	b478b2baa3	Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-25 16:11:38 -04:00
Jeff Layton	764a1b1ace	cifs: ensure that we always do cifsFileInfo_get under the spinlock The readpages bug is a regression that was introduced in `6993f74a5`. This also fixes a couple of similar bugs in the uncached read and write codepaths. Also, prevent this sort of thing in the future by having cifsFileInfo_get take the spinlock itself, and adding a _locked variant for use in places that are already holding the lock. The _put code has always done that so this makes for a less confusing interface. Cc: <stable@vger.kernel.org> # 3.5.x Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-25 14:51:30 -05:00
Arne Jansen	e679376911	Btrfs: add helper for tree enumeration Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-25 17:33:18 +02:00
David Sterba	362a20c5e2	btrfs: allow cross-subvolume file clone Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-25 17:33:09 +02:00
Stanislav Kinsbursky	57c8b13e3c	NFSd: set nfsd_serv to NULL after service destruction In nfsd_destroy(): if (destroy) svc_shutdown_net(nfsd_serv, net); svc_destroy(nfsd_server); svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets nfsd_serv to NULL, causing a NULL dereference on the following line. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:21:31 -04:00
Stanislav Kinsbursky	19f7e2ca44	NFSd: introduce nfsd_destroy() helper Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:21:30 -04:00
J. Bruce Fields	a007c4c3e9	nfsd: add get_uint for u32's I don't think there's a practical difference for the range of values these interfaces should see, but it would be safer to be unambiguous. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:18:27 -04:00
Stanislav Kinsbursky	a6d88f293e	NFSd: fix locking in nfsd_forget_delegations() This patch adds recall_lock hold to nfsd_forget_delegations() to protect nfsd_process_n_delegations() call. Also, looks like it would be better to collect delegations to some local on-stack list, and then unhash collected list. This split allows to simplify locking, because delegation traversing is protected by recall_lock, when delegation unhash is protected by client_mutex. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:18:27 -04:00
Vivek Trivedi	5559b50acd	nfsd4: fix cr_principal comparison check in same_creds This fixes a wrong check for same cr_principal in same_creds Introduced by `8fbba96e5b` "nfsd4: stricter cred comparison for setclientid/exchange_id". Cc: stable@vger.kernel.org Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-25 09:05:30 -04:00
Linus Torvalds	801b03653f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 updates from Steven Whitehouse. * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Eliminate 64-bit divides GFS2: Reduce file fragmentation GFS2: kernel panic with small gfs2 filesystems - 1 RG GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs GFS2: Combine functions get_local_rgrp and gfs2_inplace_reserve GFS2: Add kobject release method GFS2: Size seq_file buffer more carefully GFS2: Use seq_vprintf for glocks debugfs file seq_file: Add seq_vprintf function and export it GFS2: Use lvbs for storing rgrp information with mount option GFS2: Cache last hash bucket for glock seq_files GFS2: Increase buffer size for glocks and glstats debugfs files GFS2: Fix error handling when reading an invalid block from the journal GFS2: Add "top dir" flag support GFS2: Fold quota data into the reservations struct GFS2: Extend the life of the reservations	2012-07-24 17:57:05 -07:00
Linus Torvalds	08d9329c29	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara: "Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs. I'm on vacation and scarcely checking email since we are expecting baby any day now but these fixes should be safe to go in and I don't want to delay them unnecessarily." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: avoid info leak on export isofs: avoid info leak on export udf: Improve table length check to avoid possible overflow ext3: Check return value of blkdev_issue_flush() jbd: Check return value of blkdev_issue_flush() udf: Do not decrement i_blocks when freeing indirect extent block udf: Fix memory leak when mounting ext2: cleanup the confused goto label UDF: Remove unnecessary variable "offset" from udf_fill_inode udf: stop using s_dirt ext3: force ro mount if ext3_setup_super() fails quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>	2012-07-24 17:40:44 -07:00
Linus Torvalds	d14b7a419a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Trivial updates all over the place as usual." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits) Fix typo in include/linux/clk.h . pci: hotplug: Fix typo in pci iommu: Fix typo in iommu video: Fix typo in drivers/video Documentation: Add newline at end-of-file to files lacking one arm,unicore32: Remove obsolete "select MISC_DEVICES" module.c: spelling s/postition/position/g cpufreq: Fix typo in cpufreq driver trivial: typo in comment in mksysmap mach-omap2: Fix typo in debug message and comment scsi: aha152x: Fix sparse warning and make printing pointer address more portable. Change email address for Steve Glendinning Btrfs: fix typo in convert_extent_bit via: Remove bogus if check netprio_cgroup.c: fix comment typo backlight: fix memory leak on obscure error path Documentation: asus-laptop.txt references an obsolete Kconfig item Documentation: ManagementStyle: fixed typo mm/vmscan: cleanup comment error in balance_pgdat mm: cleanup on the comments of zone_reclaim_stat ...	2012-07-24 13:34:56 -07:00
Pavel Shilovsky	29e20f9c65	CIFS: Make CAP_* checks protocol independent Since both CIFS and SMB2 use ses->capabilities (server->capabilities) field but flags are different we should make such checks protocol independent. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 14:12:03 -05:00
Pavel Shilovsky	d60622eb5a	CIFS: Allow SMB2 statistics to be tracked Since there are only 19 command codes, it also is easier to track by exact command code than it was for cifs. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:20 +04:00
Pavel Shilovsky	44c581866e	CIFS: Move clear/print_stats code to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:18 +04:00
Pavel Shilovsky	9094fad1ed	CIFS: Add echo request support for SMB2 Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:17 +04:00
Pavel Shilovsky	f6d7617862	CIFS: Move echo code to osp struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:15 +04:00
Pavel Shilovsky	c95b8eeda3	CIFS: Add SMB2 support for async requests Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:14 +04:00
Pavel Shilovsky	45740847e2	CIFS: Setup async request in ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:12 +04:00
Pavel Shilovsky	25e266320c	CIFS: Add SMB2 support for build_path_to_root Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:11 +04:00
Pavel Shilovsky	9224dfc2f9	CIFS: Move building path to root to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:10 +04:00
Pavel Shilovsky	be4cb9e3d4	CIFS: Query SMB2 inode info Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:08 +04:00
Pavel Shilovsky	1208ef1f76	CIFS: Move query inode info code to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:07 +04:00
Pavel Shilovsky	2503a0dba9	CIFS: Add SMB2 support for is_path_accessible that needs for a successful mount through SMB2 protocol. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:05 +04:00
Pavel Shilovsky	68889f269b	CIFS: Move is_path_accessible to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:04 +04:00
Pavel Shilovsky	af4281dc22	CIFS: Move informational tcon calls to ops struct and rename variables in cifs_mount. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:02 +04:00
Pavel Shilovsky	b669f33ca6	CIFS: Move getting dfs referalls to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:01 +04:00
Pavel Shilovsky	aa24d1e969	CIFS: Process reconnects for SMB2 shares Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:59 +04:00
Pavel Shilovsky	faaf946a7d	CIFS: Add tree connect/disconnect capability for SMB2 Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:58 +04:00
Pavel Shilovsky	5478f9ba9a	CIFS: Add session setup/logoff capability for SMB2 Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:57 +04:00
Pavel Shilovsky	ec2e4523fd	CIFS: Add capability to send SMB2 negotiate message and add negotiate request type to let set_credits know that we are only on negotiate stage and no need to make a decision about disabling echos and oplocks. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:55 +04:00
Pavel Shilovsky	3792c17328	CIFS: Respect SMB2 header/max header size Use SMB2 header size values for allocation and memset because they are bigger and suitable for both CIFS and SMB2. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:54 +04:00
Pavel Shilovsky	093b2bdad3	CIFS: Make demultiplex_thread work with SMB2 code Now we can process SMB2 messages: check message, get message id and wakeup awaiting routines. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:52 +04:00
Pavel Shilovsky	4b1241006c	CIFS: Fix a wrong pointer in atomic_open Commit `30d9049474` caused a regression in cifs open codepath. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:26 +04:00
Pavel Shilovsky	28ea5290d7	CIFS: Add SMB2 credits support For SMB2 protocol we can add more than one credit for one received request: it depends on CreditRequest field in SMB2 response header. Also we divide all requests by type: echoes, oplocks and others. Each type uses its own slot pull. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:23 -05:00
Pavel Shilovsky	2dc7e1c033	CIFS: Make transport routines work with SMB2 Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:20 -05:00
Steve French	ddfbefbd39	CIFS: Map SMB2 status codes to POSIX errors Add mapping table for 32 bit SMB2 status codes to linux errors. Note that SMB2 does not use DOS/OS2 errors (ever) so mapping to DOS/OS2 errors as a common network subset (as we do for cifs) doesn't help. And note that the set of status codes is much more complete here. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:15 -05:00
Pavel Shilovsky	b8030603d9	CIFS: Add SMB2 status codes Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:13 -05:00
Pavel Shilovsky	f7ec0d0bbc	CIFS: Rename 7 error codes to NT_ style and consider such codes as CIFS errors. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:10 -05:00
Pavel Shilovsky	6d5786a34d	CIFS: Rename Get/FreeXid and make them work with unsigned int Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:08 -05:00
Pavel Shilovsky	2e6e02ab6d	CIFS: Move protocol specific tcon/tdis code to ops struct and rename variables around the code changes. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:06 -05:00
Pavel Shilovsky	58c45c58a1	CIFS: Move protocol specific session setup/logoff code to ops struct Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:03 -05:00
Cong Wang	2164d33446	pipe: remove KM_USER0 from comments Signed-off-by: Cong Wang <amwang@redhat.com>	2012-07-24 15:27:34 +08:00
Pavel Shilovsky	286170aa24	CIFS: Move protocol specific negotiate code to ops struct Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:33:26 -05:00
Pavel Shilovsky	a891f0f895	CIFS: Extend credit mechanism to process request type Split all requests to echos, oplocks and others - each group uses its own credit slot. This is indicated by new flags CIFS_ECHO_OP and CIFS_OBREAK_OP that are not used now for CIFS. This change is required to support SMB2 protocol because of different processing of these commands. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:32:48 -05:00
Pavel Shilovsky	316cf94a91	CIFS: Move trans2 processing to ops struct Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:32:25 -05:00
Linus Torvalds	83c7f72259	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Notable highlights: - iommu improvements from Anton removing the per-iommu global lock in favor of dividing the DMA space into pools, each with its own lock, and hashed on the CPU number. Along with making the locking more fine grained, this gives significant improvements in multiqueue networking scalability. - Still from Anton, we know provide a vdso based variant of getcpu which makes sched_getcpu with the appropriate glibc patch something like 18 times faster. - More anton goodness (he's been busy !) in other areas such as a faster __clear_user and copy_page on P7, various perf fixes to improve sampling quality, etc... - One more step toward removing legacy i2c interfaces by using new device-tree based probing of platform devices for the AOA audio drivers - A nice series of patches from Michael Neuling that helps avoiding confusion between register numbers and litterals in assembly code, trying to enforce the use of "%rN" register names in gas rather than plain numbers. - A pile of FSL updates - The usual bunch of small fixes, cleanups etc... You may spot a change to drivers/char/mem. The patch got no comment or ack from outside, it's a trivial patch to allow the architecture to skip creating /dev/port, which we use to disable it on ppc64 that don't have a legacy brige. On those, IO ports 0...64K are not mapped in kernel space at all, so accesses to /dev/port cause oopses (and yes, distros -still- ship userspace that bangs hard coded ports such as kbdrate)." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/mpic: Create a revmap with enough entries for IPIs and timers Remove stale .rej file powerpc/iommu: Fix iommu pool initialization powerpc/eeh: Check handle_eeh_events() return value powerpc/85xx: Add phy nodes in SGMII mode for MPC8536/44/72DS & P2020DS powerpc/e500: add paravirt QEMU platform powerpc/mpc85xx_ds: convert to unified PCI init powerpc/fsl-pci: get PCI init out of board files powerpc/85xx: Update corenet64_smp_defconfig powerpc/85xx: Update corenet32_smp_defconfig powerpc/85xx: Rename P1021RDB-PC device trees to be consistent powerpc/watchdog: move booke watchdog param related code to setup-common.c sound/aoa: Adapt to new i2c probing scheme i2c/powermac: Improve detection of devices from device-tree powerpc: Disable /dev/port interface on systems without an ISA bridge of: Improve prom_update_property() function powerpc: Add "memory" attribute for mfmsr() powerpc/ftrace: Fix assembly trampoline register usage powerpc/hw_breakpoints: Fix incorrect pointer access powerpc: Put the gpr save/restore functions in their own section ...	2012-07-23 18:54:23 -07:00
Jeff Layton	7659624ffb	cifs: reinstate sec=ntlmv2 mount option sec=ntlmv2 as a mount option got dropped in the mount option overhaul. Cc: Sachin Prabhu <sprabhu@redhat.com> Cc: <stable@vger.kernel.org> # 3.4+ Reported-by: Günter Kukkukk <linux@kukkukk.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 20:48:24 -05:00
Linus Torvalds	93912fe69d	* Added another debugfs knob for forcing UBIFS R/O mode without flushing caches or finishing commit or any other I/O operation. I've originally added this knob in order to reproduce the free space fixup bug (see `c672793`) on nandsim. Without this knob I would have to do real power-cuts, which would make debugging much harder. Then I've decided to keep this knob because it is also useful for UBIFS power-cut recovery end error-paths testing. * Well-spotted fix from Julia. This bug did not cause real troubles for UBIFS, but nevertheless it could cause issues for someone trying to modify the orphans handling code. Kudos to coccinelle! * Minor cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQDQxSAAoJECmIfjd9wqK0W3QQAKCbDBNMJaj/idXcupatoOI2 4M3IBJ5mwfjrBeXy1kcsxxSadW7fJTy6FK7jfPpYq2iCLmFkh0Ba6QSgHHgCmb1v 3ExIR2+2hLl1YhliDU1q+k9H4fGDklq/CNgiTGDti0pdPczZfBKrksXQy8leqT8d MxzQDzQsCxVKIizHHMEr373s1y8QHhabFK1LU1wkXJK/p/bYS0mgD0BOjIrZKuCd W5L5XzIIbcXs1h91Lgdnr8fj5TmMtM1tQ890U2OaNPt+44Eh6Gjjku3N8RaWvaIe vXMpDNb00NwtJp0V2SIsez3VtaA1xJaW7bd4UcyNNvBy5aECNx8K0IbZKTUkiMSj 3xOiqZr/LWpPkAq3NL6kAZkgiam9Aez5Yc+XTna3HP79L/LYbXsJiOewlDwvGbrh 89+EatjLUVFh00BvzmmxuZkm2bWm3sPQiAKWqwc7ijMGwnPt+c6Eiepgw3XXVHgI qmw1Ddkh4VxdGXi8r3RMxg21MqFttM0astT5cmQeF//H6Wq3S2u8RFgZqaLpGc6s vcgx92V6BBrWzCZKxBQs3Bd21uSrVsOa/js5wORpuWZAGk1Yy1BFy1vFC0PuqmcI kvZa/bQPP3/PiHAI8qq2Min5baoG9jUl6ui1eM8KXECkzQSQJjxN1bth8kzZHuPY HMNoK0RVUt3V5xtYI7i7 =ecyI -----END PGP SIGNATURE----- Merge tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs Pull UBIFS updates from Artem Bityutskiy: - Added another debugfs knob for forcing UBIFS R/O mode without flushing caches or finishing commit or any other I/O operation. I've originally added this knob in order to reproduce the free space fixup bug (see commit c6727932cfdb: "UBIFS: fix a bug in empty space fix-up") on nandsim. Without this knob I would have to do real power-cuts, which would make debugging much harder. Then I've decided to keep this knob because it is also useful for UBIFS power-cut recovery end error-paths testing. - Well-spotted fix from Julia. This bug did not cause real troubles for UBIFS, but nevertheless it could cause issues for someone trying to modify the orphans handling code. Kudos to coccinelle! - Minor cleanups. * tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: remove invalid reference to list iterator variable UBIFS: simplify reply code a bit UBIFS: add debugfs knob to switch to R/O mode UBIFS: fix compilation warning	2012-07-23 15:50:52 -07:00
Jeff Layton	762a4206a3	cifs: rename cifs_sign_smb2 to cifs_sign_smbv "smb2" makes me think of the SMB2.x protocol, which isn't at all what this function is for... Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:34 -05:00
Jeff Layton	d971e0656b	cifs: remove bogus reset of smb_buf_length in smb_send routines There's a comment here about how we don't want to modify this length, but nothing in this function actually does. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:31 -05:00
Jeff Layton	c5fd363d77	cifs: move file_lock off stack in cifs_push_posix_locks struct file_lock is pretty large, so we really don't want that on the stack in a potentially long call chain. Reorganize the arguments to CIFSSMBPosixLock to eliminate the need for that. Eliminate the get_flag and simply use a non-NULL pLockInfo to indicate that this is a "get" operation. In order to do that, need to add a new loff_t argument for the start_offset. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:29 -05:00
Jeff Layton	ac3aa2f8ae	cifs: remove extraneous newlines from cERROR and cFYI calls Those macros add a newline on their own, so there's not any need to embed one in the message itself. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:26 -05:00
Jeff Layton	00401ff780	cifs: after upcalling for krb5 creds, invalidate key rather than revoking it Calling key_revoke here isn't ideal as further requests for the key will end up returning -EKEYREVOKED until it gets purged from the cache. What we really intend here is to force a new upcall on the next request_key. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:24 -05:00
Liu Bo	67c9684f48	Btrfs: improve multi-thread buffer read While testing with my buffer read fio jobs[1], I find that btrfs does not perform well enough. Here is a scenario in fio jobs: We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file, and all of them will race on add_to_page_cache_lru(), and if one thread successfully puts its page into the page cache, it takes the responsibility to read the page's data. And what's more, reading a page needs a period of time to finish, in which other threads can slide in and process rest pages: t1 t2 t3 t4 add Page1 read Page1 add Page2 \| read Page2 add Page3 \| \| read Page3 add Page4 \| \| \| read Page4 -----\|------------\|-----------\|-----------\|-------- v v v v bio bio bio bio Now we have four bios, each of which holds only one page since we need to maintain consecutive pages in bio. Thus, we can end up with far more bios than we need. Here we're going to a) delay the real read-page section and b) try to put more pages into page cache. With that said, we can make each bio hold more pages and reduce the number of bios we need. Here is some numbers taken from fio results: w/o patch w patch ------------- -------- --------------- READ: 745MB/s +25% 934MB/s [1]: [global] group_reporting thread numjobs=4 bs=32k rw=read ioengine=sync directory=/mnt/btrfs/ [READ] filename=foobar size=2000M invalidate=1 Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:10 -04:00
Liu Bo	df57dbe6bf	Btrfs: make btrfs's allocation smoothly with preallocation For backref walking, we've introduce delayed ref's sequence. However, it changes our preallocation behavior. The story is that when we preallocate an extent and then mark it written piece by piece, the ideal case should be that we don't need to COW the extent, which is why we use 'preallocate'. But we may not make use of preallocation, since when we check for cross refs on the extent, we may have two ref entries which have the same content except the sequence value, and we recognize them as cross refs and do COW to allocate another extent. So we end up with several pieces of space instead of an whole extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:10 -04:00
Josef Bacik	51561ffec9	Btrfs: lock the transition from dirty to writeback for an eb There is a small window where an eb can have no IO bits set on it, which could potentially result in extent_buffer_under_io() returning false when we want it to return true, which could result in not fun things happening. So in order to protect this case we need to hold the refs_lock when we make this transition to make sure we get reliable results out of extent_buffer_udner_io(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:09 -04:00
Josef Bacik	594831c4b2	Btrfs: fix potential race in extent buffer freeing This sounds sort of impossible but it is the only thing I can think of and at the very least it is theoretically possible so here it goes. If we are in try_release_extent_buffer we will check that the ref count on the extent buffer is 1 and not under IO, and then go down and clear the tree ref. If between this check and clearing the tree ref somebody else comes in and grabs a ref on the eb and the marks it dirty before try_release_extent_buffer() does it's tree ref clear we can end up with a dirty eb that will be freed while it is still dirty which will result in a panic. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:09 -04:00
Josef Bacik	e64860aa05	Btrfs: don't return true in releasepage unless we actually freed the eb I noticed while looking at an extent_buffer race that we will unconditionally return 1 if we get down to release_extent_buffer after clearing the tree ref. However we can easily race in here and get a ref on the eb and not actually free the eb. So make release_extent_buffer return 1 if it free'd the eb and 0 if not so we can be a little kinder to the vm. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:08 -04:00
Stefan Behrens	a98cdb85b9	Btrfs: suppress printk() if all device I/O stats are zero Code is added to suppress the I/O stats printing at mount time if all statistic values are zero. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-07-23 16:28:07 -04:00
Stefan Behrens	5021976d8d	Btrfs: remove unwanted printk() for btrfs device I/O stats People complained about the annoying kernel log message "btrfs: no dev_stats entry found ... (OK on first mount after mkfs)" everytime a filesystem is mounted for the first time after running mkfs. Since the distribution of the btrfs-progs is not synchronized to the kernel version, mkfs like it is now will be used also in the future. Then this message is not useful to find errors, it is just annoying. This commit removes the printk(). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-07-23 16:28:07 -04:00
Li Zefan	18077bb413	Btrfs: rewrite BTRFS_SETGET_FUNCS BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and btrfs_foo() functions, which read and write specific fields in the extent buffer. The total number of set/get functions is ~200, but in fact we only need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64. It results in redunction of ~37K bytes. text data bss dec hex filename 629661 12489 216 642366 9cd3e fs/btrfs/btrfs.o.orig 592637 12489 216 605342 93c9e fs/btrfs/btrfs.o Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:06 -04:00
Li Zefan	293f7e0740	Btrfs: zero unused bytes in inode item The otime field is not zeroed, so users will see random otime in an old filesystem with a new kernel which has otime support in the future. The reserved bytes are also not zeroed, and we'll have compatibility issue if we make use of those bytes. Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:05 -04:00
Li Zefan	b4d7c3c945	Btrfs: kill free_space pointer from inode structure Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type, which means every inode has the same BTRFS_I(inode)->free_space pointer. This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits). Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:05 -04:00
Anand Jain	d5b025d510	btrfs read error corrected message floods the console during recovery Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:04 -04:00
Jan Schmidt	e6466e354a	Btrfs: fix buffer leak in btrfs_next_old_leaf When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:03 -04:00
Liu Bo	f6175efab1	Btrfs: do not count in readonly bytes If a block group is ro, do not count its entries in when we dump space info. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:03 -04:00
Liu Bo	799ffc3c31	Btrfs: add ro notification to dump_space_info Block group has ro attributes, make dump_space_info show it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:02 -04:00
Liu Bo	cf7c1ef6e1	Btrfs: fix a bug of writting free space cache during balance Here is the whole story: 1) A free space cache consists of two parts: o free space cache inode, which is special becase it's stored in root tree. o free space info, which is stored as the above inode's file data. But we only build up another new inode and does not flush its free space info onto disk when we _clear and setup_ free space cache, and this ends up with that the block group cache's cache_state remains DC_SETUP instead of DC_WRITTEN. And holding DC_SETUP means that we will not truncate this free space cache inode, which means the disk offset of its file extent will remain _unchanged_ at least until next transaction finishes committing itself. 2) We can set a block group readonly when we relocate the block group. However, if the readonly block group covers the disk offset where our free space cache inode is going to write, it will force the free space cache inode into cow_file_range() and it'll end up hitting a BUG_ON. 3) Due to the above analysis, we fix this bug by adding the missing dirty flag. 4) However, it's not over, there is still another case, nospace_cache. With nospace_cache, we do not want to set dirty flag, instead we just truncate free space cache inode and bail out with setting cache state DC_WRITTEN. We can benifit from it since it saves us another 'pre-allocation' part which usually costs a lot. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:02 -04:00
Liu Bo	0678938423	Btrfs: do not abort transaction in prealloc case During disk balance, we prealloc new file extent for file data relocation, but we may fail in 'no available space' case, and it leads to flipping btrfs into readonly. It is not necessary to bail out and abort transaction since we do have several ways to rescue ourselves from ENOSPC case. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:01 -04:00
Liu Bo	83eea1f1ba	Btrfs: kill root from btrfs_is_free_space_inode Since root can be fetched via BTRFS_I macro directly, we can save an args for btrfs_is_free_space_inode(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:00 -04:00
Liu Bo	51a8cf9d2d	Btrfs: fix btrfs_is_free_space_inode to recognize btree inode For btree inode, its root is also 'tree root', so btree inode can be misunderstood as a free space inode. We should add one more check for btree inode. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:00 -04:00
Stefan Behrens	c0901581ad	Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() From btree_read_extent_buffer_pages(), currently repair_io_failure() can be called with mirror_num being zero when submit_one_bio() returned an error before. This used to cause a BUG_ON(!mirror_num) in repair_io_failure() and indeed this is not a case that needs the I/O repair code to rewrite disk blocks. This commit prevents calling repair_io_failure() in this case and thus avoids the BUG_ON() and malfunction. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:59 -04:00
Josef Bacik	f4c738c2e7	Btrfs: rework shrink_delalloc So shrink_delalloc has grown all sorts of cruft over the years thanks to many reworkings of how we track enospc. What happens now as we fill up the disk is we will loop for freaking ever hoping to reclaim a arbitrary amount of space of metadata, this was from when everybody flushed at the same time. Now we only have people flushing one at a time. So instead of trying to reclaim a huge amount of space, just try to flush a decent chunk of space, and stop looping as soon as we have enough free space to satisfy our reservation. This makes xfstests 224 go much faster. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:58 -04:00
Liu Bo	b9ca0664dc	Btrfs: do not set subvolume flags in readonly mode $ mkfs.btrfs /dev/sdb7 $ btrfstune -S1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs mount: block device /dev/sdb7 is write-protected, mounting read-only $ btrfs dev add /dev/sdb8 /mnt/btrfs/ Now we get a btrfs in which mnt flags has readonly but sb flags does not. So for those ioctls that only check sb flags with MS_RDONLY, it is going to be a problem. Setting subvolume flags is such an ioctl, we should use mnt_want_write_file() to check RO flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:58 -04:00
Liu Bo	e54bfa3104	Btrfs: use mnt_want_write_file instead of mnt_want_write mnt_want_write_file is faster when file has been opened for write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:57 -04:00
Liu Bo	768e9dfe82	Btrfs: remove redundant r/o check for superblock mnt_want_write() and mnt_want_write_file() will check sb->s_flags with MS_RDONLY, and we don't need to do it ourselves. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
Liu Bo	a874a63e13	Btrfs: check write access to mount earlier while creating snapshots Move check of write access to mount into upper functions so that we can use mnt_want_write_file instead, which is faster than mnt_want_write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
Liu Bo	287082b0bd	Btrfs: fix typo in cow_file_range_async and async_cow_submit It should be 10 * 1024 * 1024. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-23 16:27:55 -04:00
Josef Bacik	0e72110692	Btrfs: change how we indicate we're adding csums There is weird logic I had to put in place to make sure that when we were adding csums that we'd used the delalloc block rsv instead of the global block rsv. Part of this meant that we had to free up our transaction reservation before we ran the delayed refs since csum deletion happens during the delayed ref work. The problem with this is that when we release a reservation we will add it to the global reserve if it is not full in order to keep us going along longer before we have to force a transaction commit. By releasing our reservation before we run delayed refs we don't get the opportunity to drain down the global reserve for the work we did, so we won't refill it as often. This isn't a problem per-se, it just results in us possibly committing transactions more and more often, and in rare cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran out of space in our block rsv. This also helps us by holding onto space while the delayed refs run so we don't end up with as many people trying to do things at the same time, which again will help us not force commits or hit the use_block_rsv warnings. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:55 -04:00
Tsutomu Itoh	b995929515	Btrfs: return error of btrfs_update_inode() to caller We didn't check error of btrfs_update_inode(), but that error looks easy to bubble back up. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:54 -04:00
Dan Carpenter	23291a044c	Btrfs: fix error handling in __add_reloc_root() We dereferenced "node" in the error message after freeing it. Also btrfs_panic() can return so we should return an error code instead of continuing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-23 16:27:53 -04:00
Ilya Dryomov	44c44af2f4	Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting There used to be a BUG_ON(ret) there before EH patch (`79787eaa`) went in. Bail out with EINVAL. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-07-23 16:27:53 -04:00
Ilya Dryomov	fed425c742	Btrfs: do not return EINVAL instead of ENOMEM from open_ctree() When bailing from open_ctree() err is returned, not ret. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-07-23 16:27:52 -04:00
Josef Bacik	02db0844be	Btrfs: add DEVICE_READY ioctl This will be used in conjunction with btrfs device ready <dev>. This is needed for initrd's to have a nice and lightweight way to tell if all of the devices needed for a file system are in the cache currently. This keeps them from having to do mount+sleep loops waiting for devices to show up. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:42 -04:00
J. Bruce Fields	0ec4f431eb	locks: fix checking of fcntl_setlease argument The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually don't cause problems in mainline, as of Dave Jones's commit `8d657eb3b4` "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-23 12:46:01 -07:00
Josef Bacik	96c3f4331a	Btrfs: flush delayed inodes if we're short on space Those crazy gentoo guys have been complaining about ENOSPC errors on their portage volumes. This is because doing things like untar tends to create lots of new files which will soak up all the reservation space in the delayed inodes. Usually this gets papered over by the fact that we will try and commit the transaction, however if this happens in the wrong spot or we choose not to commit the transaction you will be screwed. So add the ability to expclitly flush delayed inodes to free up space. Please test this out guys to make sure it works since as usual I cannot reproduce. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 15:41:40 -04:00
David Sterba	b27f7c0c15	btrfs: join DEV_STATS ioctls to one Commit `c11d2c236c` (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing distinguished by just the ioctl number which encodes "do reset after read". I have suggested http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html to implement it via the ioctl args. This hasn't happen, and I think we should use a more clean way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-23 15:41:40 -04:00
Andrew Mahone	a43a211133	btrfs: ignore unfragmented file checks in defrag when compression enabled - rebased Rebased on btrfs-next and retested. Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip checks for adjacent extents and extent size when deciding whether to defrag, as these can prevent an uncompressed and unfragmented file from being compressed as requested. Signed-off-by: Andrew Mahone <andrew.mahone@gmail.com>	2012-07-23 15:41:39 -04:00
Dan Carpenter	e4b50e14c8	Btrfs: small naming cleanup in join_transaction() "root->fs_info" and "fs_info" are the same, but "fs_info" is prefered because it is shorter and that's what is used in the rest of the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-23 15:41:39 -04:00
Alexander Block	2bc5565286	Btrfs: don't update atime on RO subvolumes Before the update_time inode operation was indroduced, it was not possible to prevent updates of atime on RO subvolumes. VFS was only able to check for RO on the mount, but did not know anything about btrfs subvolumes. btrfs_update_time does now check if the root is RO and skip updating of times. Signed-off-by: Alexander Block <ablock84@googlemail.com>	2012-07-23 15:41:38 -04:00
Arnd Hannemann	063849eafd	Btrfs: allow mount -o remount,compress=no Btrfs allows to turn on compression on a mounted and used filesystem by issuing mount -o remount,compress=lzo. This patch allows to turn compression off again while the filesystem is mounted. As suggested by David Sterba if the compress-force option was set, it is implicitly cleared if compression is turned off. Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Arnd Hannemann <arnd@arndnet.de>	2012-07-23 15:41:38 -04:00
Josef Bacik	c5c3c5f31e	Btrfs: remove ->dirty_inode We do all of our inode updating when we change it, and now that we do ->update_time we don't need ->dirty_inode for atime updates anymore, so just remove it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-07-23 15:41:38 -04:00
Chris Mason	cbea5ac1ee	Btrfs: reduce calls to wake_up on uncontended locks The btrfs locks were unconditionally calling wake_up as the locks were released. This lead to extra thrashing on the waitqueue, especially for locks that were dominated by readers. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-23 15:36:18 -04:00
Chris Mason	e39e64ac0c	Btrfs: don't wait around for new log writers on an SSD Waiting on spindles improves performance, but ssds want all the IO as quickly as we can push it down. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-23 15:36:17 -04:00
Linus Torvalds	a66d2c8f7e	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull the big VFS changes from Al Viro: "This one is big and changes quite a few things around VFS. What's in there: - the first of two really major architecture changes - death to open intents. The former is finally there; it was very long in making, but with Miklos getting through really hard and messy final push in fs/namei.c, we finally have it. Unlike his variant, this one doesn't introduce struct opendata; what we have instead is ->atomic_open() taking preallocated struct file * and passing everything via its fields. Instead of returning struct file , it returns -E... on error, 0 on success and 1 in "deal with it yourself" case (e.g. symlink found on server, etc.). See comments before fs/namei.c:atomic_open(). That made a lot of goodies finally possible and quite a few are in that pile: ->lookup(), ->d_revalidate() and ->create() do not get struct nameidata anymore; ->lookup() and ->d_revalidate() get lookup flags instead, ->create() gets "do we want it exclusive" flag. With the introduction of new helper (kern_path_locked()) we are rid of all struct nameidata instances outside of fs/namei.c; it's still visible in namei.h, but not for long. Come the next cycle, declaration will move either to fs/internal.h or to fs/namei.c itself. [me, miklos, hch] - The second major change: behaviour of final fput(). Now we have __fput() done without any locks held by caller and not from deep in call stack. That obviously lifts a lot of constraints on the locking in there. Moreover, it's legal now to call fput() from atomic contexts (which has immediately simplified life for aio.c). We also don't need anti-recursion logics in __scm_destroy() anymore. There is a price, though - the damn thing has become partially asynchronous. For fput() from normal process we are guaranteed that pending __fput() will be done before the caller returns to userland, exits or gets stopped for ptrace. For kernel threads and atomic contexts it's done via schedule_work(), so theoretically we might need a way to make sure it's finished; so far only one such place had been found, but there might be more. There's flush_delayed_fput() (do all pending __fput()) and there's __fput_sync() (fput() analog doing __fput() immediately). I hope we won't need them often; see warnings in fs/file_table.c for details. [me, based on task_work series from Oleg merged last cycle] - sync series from Jan - large part of "death to sync_supers()" work from Artem; the only bits missing here are exofs and ext4 ones. As far as I understand, those are going via the exofs and ext4 trees resp.; once they are in, we can put ->write_super() to the rest, along with the thread calling it. - preparatory bits from unionmount series (from dhowells). - assorted cleanups and fixes all over the place, as usual. This is not the last pile for this cycle; there's at least jlayton's ESTALE work and fsfreeze series (the latter - in dire need of fixes, so I'm not sure it'll make the cut this cycle). I'll probably throw symlink/hardlink restrictions stuff from Kees into the next pile, too. Plus there's a lot of misc patches I hadn't thrown into that one - it's large enough as it is..." * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits) ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() switch dentry_open() to struct path, make it grab references itself spufs: shift dget/mntget towards dentry_open() zoran: don't bother with struct file * in zoran_map ecryptfs: don't reinvent the wheels, please - use struct completion don't expose I_NEW inodes via dentry->d_inode tidy up namei.c a bit unobfuscate follow_up() a bit ext3: pass custom EOF to generic_file_llseek_size() ext4: use core vfs llseek code for dir seeks vfs: allow custom EOF in generic_file_llseek code vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes vfs: Remove unnecessary flushing of block devices vfs: Make sys_sync writeout also block device inodes vfs: Create function for iterating over block devices vfs: Reorder operations during sys_sync quota: Move quota syncing to ->sync_fs method quota: Split dquot_quota_sync() to writeback and cache flushing part vfs: Move noop_backing_dev_info check from sync into writeback ...	2012-07-23 12:27:27 -07:00
Cong Wang	906adea153	jbd2: remove the second argument of kmap_atomic Signed-off-by: Cong Wang <amwang@redhat.com>	2012-07-23 14:11:22 +08:00
Prasad Joshi	9f0bbd8ca7	logfs: query block device for number of pages to send with bio The block device driver puts a limit on maximum number of pages that can be sent with the bio. Not all block devices can handle BIO_MAX_PAGES number of pages in bio. Specifically the virtio-blk diriver limits it to 126. When the LogFS file system was excersized in KVM, the following bug from do_virtblk_request() was observed static void do_virtblk_request(struct request_queue *q) { .... .... while ((req = blk_peek_request(q)) != NULL) { BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); .... .... } .... } The patch fixes the problem by querring the maximum number of pages in bio allowed from block device driver and then using those many pages during submit_bio. Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 10:32:11 +05:30
Prasad Joshi	41b93bc1ee	logfs: maintain the ordering of meta-inode destruction LogFS does not use a specialized area to maintain the inodes. The inodes information is kept in a specialized file called inode file. Similarly, the segment information is kept in a segment file. Since the segment file also has an inode which is kept in the inode file, the inode for segment file must be evicted before the inode for inode file. The change fixes the following BUG during unmount Pid: 2057, comm: umount Not tainted 3.5.0-rc6+ #25 Bochs Bochs RIP: 0010:[<ffffffffa005c5f2>] [<ffffffffa005c5f2>] move_page_to_btree+0x32/0x1f0 [logfs] Process umount (pid: 2057, threadinfo ...) Call Trace: [<ffffffff8112adca>] ? find_get_pages+0x2a/0x180 [<ffffffffa00549f5>] logfs_invalidatepage+0x85/0x90 [logfs] [<ffffffff81136c51>] truncate_inode_page+0xb1/0xd0 [<ffffffff81136dcf>] truncate_inode_pages_range+0x15f/0x490 [<ffffffff81558549>] ? printk+0x78/0x7a [<ffffffff81137185>] truncate_inode_pages+0x15/0x20 [<ffffffffa005b7fc>] logfs_evict_inode+0x6c/0x190 [logfs] [<ffffffff8155c75b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff8119e3d7>] evict+0xa7/0x1b0 [<ffffffff8119ea6e>] dispose_list+0x3e/0x60 [<ffffffff8119f1c4>] evict_inodes+0xf4/0x110 [<ffffffff81185b53>] generic_shutdown_super+0x53/0xf0 [<ffffffffa005d8f2>] logfs_kill_sb+0x52/0xf0 [logfs] [<ffffffff81185ec5>] deactivate_locked_super+0x45/0x80 [<ffffffff81186a4a>] deactivate_super+0x4a/0x70 [<ffffffff811a228e>] mntput_no_expire+0xde/0x140 [<ffffffff811a30ff>] sys_umount+0x6f/0x3a0 [<ffffffff8155d8e9>] system_call_fastpath+0x16/0x1b ---[ end trace 45f7752082cefafd ]--- Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 09:35:52 +05:30
Theodore Ts'o	03179fe923	ext4: undo ext4_calc_metadata_amount if we fail to claim space The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-23 00:00:20 -04:00
Brian Foster	97795d2a5b	ext4: don't let i_reserved_meta_blocks go negative If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-22 23:59:40 -04:00
Prasad Joshi	ddb24bbac3	logfs: create a pagecache page if it is not present While writing the partial journal entries we assumed that the page associated with the journal would always in locatable. This incorrect assumption resulted in the following BUG kernel BUG at /home/benixon/WD_SMR/kernels/linux-3.3.7-logfs/fs/logfs/journal.c:569! EIP is at logfs_write_area+0xb6/0x109 [logfs] EAX: 00000000 EBX: 00000000 ECX: ef6efea4 EDX: 00000000 ESI: 001b9000 EDI: f009e000 EBP: c3c13f14 ESP: c3c13ef0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process sync (pid: 1799, ti=c3c12000 task=f07825b0 task.ti=c3c12000) Stack: 01001000 c3c13f26 781b9000 00000000 f009e000 f7286000 f1f83400 f8445071 f1f83400 c3c13f30 f8445ae9 c3c13f20 0000100a 000ee000 f009e000 00000001 c3c13f5c f8445d17 c05eb0ee 00000000 f1f83400 ef718000 f009e25c ea9c3d80 Call Trace: [<f8445071>] ? account_shadow+0x16d/0x16d [logfs] [<f8445ae9>] logfs_write_je+0x2a/0x44 [logfs] [<f8445d17>] logfs_write_anchor+0x114/0x228 [logfs] [<c05eb0ee>] ? empty+0x5/0x5 [<f8444522>] logfs_sync_fs+0x1e/0x31 [logfs] [<c051be66>] __sync_filesystem+0x5d/0x6f [<c051be8d>] sync_one_sb+0x15/0x17 [<c04ff8b0>] iterate_supers+0x59/0x9a [<c051be78>] ? __sync_filesystem+0x6f/0x6f [<c051befc>] sys_sync+0x29/0x4f [<c084285f>] sysenter_do_call+0x12/0x28 EIP: [<f8445127>] logfs_write_area+0xb6/0x109 [logfs] SS:ESP 0068:c3c13ef0 ---[ end trace ef6e9ef52601a945 ]--- The fix is to create the pagecache page if it is not locatable. Reported-and-tested-by: Benixon Dhas <Benixon.Dhas@wdc.com> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 09:18:14 +05:30
Ashish Sangwan	968dee7722	ext4: fix hole punch failure when depth is greater than 0 Whether to continue removing extents or not is decided by the return value of function ext4_ext_more_to_rm() which checks 2 conditions: a) if there are no more indexes to process. b) if the number of entries are decreased in the header of "depth -1". In case of hole punch, if the last block to be removed is not part of the last extent index than this index will not be deleted, hence the number of valid entries in the extent header of "depth - 1" will remain as it is and ext4_ext_more_to_rm will return 0 although the required blocks are not yet removed. This patch fixes the above mentioned problem as instead of removing the extents from the end of file, it starts removing the blocks from the particular extent from which removing blocks is actually required and continue backward until done. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org	2012-07-22 22:49:08 -04:00
Artem Bityutskiy	b50924c2c6	ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() The '__ext4_handle_dirty_metadata()' does not need the 'now' argument anymore and we can kill it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:37:31 -04:00
Artem Bityutskiy	4d47603d97	ext4: weed out ext4_write_super We do not depend on VFS's '->write_super()' anymore and do not need the 's_dirt' flag anymore, so weed out 'ext4_write_super()' and 's_dirt'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:35:31 -04:00
Artem Bityutskiy	58c5873a76	ext4: remove unnecessary superblock dirtying This patch changes the 'ext4_handle_dirty_super()' function which submits the superblock for I/O in the following cases: 1. When creating the first large file on a file system without EXT4_FEATURE_RO_COMPAT_LARGE_FILE feature. 2. When re-sizing the file-system. 3. When creating an xattr on a file-system without the EXT4_FEATURE_COMPAT_EXT_ATTR feature. If the file-system has journal enabled, the superblock is written via the journal. We do not modify this path. If the file-system has no journal, this function, falls back to just marking the superblock as dirty using the 's_dirt' superblock flag. This means that it delays the actual superblock I/O submission by 5 seconds (default setting). Namely, the 'sync_supers()' kernel thread will call 'ext4_write_super()' later and will actually submit the superblock for I/O. And this is the behavior this patch modifies: we stop using 's_dirt' and just mark the superblock buffer as dirty right away. Indeed, all 3 cases above are extremely rare and it does not add any value to delay the I/O submission for them. Note: 'ext4_handle_dirty_super()' executes '__ext4_handle_dirty_super()' with 'now = 0'. This patch basically makes the 'now' argument unneeded and it will be deleted in one of the next patches. This patch also removes 's_dirt' condition on the unmount path because we never set it anymore, so we should not test it. Tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:33:31 -04:00
Jan Kara	044ce47fec	ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() The last user of ext4_mark_super_dirty() in ext4_file_open() is so rare it can well be modifying the superblock properly by journalling the change. Change it and get rid of ext4_mark_super_dirty() as it's not needed anymore. Artem: small amendments. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-22 20:31:31 -04:00
Jan Kara	97a7406880	ext4: remove useless marking of superblock dirty Commit `a0375156` properly notes that superblock doesn't need to be marked as dirty when only number of free inodes / blocks / number of directories changes since that is recomputed on each mount anyway. However that comment leaves some unnecessary markings as dirty in place. Remove these. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-22 20:29:31 -04:00
Al Viro	254706056b	ext4: fix ext4 mismerge back in January Duplicate caused, AFAICS, by mismerge in ff9cb1c4eead5e4c292e75cd3170a82d66944101> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-22 20:27:31 -04:00
Theodore Ts'o	3108b54bce	ext4: remove dynamic array size in ext4_chksum() The ext4_checksum() inline function was using a dynamic array size, which is not legal C. (It is a gcc extension). Remove it. Cc: "Darrick J. Wong" <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:25:31 -04:00
Theodore Ts'o	8a9918497b	ext4: remove unused variable in ext4_update_super() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:23:31 -04:00
Aditya Kali	7c319d3285	ext4: make quota as first class supported feature This patch adds support for quotas as a first class feature in ext4; which is to say, the quota files are stored in hidden inodes as file system metadata, instead of as separate files visible in the file system directory hierarchy. It is based on the proposal at: https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4 This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA which, when turned on, enables quota accounting at mount time iteself. Also, the quota inodes are stored in two additional superblock fields. Some changes introduced by this patch that should be pointed out are: 1) Two new ext4-superblock fields - s_usr_quota_inum and s_grp_quota_inum for storing the quota inodes in use. 2) Default quota inodes are: inode#3 for tracking userquota and inode#4 for tracking group quota. The superblock fields can be set to use other inodes as well. 3) If the QUOTA feature and corresponding quota inodes are set in superblock, the quota usage tracking is turned on at mount time. On 'quotaon' ioctl, the quota limits enforcement is turned on. 'quotaoff' ioctl turns off only the limits enforcement in this case. 4) When QUOTA feature is in use, the quota mount options 'quota', 'usrquota', 'grpquota' are ignored by the kernel. 5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize quota inodes. The default reserved inodes will not be visible to user as regular files. 6) The quota-tools will need to be modified to support hidden quota files on ext4. E2fsprogs will also include support for creating and fixing quota files. 7) Support is only for the new V2 quota file format. Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:21:31 -04:00
Zheng Liu	4bd809dbbf	ext4: don't take the i_mutex lock when doing DIO overwrites Aligned and overwrite direct I/O can be parallelized. In ext4_file_dio_write, we first check whether these conditions are satisfied or not. If so, we take i_data_sem and release i_mutex lock directly. Meanwhile iocb->private is set to indicate that this is a dio overwrite, and it will be handled in ext4_ext_direct_IO. [ Added fix from Dan Carpenter to fix locking bug on the error path. ] CC: Tao Ma <tm@tao.ma> CC: Eric Sandeen <sandeen@redhat.com> CC: Robin Dong <hao.bigrat@gmail.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-22 20:19:31 -04:00
Al Viro	8cae6f7158	ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:55 +04:00
Al Viro	11e62a8fab	btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:43 +04:00
Al Viro	765927b2d5	switch dentry_open() to struct path, make it grab references itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:29 +04:00
Al Viro	3b8b487114	ecryptfs: don't reinvent the wheels, please - use struct completion ... and keep the sodding requests on stack - they are small enough. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:02 +04:00
Al Viro	8fc37ec54c	don't expose I_NEW inodes via dentry->d_inode d_instantiate(dentry, inode); unlock_new_inode(inode); is a bad idea; do it the other way round... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:58 +04:00
Al Viro	32a7991b6a	tidy up namei.c a bit locking/unlocking for rcu walk taken to a couple of inline helpers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:55 +04:00
Al Viro	3c0a616368	unobfuscate follow_up() a bit really convoluted test in there has grown up during struct mount introduction; what it checks is that we'd reached the root of mount tree.	2012-07-23 00:00:45 +04:00
Eric Sandeen	de9b942202	ext3: pass custom EOF to generic_file_llseek_size() Use the new custom EOF argument to generic_file_llseek_size so that SEEK_END will go to the max hash value for htree dirs in ext3 rather than to i_size_read() Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:30 +04:00
Eric Sandeen	ec7268ce21	ext4: use core vfs llseek code for dir seeks Use the new functionality in generic_file_llseek_size() to accept a custom EOF position, and un-cut-and-paste all the vfs llseek code from ext4. Also fix up comments on ext4_llseek() to reflect reality. Signed-off-by: Eric Sandeen <sandeen@redaht.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:28 +04:00
Eric Sandeen	e8b96eb503	vfs: allow custom EOF in generic_file_llseek code For ext3/4 htree directories, using the vfs llseek function with SEEK_END goes to i_size like for any other file, but in reality we want the maximum possible hash value. Recent changes in ext4 have cut & pasted generic_file_llseek() back into fs/ext4/dir.c, but replicating this core code seems like a bad idea, especially since the copy has already diverged from the vfs. This patch updates generic_file_llseek_size to accept both a custom maximum offset, and a custom EOF position. With this in place, ext4_dir_llseek can pass in the appropriate maximum hash position for both maxsize and eof, and get what it wants. As far as I know, this does not fix any bugs - nfs in the kernel doesn't use SEEK_END, and I don't know of any user who does. But some ext4 folks seem keen on doing the right thing here, and I can't really argue. (Patch also fixes up some comments slightly) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:15 +04:00
Jan Kara	4ea425b63a	vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes wakeup_flusher_threads(0) will queue work doing complete writeback for each flusher thread. Thus there is not much point in submitting another work doing full inode WB_SYNC_NONE writeback by writeback_inodes_sb(). After this change it does not make sense to call nonblocking ->sync_fs and block device flush before calling sync_inodes_sb() because wakeup_flusher_threads() is completely asynchronous and thus these functions would be called in parallel with inode writeback running which will effectively void any work they do. So we move sync_inodes_sb() call before these two functions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:59:01 +04:00
Jan Kara	d0e91b13eb	vfs: Remove unnecessary flushing of block devices It is not necessary to write block devices twice. The reason why we first did flush and then proper sync is that for_each_bdev() { write_bdev() wait_for_completion() } is much slower than for_each_bdev() write_bdev() for_each_bdev() wait_for_completion() when there is bigger amount of data. But as is seen in the above, there's no real need to scan pages and submit them twice. We just need to separate the submission and waiting part. This patch does that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:53 +04:00
Jan Kara	a8c7176b6d	vfs: Make sys_sync writeout also block device inodes In case block device does not have filesystem mounted on it, sys_sync will just ignore it and doesn't writeout its dirty pages. This is because writeback code avoids writing inodes from superblock without backing device and blockdev_superblock is such a superblock. Since it's unexpected that sync doesn't writeout dirty data for block devices be nice to users and change the behavior to do so. So now we iterate over all block devices on blockdev_super instead of iterating over all superblocks when syncing block devices. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:49 +04:00
Jan Kara	5c0d6b60a0	vfs: Create function for iterating over block devices Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:45 +04:00
Jan Kara	b3de653105	vfs: Reorder operations during sys_sync Change the order of operations during sync from for_each_sb { writeback_inodes_sb(); sync_fs(nowait); __sync_blockdev(nowait); } for_each_sb { sync_inodes_sb(); sync_fs(wait); __sync_blockdev(wait); } to for_each_sb writeback_inodes_sb(); for_each_sb sync_fs(nowait); for_each_sb __sync_blockdev(nowait); for_each_sb sync_inodes_sb(); for_each_sb sync_fs(wait); for_each_sb __sync_blockdev(wait); This is a preparation for the following patches in this series. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:41 +04:00

... 6 7 8 9 10 ...

28620 Commits