OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Pavel Shilovsky	1208ef1f76	CIFS: Move query inode info code to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:07 +04:00
Pavel Shilovsky	2503a0dba9	CIFS: Add SMB2 support for is_path_accessible that needs for a successful mount through SMB2 protocol. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:05 +04:00
Pavel Shilovsky	68889f269b	CIFS: Move is_path_accessible to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:04 +04:00
Pavel Shilovsky	af4281dc22	CIFS: Move informational tcon calls to ops struct and rename variables in cifs_mount. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:02 +04:00
Pavel Shilovsky	b669f33ca6	CIFS: Move getting dfs referalls to ops struct Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:55:01 +04:00
Pavel Shilovsky	aa24d1e969	CIFS: Process reconnects for SMB2 shares Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:59 +04:00
Pavel Shilovsky	faaf946a7d	CIFS: Add tree connect/disconnect capability for SMB2 Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:58 +04:00
Pavel Shilovsky	5478f9ba9a	CIFS: Add session setup/logoff capability for SMB2 Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:57 +04:00
Pavel Shilovsky	ec2e4523fd	CIFS: Add capability to send SMB2 negotiate message and add negotiate request type to let set_credits know that we are only on negotiate stage and no need to make a decision about disabling echos and oplocks. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:55 +04:00
Pavel Shilovsky	3792c17328	CIFS: Respect SMB2 header/max header size Use SMB2 header size values for allocation and memset because they are bigger and suitable for both CIFS and SMB2. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:54 +04:00
Pavel Shilovsky	093b2bdad3	CIFS: Make demultiplex_thread work with SMB2 code Now we can process SMB2 messages: check message, get message id and wakeup awaiting routines. Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:52 +04:00
Pavel Shilovsky	4b1241006c	CIFS: Fix a wrong pointer in atomic_open Commit `30d9049474` caused a regression in cifs open codepath. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 21:54:26 +04:00
Pavel Shilovsky	28ea5290d7	CIFS: Add SMB2 credits support For SMB2 protocol we can add more than one credit for one received request: it depends on CreditRequest field in SMB2 response header. Also we divide all requests by type: echoes, oplocks and others. Each type uses its own slot pull. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:23 -05:00
Pavel Shilovsky	2dc7e1c033	CIFS: Make transport routines work with SMB2 Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:20 -05:00
Steve French	ddfbefbd39	CIFS: Map SMB2 status codes to POSIX errors Add mapping table for 32 bit SMB2 status codes to linux errors. Note that SMB2 does not use DOS/OS2 errors (ever) so mapping to DOS/OS2 errors as a common network subset (as we do for cifs) doesn't help. And note that the set of status codes is much more complete here. Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:15 -05:00
Pavel Shilovsky	b8030603d9	CIFS: Add SMB2 status codes Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:13 -05:00
Pavel Shilovsky	f7ec0d0bbc	CIFS: Rename 7 error codes to NT_ style and consider such codes as CIFS errors. Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:10 -05:00
Pavel Shilovsky	6d5786a34d	CIFS: Rename Get/FreeXid and make them work with unsigned int Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:08 -05:00
Pavel Shilovsky	2e6e02ab6d	CIFS: Move protocol specific tcon/tdis code to ops struct and rename variables around the code changes. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:06 -05:00
Pavel Shilovsky	58c45c58a1	CIFS: Move protocol specific session setup/logoff code to ops struct Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 10:25:03 -05:00
Cong Wang	2164d33446	pipe: remove KM_USER0 from comments Signed-off-by: Cong Wang <amwang@redhat.com>	2012-07-24 15:27:34 +08:00
Pavel Shilovsky	286170aa24	CIFS: Move protocol specific negotiate code to ops struct Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:33:26 -05:00
Pavel Shilovsky	a891f0f895	CIFS: Extend credit mechanism to process request type Split all requests to echos, oplocks and others - each group uses its own credit slot. This is indicated by new flags CIFS_ECHO_OP and CIFS_OBREAK_OP that are not used now for CIFS. This change is required to support SMB2 protocol because of different processing of these commands. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:32:48 -05:00
Pavel Shilovsky	316cf94a91	CIFS: Move trans2 processing to ops struct Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-24 00:32:25 -05:00
Linus Torvalds	83c7f72259	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Notable highlights: - iommu improvements from Anton removing the per-iommu global lock in favor of dividing the DMA space into pools, each with its own lock, and hashed on the CPU number. Along with making the locking more fine grained, this gives significant improvements in multiqueue networking scalability. - Still from Anton, we know provide a vdso based variant of getcpu which makes sched_getcpu with the appropriate glibc patch something like 18 times faster. - More anton goodness (he's been busy !) in other areas such as a faster __clear_user and copy_page on P7, various perf fixes to improve sampling quality, etc... - One more step toward removing legacy i2c interfaces by using new device-tree based probing of platform devices for the AOA audio drivers - A nice series of patches from Michael Neuling that helps avoiding confusion between register numbers and litterals in assembly code, trying to enforce the use of "%rN" register names in gas rather than plain numbers. - A pile of FSL updates - The usual bunch of small fixes, cleanups etc... You may spot a change to drivers/char/mem. The patch got no comment or ack from outside, it's a trivial patch to allow the architecture to skip creating /dev/port, which we use to disable it on ppc64 that don't have a legacy brige. On those, IO ports 0...64K are not mapped in kernel space at all, so accesses to /dev/port cause oopses (and yes, distros -still- ship userspace that bangs hard coded ports such as kbdrate)." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/mpic: Create a revmap with enough entries for IPIs and timers Remove stale .rej file powerpc/iommu: Fix iommu pool initialization powerpc/eeh: Check handle_eeh_events() return value powerpc/85xx: Add phy nodes in SGMII mode for MPC8536/44/72DS & P2020DS powerpc/e500: add paravirt QEMU platform powerpc/mpc85xx_ds: convert to unified PCI init powerpc/fsl-pci: get PCI init out of board files powerpc/85xx: Update corenet64_smp_defconfig powerpc/85xx: Update corenet32_smp_defconfig powerpc/85xx: Rename P1021RDB-PC device trees to be consistent powerpc/watchdog: move booke watchdog param related code to setup-common.c sound/aoa: Adapt to new i2c probing scheme i2c/powermac: Improve detection of devices from device-tree powerpc: Disable /dev/port interface on systems without an ISA bridge of: Improve prom_update_property() function powerpc: Add "memory" attribute for mfmsr() powerpc/ftrace: Fix assembly trampoline register usage powerpc/hw_breakpoints: Fix incorrect pointer access powerpc: Put the gpr save/restore functions in their own section ...	2012-07-23 18:54:23 -07:00
Jeff Layton	7659624ffb	cifs: reinstate sec=ntlmv2 mount option sec=ntlmv2 as a mount option got dropped in the mount option overhaul. Cc: Sachin Prabhu <sprabhu@redhat.com> Cc: <stable@vger.kernel.org> # 3.4+ Reported-by: Günter Kukkukk <linux@kukkukk.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 20:48:24 -05:00
Linus Torvalds	93912fe69d	* Added another debugfs knob for forcing UBIFS R/O mode without flushing caches or finishing commit or any other I/O operation. I've originally added this knob in order to reproduce the free space fixup bug (see `c672793`) on nandsim. Without this knob I would have to do real power-cuts, which would make debugging much harder. Then I've decided to keep this knob because it is also useful for UBIFS power-cut recovery end error-paths testing. * Well-spotted fix from Julia. This bug did not cause real troubles for UBIFS, but nevertheless it could cause issues for someone trying to modify the orphans handling code. Kudos to coccinelle! * Minor cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQDQxSAAoJECmIfjd9wqK0W3QQAKCbDBNMJaj/idXcupatoOI2 4M3IBJ5mwfjrBeXy1kcsxxSadW7fJTy6FK7jfPpYq2iCLmFkh0Ba6QSgHHgCmb1v 3ExIR2+2hLl1YhliDU1q+k9H4fGDklq/CNgiTGDti0pdPczZfBKrksXQy8leqT8d MxzQDzQsCxVKIizHHMEr373s1y8QHhabFK1LU1wkXJK/p/bYS0mgD0BOjIrZKuCd W5L5XzIIbcXs1h91Lgdnr8fj5TmMtM1tQ890U2OaNPt+44Eh6Gjjku3N8RaWvaIe vXMpDNb00NwtJp0V2SIsez3VtaA1xJaW7bd4UcyNNvBy5aECNx8K0IbZKTUkiMSj 3xOiqZr/LWpPkAq3NL6kAZkgiam9Aez5Yc+XTna3HP79L/LYbXsJiOewlDwvGbrh 89+EatjLUVFh00BvzmmxuZkm2bWm3sPQiAKWqwc7ijMGwnPt+c6Eiepgw3XXVHgI qmw1Ddkh4VxdGXi8r3RMxg21MqFttM0astT5cmQeF//H6Wq3S2u8RFgZqaLpGc6s vcgx92V6BBrWzCZKxBQs3Bd21uSrVsOa/js5wORpuWZAGk1Yy1BFy1vFC0PuqmcI kvZa/bQPP3/PiHAI8qq2Min5baoG9jUl6ui1eM8KXECkzQSQJjxN1bth8kzZHuPY HMNoK0RVUt3V5xtYI7i7 =ecyI -----END PGP SIGNATURE----- Merge tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs Pull UBIFS updates from Artem Bityutskiy: - Added another debugfs knob for forcing UBIFS R/O mode without flushing caches or finishing commit or any other I/O operation. I've originally added this knob in order to reproduce the free space fixup bug (see commit c6727932cfdb: "UBIFS: fix a bug in empty space fix-up") on nandsim. Without this knob I would have to do real power-cuts, which would make debugging much harder. Then I've decided to keep this knob because it is also useful for UBIFS power-cut recovery end error-paths testing. - Well-spotted fix from Julia. This bug did not cause real troubles for UBIFS, but nevertheless it could cause issues for someone trying to modify the orphans handling code. Kudos to coccinelle! - Minor cleanups. * tag 'upstream-3.6-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: remove invalid reference to list iterator variable UBIFS: simplify reply code a bit UBIFS: add debugfs knob to switch to R/O mode UBIFS: fix compilation warning	2012-07-23 15:50:52 -07:00
Jeff Layton	762a4206a3	cifs: rename cifs_sign_smb2 to cifs_sign_smbv "smb2" makes me think of the SMB2.x protocol, which isn't at all what this function is for... Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:34 -05:00
Jeff Layton	d971e0656b	cifs: remove bogus reset of smb_buf_length in smb_send routines There's a comment here about how we don't want to modify this length, but nothing in this function actually does. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:31 -05:00
Jeff Layton	c5fd363d77	cifs: move file_lock off stack in cifs_push_posix_locks struct file_lock is pretty large, so we really don't want that on the stack in a potentially long call chain. Reorganize the arguments to CIFSSMBPosixLock to eliminate the need for that. Eliminate the get_flag and simply use a non-NULL pLockInfo to indicate that this is a "get" operation. In order to do that, need to add a new loff_t argument for the start_offset. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:29 -05:00
Jeff Layton	ac3aa2f8ae	cifs: remove extraneous newlines from cERROR and cFYI calls Those macros add a newline on their own, so there's not any need to embed one in the message itself. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:26 -05:00
Jeff Layton	00401ff780	cifs: after upcalling for krb5 creds, invalidate key rather than revoking it Calling key_revoke here isn't ideal as further requests for the key will end up returning -EKEYREVOKED until it gets purged from the cache. What we really intend here is to force a new upcall on the next request_key. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-23 16:36:24 -05:00
Liu Bo	67c9684f48	Btrfs: improve multi-thread buffer read While testing with my buffer read fio jobs[1], I find that btrfs does not perform well enough. Here is a scenario in fio jobs: We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file, and all of them will race on add_to_page_cache_lru(), and if one thread successfully puts its page into the page cache, it takes the responsibility to read the page's data. And what's more, reading a page needs a period of time to finish, in which other threads can slide in and process rest pages: t1 t2 t3 t4 add Page1 read Page1 add Page2 \| read Page2 add Page3 \| \| read Page3 add Page4 \| \| \| read Page4 -----\|------------\|-----------\|-----------\|-------- v v v v bio bio bio bio Now we have four bios, each of which holds only one page since we need to maintain consecutive pages in bio. Thus, we can end up with far more bios than we need. Here we're going to a) delay the real read-page section and b) try to put more pages into page cache. With that said, we can make each bio hold more pages and reduce the number of bios we need. Here is some numbers taken from fio results: w/o patch w patch ------------- -------- --------------- READ: 745MB/s +25% 934MB/s [1]: [global] group_reporting thread numjobs=4 bs=32k rw=read ioengine=sync directory=/mnt/btrfs/ [READ] filename=foobar size=2000M invalidate=1 Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:10 -04:00
Liu Bo	df57dbe6bf	Btrfs: make btrfs's allocation smoothly with preallocation For backref walking, we've introduce delayed ref's sequence. However, it changes our preallocation behavior. The story is that when we preallocate an extent and then mark it written piece by piece, the ideal case should be that we don't need to COW the extent, which is why we use 'preallocate'. But we may not make use of preallocation, since when we check for cross refs on the extent, we may have two ref entries which have the same content except the sequence value, and we recognize them as cross refs and do COW to allocate another extent. So we end up with several pieces of space instead of an whole extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:10 -04:00
Josef Bacik	51561ffec9	Btrfs: lock the transition from dirty to writeback for an eb There is a small window where an eb can have no IO bits set on it, which could potentially result in extent_buffer_under_io() returning false when we want it to return true, which could result in not fun things happening. So in order to protect this case we need to hold the refs_lock when we make this transition to make sure we get reliable results out of extent_buffer_udner_io(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:09 -04:00
Josef Bacik	594831c4b2	Btrfs: fix potential race in extent buffer freeing This sounds sort of impossible but it is the only thing I can think of and at the very least it is theoretically possible so here it goes. If we are in try_release_extent_buffer we will check that the ref count on the extent buffer is 1 and not under IO, and then go down and clear the tree ref. If between this check and clearing the tree ref somebody else comes in and grabs a ref on the eb and the marks it dirty before try_release_extent_buffer() does it's tree ref clear we can end up with a dirty eb that will be freed while it is still dirty which will result in a panic. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:09 -04:00
Josef Bacik	e64860aa05	Btrfs: don't return true in releasepage unless we actually freed the eb I noticed while looking at an extent_buffer race that we will unconditionally return 1 if we get down to release_extent_buffer after clearing the tree ref. However we can easily race in here and get a ref on the eb and not actually free the eb. So make release_extent_buffer return 1 if it free'd the eb and 0 if not so we can be a little kinder to the vm. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:08 -04:00
Stefan Behrens	a98cdb85b9	Btrfs: suppress printk() if all device I/O stats are zero Code is added to suppress the I/O stats printing at mount time if all statistic values are zero. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-07-23 16:28:07 -04:00
Stefan Behrens	5021976d8d	Btrfs: remove unwanted printk() for btrfs device I/O stats People complained about the annoying kernel log message "btrfs: no dev_stats entry found ... (OK on first mount after mkfs)" everytime a filesystem is mounted for the first time after running mkfs. Since the distribution of the btrfs-progs is not synchronized to the kernel version, mkfs like it is now will be used also in the future. Then this message is not useful to find errors, it is just annoying. This commit removes the printk(). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>	2012-07-23 16:28:07 -04:00
Li Zefan	18077bb413	Btrfs: rewrite BTRFS_SETGET_FUNCS BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and btrfs_foo() functions, which read and write specific fields in the extent buffer. The total number of set/get functions is ~200, but in fact we only need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64. It results in redunction of ~37K bytes. text data bss dec hex filename 629661 12489 216 642366 9cd3e fs/btrfs/btrfs.o.orig 592637 12489 216 605342 93c9e fs/btrfs/btrfs.o Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:06 -04:00
Li Zefan	293f7e0740	Btrfs: zero unused bytes in inode item The otime field is not zeroed, so users will see random otime in an old filesystem with a new kernel which has otime support in the future. The reserved bytes are also not zeroed, and we'll have compatibility issue if we make use of those bytes. Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:05 -04:00
Li Zefan	b4d7c3c945	Btrfs: kill free_space pointer from inode structure Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type, which means every inode has the same BTRFS_I(inode)->free_space pointer. This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits). Signed-off-by: Li Zefan <lizefan@huawei.com>	2012-07-23 16:28:05 -04:00
Anand Jain	d5b025d510	btrfs read error corrected message floods the console during recovery Changing printk_in_rcu to printk_ratelimited_in_rcu will suffice Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:04 -04:00
Jan Schmidt	e6466e354a	Btrfs: fix buffer leak in btrfs_next_old_leaf When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:03 -04:00
Liu Bo	f6175efab1	Btrfs: do not count in readonly bytes If a block group is ro, do not count its entries in when we dump space info. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:03 -04:00
Liu Bo	799ffc3c31	Btrfs: add ro notification to dump_space_info Block group has ro attributes, make dump_space_info show it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:02 -04:00
Liu Bo	cf7c1ef6e1	Btrfs: fix a bug of writting free space cache during balance Here is the whole story: 1) A free space cache consists of two parts: o free space cache inode, which is special becase it's stored in root tree. o free space info, which is stored as the above inode's file data. But we only build up another new inode and does not flush its free space info onto disk when we _clear and setup_ free space cache, and this ends up with that the block group cache's cache_state remains DC_SETUP instead of DC_WRITTEN. And holding DC_SETUP means that we will not truncate this free space cache inode, which means the disk offset of its file extent will remain _unchanged_ at least until next transaction finishes committing itself. 2) We can set a block group readonly when we relocate the block group. However, if the readonly block group covers the disk offset where our free space cache inode is going to write, it will force the free space cache inode into cow_file_range() and it'll end up hitting a BUG_ON. 3) Due to the above analysis, we fix this bug by adding the missing dirty flag. 4) However, it's not over, there is still another case, nospace_cache. With nospace_cache, we do not want to set dirty flag, instead we just truncate free space cache inode and bail out with setting cache state DC_WRITTEN. We can benifit from it since it saves us another 'pre-allocation' part which usually costs a lot. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:02 -04:00
Liu Bo	0678938423	Btrfs: do not abort transaction in prealloc case During disk balance, we prealloc new file extent for file data relocation, but we may fail in 'no available space' case, and it leads to flipping btrfs into readonly. It is not necessary to bail out and abort transaction since we do have several ways to rescue ourselves from ENOSPC case. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:01 -04:00
Liu Bo	83eea1f1ba	Btrfs: kill root from btrfs_is_free_space_inode Since root can be fetched via BTRFS_I macro directly, we can save an args for btrfs_is_free_space_inode(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:00 -04:00
Liu Bo	51a8cf9d2d	Btrfs: fix btrfs_is_free_space_inode to recognize btree inode For btree inode, its root is also 'tree root', so btree inode can be misunderstood as a free space inode. We should add one more check for btree inode. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:00 -04:00
Stefan Behrens	c0901581ad	Btrfs: avoid I/O repair BUG() from btree_read_extent_buffer_pages() From btree_read_extent_buffer_pages(), currently repair_io_failure() can be called with mirror_num being zero when submit_one_bio() returned an error before. This used to cause a BUG_ON(!mirror_num) in repair_io_failure() and indeed this is not a case that needs the I/O repair code to rewrite disk blocks. This commit prevents calling repair_io_failure() in this case and thus avoids the BUG_ON() and malfunction. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:59 -04:00
Josef Bacik	f4c738c2e7	Btrfs: rework shrink_delalloc So shrink_delalloc has grown all sorts of cruft over the years thanks to many reworkings of how we track enospc. What happens now as we fill up the disk is we will loop for freaking ever hoping to reclaim a arbitrary amount of space of metadata, this was from when everybody flushed at the same time. Now we only have people flushing one at a time. So instead of trying to reclaim a huge amount of space, just try to flush a decent chunk of space, and stop looping as soon as we have enough free space to satisfy our reservation. This makes xfstests 224 go much faster. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:58 -04:00
Liu Bo	b9ca0664dc	Btrfs: do not set subvolume flags in readonly mode $ mkfs.btrfs /dev/sdb7 $ btrfstune -S1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs mount: block device /dev/sdb7 is write-protected, mounting read-only $ btrfs dev add /dev/sdb8 /mnt/btrfs/ Now we get a btrfs in which mnt flags has readonly but sb flags does not. So for those ioctls that only check sb flags with MS_RDONLY, it is going to be a problem. Setting subvolume flags is such an ioctl, we should use mnt_want_write_file() to check RO flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:58 -04:00
Liu Bo	e54bfa3104	Btrfs: use mnt_want_write_file instead of mnt_want_write mnt_want_write_file is faster when file has been opened for write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:57 -04:00
Liu Bo	768e9dfe82	Btrfs: remove redundant r/o check for superblock mnt_want_write() and mnt_want_write_file() will check sb->s_flags with MS_RDONLY, and we don't need to do it ourselves. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
Liu Bo	a874a63e13	Btrfs: check write access to mount earlier while creating snapshots Move check of write access to mount into upper functions so that we can use mnt_want_write_file instead, which is faster than mnt_want_write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-07-23 16:27:56 -04:00
Liu Bo	287082b0bd	Btrfs: fix typo in cow_file_range_async and async_cow_submit It should be 10 * 1024 * 1024. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-23 16:27:55 -04:00
Josef Bacik	0e72110692	Btrfs: change how we indicate we're adding csums There is weird logic I had to put in place to make sure that when we were adding csums that we'd used the delalloc block rsv instead of the global block rsv. Part of this meant that we had to free up our transaction reservation before we ran the delayed refs since csum deletion happens during the delayed ref work. The problem with this is that when we release a reservation we will add it to the global reserve if it is not full in order to keep us going along longer before we have to force a transaction commit. By releasing our reservation before we run delayed refs we don't get the opportunity to drain down the global reserve for the work we did, so we won't refill it as often. This isn't a problem per-se, it just results in us possibly committing transactions more and more often, and in rare cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran out of space in our block rsv. This also helps us by holding onto space while the delayed refs run so we don't end up with as many people trying to do things at the same time, which again will help us not force commits or hit the use_block_rsv warnings. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:55 -04:00
Tsutomu Itoh	b995929515	Btrfs: return error of btrfs_update_inode() to caller We didn't check error of btrfs_update_inode(), but that error looks easy to bubble back up. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:54 -04:00
Dan Carpenter	23291a044c	Btrfs: fix error handling in __add_reloc_root() We dereferenced "node" in the error message after freeing it. Also btrfs_panic() can return so we should return an error code instead of continuing. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-23 16:27:53 -04:00
Ilya Dryomov	44c44af2f4	Btrfs: do not ignore errors from btrfs_cleanup_fs_roots() when mounting There used to be a BUG_ON(ret) there before EH patch (`79787eaa`) went in. Bail out with EINVAL. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-07-23 16:27:53 -04:00
Ilya Dryomov	fed425c742	Btrfs: do not return EINVAL instead of ENOMEM from open_ctree() When bailing from open_ctree() err is returned, not ret. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2012-07-23 16:27:52 -04:00
Josef Bacik	02db0844be	Btrfs: add DEVICE_READY ioctl This will be used in conjunction with btrfs device ready <dev>. This is needed for initrd's to have a nice and lightweight way to tell if all of the devices needed for a file system are in the cache currently. This keeps them from having to do mount+sleep loops waiting for devices to show up. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:27:42 -04:00
J. Bruce Fields	0ec4f431eb	locks: fix checking of fcntl_setlease argument The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually don't cause problems in mainline, as of Dave Jones's commit `8d657eb3b4` "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-23 12:46:01 -07:00
Josef Bacik	96c3f4331a	Btrfs: flush delayed inodes if we're short on space Those crazy gentoo guys have been complaining about ENOSPC errors on their portage volumes. This is because doing things like untar tends to create lots of new files which will soak up all the reservation space in the delayed inodes. Usually this gets papered over by the fact that we will try and commit the transaction, however if this happens in the wrong spot or we choose not to commit the transaction you will be screwed. So add the ability to expclitly flush delayed inodes to free up space. Please test this out guys to make sure it works since as usual I cannot reproduce. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 15:41:40 -04:00
David Sterba	b27f7c0c15	btrfs: join DEV_STATS ioctls to one Commit `c11d2c236c` (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing distinguished by just the ioctl number which encodes "do reset after read". I have suggested http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html to implement it via the ioctl args. This hasn't happen, and I think we should use a more clean way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: David Sterba <dsterba@suse.cz>	2012-07-23 15:41:40 -04:00
Andrew Mahone	a43a211133	btrfs: ignore unfragmented file checks in defrag when compression enabled - rebased Rebased on btrfs-next and retested. Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip checks for adjacent extents and extent size when deciding whether to defrag, as these can prevent an uncompressed and unfragmented file from being compressed as requested. Signed-off-by: Andrew Mahone <andrew.mahone@gmail.com>	2012-07-23 15:41:39 -04:00
Dan Carpenter	e4b50e14c8	Btrfs: small naming cleanup in join_transaction() "root->fs_info" and "fs_info" are the same, but "fs_info" is prefered because it is shorter and that's what is used in the rest of the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-23 15:41:39 -04:00
Alexander Block	2bc5565286	Btrfs: don't update atime on RO subvolumes Before the update_time inode operation was indroduced, it was not possible to prevent updates of atime on RO subvolumes. VFS was only able to check for RO on the mount, but did not know anything about btrfs subvolumes. btrfs_update_time does now check if the root is RO and skip updating of times. Signed-off-by: Alexander Block <ablock84@googlemail.com>	2012-07-23 15:41:38 -04:00
Arnd Hannemann	063849eafd	Btrfs: allow mount -o remount,compress=no Btrfs allows to turn on compression on a mounted and used filesystem by issuing mount -o remount,compress=lzo. This patch allows to turn compression off again while the filesystem is mounted. As suggested by David Sterba if the compress-force option was set, it is implicitly cleared if compression is turned off. Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Arnd Hannemann <arnd@arndnet.de>	2012-07-23 15:41:38 -04:00
Josef Bacik	c5c3c5f31e	Btrfs: remove ->dirty_inode We do all of our inode updating when we change it, and now that we do ->update_time we don't need ->dirty_inode for atime updates anymore, so just remove it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-07-23 15:41:38 -04:00
Chris Mason	cbea5ac1ee	Btrfs: reduce calls to wake_up on uncontended locks The btrfs locks were unconditionally calling wake_up as the locks were released. This lead to extra thrashing on the waitqueue, especially for locks that were dominated by readers. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-23 15:36:18 -04:00
Chris Mason	e39e64ac0c	Btrfs: don't wait around for new log writers on an SSD Waiting on spindles improves performance, but ssds want all the IO as quickly as we can push it down. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-07-23 15:36:17 -04:00
Linus Torvalds	a66d2c8f7e	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull the big VFS changes from Al Viro: "This one is big and changes quite a few things around VFS. What's in there: - the first of two really major architecture changes - death to open intents. The former is finally there; it was very long in making, but with Miklos getting through really hard and messy final push in fs/namei.c, we finally have it. Unlike his variant, this one doesn't introduce struct opendata; what we have instead is ->atomic_open() taking preallocated struct file * and passing everything via its fields. Instead of returning struct file , it returns -E... on error, 0 on success and 1 in "deal with it yourself" case (e.g. symlink found on server, etc.). See comments before fs/namei.c:atomic_open(). That made a lot of goodies finally possible and quite a few are in that pile: ->lookup(), ->d_revalidate() and ->create() do not get struct nameidata anymore; ->lookup() and ->d_revalidate() get lookup flags instead, ->create() gets "do we want it exclusive" flag. With the introduction of new helper (kern_path_locked()) we are rid of all struct nameidata instances outside of fs/namei.c; it's still visible in namei.h, but not for long. Come the next cycle, declaration will move either to fs/internal.h or to fs/namei.c itself. [me, miklos, hch] - The second major change: behaviour of final fput(). Now we have __fput() done without any locks held by caller and not from deep in call stack. That obviously lifts a lot of constraints on the locking in there. Moreover, it's legal now to call fput() from atomic contexts (which has immediately simplified life for aio.c). We also don't need anti-recursion logics in __scm_destroy() anymore. There is a price, though - the damn thing has become partially asynchronous. For fput() from normal process we are guaranteed that pending __fput() will be done before the caller returns to userland, exits or gets stopped for ptrace. For kernel threads and atomic contexts it's done via schedule_work(), so theoretically we might need a way to make sure it's finished; so far only one such place had been found, but there might be more. There's flush_delayed_fput() (do all pending __fput()) and there's __fput_sync() (fput() analog doing __fput() immediately). I hope we won't need them often; see warnings in fs/file_table.c for details. [me, based on task_work series from Oleg merged last cycle] - sync series from Jan - large part of "death to sync_supers()" work from Artem; the only bits missing here are exofs and ext4 ones. As far as I understand, those are going via the exofs and ext4 trees resp.; once they are in, we can put ->write_super() to the rest, along with the thread calling it. - preparatory bits from unionmount series (from dhowells). - assorted cleanups and fixes all over the place, as usual. This is not the last pile for this cycle; there's at least jlayton's ESTALE work and fsfreeze series (the latter - in dire need of fixes, so I'm not sure it'll make the cut this cycle). I'll probably throw symlink/hardlink restrictions stuff from Kees into the next pile, too. Plus there's a lot of misc patches I hadn't thrown into that one - it's large enough as it is..." * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits) ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() switch dentry_open() to struct path, make it grab references itself spufs: shift dget/mntget towards dentry_open() zoran: don't bother with struct file * in zoran_map ecryptfs: don't reinvent the wheels, please - use struct completion don't expose I_NEW inodes via dentry->d_inode tidy up namei.c a bit unobfuscate follow_up() a bit ext3: pass custom EOF to generic_file_llseek_size() ext4: use core vfs llseek code for dir seeks vfs: allow custom EOF in generic_file_llseek code vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes vfs: Remove unnecessary flushing of block devices vfs: Make sys_sync writeout also block device inodes vfs: Create function for iterating over block devices vfs: Reorder operations during sys_sync quota: Move quota syncing to ->sync_fs method quota: Split dquot_quota_sync() to writeback and cache flushing part vfs: Move noop_backing_dev_info check from sync into writeback ...	2012-07-23 12:27:27 -07:00
Cong Wang	906adea153	jbd2: remove the second argument of kmap_atomic Signed-off-by: Cong Wang <amwang@redhat.com>	2012-07-23 14:11:22 +08:00
Prasad Joshi	9f0bbd8ca7	logfs: query block device for number of pages to send with bio The block device driver puts a limit on maximum number of pages that can be sent with the bio. Not all block devices can handle BIO_MAX_PAGES number of pages in bio. Specifically the virtio-blk diriver limits it to 126. When the LogFS file system was excersized in KVM, the following bug from do_virtblk_request() was observed static void do_virtblk_request(struct request_queue *q) { .... .... while ((req = blk_peek_request(q)) != NULL) { BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); .... .... } .... } The patch fixes the problem by querring the maximum number of pages in bio allowed from block device driver and then using those many pages during submit_bio. Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 10:32:11 +05:30
Prasad Joshi	41b93bc1ee	logfs: maintain the ordering of meta-inode destruction LogFS does not use a specialized area to maintain the inodes. The inodes information is kept in a specialized file called inode file. Similarly, the segment information is kept in a segment file. Since the segment file also has an inode which is kept in the inode file, the inode for segment file must be evicted before the inode for inode file. The change fixes the following BUG during unmount Pid: 2057, comm: umount Not tainted 3.5.0-rc6+ #25 Bochs Bochs RIP: 0010:[<ffffffffa005c5f2>] [<ffffffffa005c5f2>] move_page_to_btree+0x32/0x1f0 [logfs] Process umount (pid: 2057, threadinfo ...) Call Trace: [<ffffffff8112adca>] ? find_get_pages+0x2a/0x180 [<ffffffffa00549f5>] logfs_invalidatepage+0x85/0x90 [logfs] [<ffffffff81136c51>] truncate_inode_page+0xb1/0xd0 [<ffffffff81136dcf>] truncate_inode_pages_range+0x15f/0x490 [<ffffffff81558549>] ? printk+0x78/0x7a [<ffffffff81137185>] truncate_inode_pages+0x15/0x20 [<ffffffffa005b7fc>] logfs_evict_inode+0x6c/0x190 [logfs] [<ffffffff8155c75b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff8119e3d7>] evict+0xa7/0x1b0 [<ffffffff8119ea6e>] dispose_list+0x3e/0x60 [<ffffffff8119f1c4>] evict_inodes+0xf4/0x110 [<ffffffff81185b53>] generic_shutdown_super+0x53/0xf0 [<ffffffffa005d8f2>] logfs_kill_sb+0x52/0xf0 [logfs] [<ffffffff81185ec5>] deactivate_locked_super+0x45/0x80 [<ffffffff81186a4a>] deactivate_super+0x4a/0x70 [<ffffffff811a228e>] mntput_no_expire+0xde/0x140 [<ffffffff811a30ff>] sys_umount+0x6f/0x3a0 [<ffffffff8155d8e9>] system_call_fastpath+0x16/0x1b ---[ end trace 45f7752082cefafd ]--- Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 09:35:52 +05:30
Theodore Ts'o	03179fe923	ext4: undo ext4_calc_metadata_amount if we fail to claim space The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-23 00:00:20 -04:00
Brian Foster	97795d2a5b	ext4: don't let i_reserved_meta_blocks go negative If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-22 23:59:40 -04:00
Prasad Joshi	ddb24bbac3	logfs: create a pagecache page if it is not present While writing the partial journal entries we assumed that the page associated with the journal would always in locatable. This incorrect assumption resulted in the following BUG kernel BUG at /home/benixon/WD_SMR/kernels/linux-3.3.7-logfs/fs/logfs/journal.c:569! EIP is at logfs_write_area+0xb6/0x109 [logfs] EAX: 00000000 EBX: 00000000 ECX: ef6efea4 EDX: 00000000 ESI: 001b9000 EDI: f009e000 EBP: c3c13f14 ESP: c3c13ef0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process sync (pid: 1799, ti=c3c12000 task=f07825b0 task.ti=c3c12000) Stack: 01001000 c3c13f26 781b9000 00000000 f009e000 f7286000 f1f83400 f8445071 f1f83400 c3c13f30 f8445ae9 c3c13f20 0000100a 000ee000 f009e000 00000001 c3c13f5c f8445d17 c05eb0ee 00000000 f1f83400 ef718000 f009e25c ea9c3d80 Call Trace: [<f8445071>] ? account_shadow+0x16d/0x16d [logfs] [<f8445ae9>] logfs_write_je+0x2a/0x44 [logfs] [<f8445d17>] logfs_write_anchor+0x114/0x228 [logfs] [<c05eb0ee>] ? empty+0x5/0x5 [<f8444522>] logfs_sync_fs+0x1e/0x31 [logfs] [<c051be66>] __sync_filesystem+0x5d/0x6f [<c051be8d>] sync_one_sb+0x15/0x17 [<c04ff8b0>] iterate_supers+0x59/0x9a [<c051be78>] ? __sync_filesystem+0x6f/0x6f [<c051befc>] sys_sync+0x29/0x4f [<c084285f>] sysenter_do_call+0x12/0x28 EIP: [<f8445127>] logfs_write_area+0xb6/0x109 [logfs] SS:ESP 0068:c3c13ef0 ---[ end trace ef6e9ef52601a945 ]--- The fix is to create the pagecache page if it is not locatable. Reported-and-tested-by: Benixon Dhas <Benixon.Dhas@wdc.com> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-07-23 09:18:14 +05:30
Ashish Sangwan	968dee7722	ext4: fix hole punch failure when depth is greater than 0 Whether to continue removing extents or not is decided by the return value of function ext4_ext_more_to_rm() which checks 2 conditions: a) if there are no more indexes to process. b) if the number of entries are decreased in the header of "depth -1". In case of hole punch, if the last block to be removed is not part of the last extent index than this index will not be deleted, hence the number of valid entries in the extent header of "depth - 1" will remain as it is and ext4_ext_more_to_rm will return 0 although the required blocks are not yet removed. This patch fixes the above mentioned problem as instead of removing the extents from the end of file, it starts removing the blocks from the particular extent from which removing blocks is actually required and continue backward until done. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org	2012-07-22 22:49:08 -04:00
Artem Bityutskiy	b50924c2c6	ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() The '__ext4_handle_dirty_metadata()' does not need the 'now' argument anymore and we can kill it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:37:31 -04:00
Artem Bityutskiy	4d47603d97	ext4: weed out ext4_write_super We do not depend on VFS's '->write_super()' anymore and do not need the 's_dirt' flag anymore, so weed out 'ext4_write_super()' and 's_dirt'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:35:31 -04:00
Artem Bityutskiy	58c5873a76	ext4: remove unnecessary superblock dirtying This patch changes the 'ext4_handle_dirty_super()' function which submits the superblock for I/O in the following cases: 1. When creating the first large file on a file system without EXT4_FEATURE_RO_COMPAT_LARGE_FILE feature. 2. When re-sizing the file-system. 3. When creating an xattr on a file-system without the EXT4_FEATURE_COMPAT_EXT_ATTR feature. If the file-system has journal enabled, the superblock is written via the journal. We do not modify this path. If the file-system has no journal, this function, falls back to just marking the superblock as dirty using the 's_dirt' superblock flag. This means that it delays the actual superblock I/O submission by 5 seconds (default setting). Namely, the 'sync_supers()' kernel thread will call 'ext4_write_super()' later and will actually submit the superblock for I/O. And this is the behavior this patch modifies: we stop using 's_dirt' and just mark the superblock buffer as dirty right away. Indeed, all 3 cases above are extremely rare and it does not add any value to delay the I/O submission for them. Note: 'ext4_handle_dirty_super()' executes '__ext4_handle_dirty_super()' with 'now = 0'. This patch basically makes the 'now' argument unneeded and it will be deleted in one of the next patches. This patch also removes 's_dirt' condition on the unmount path because we never set it anymore, so we should not test it. Tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-07-22 20:33:31 -04:00
Jan Kara	044ce47fec	ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() The last user of ext4_mark_super_dirty() in ext4_file_open() is so rare it can well be modifying the superblock properly by journalling the change. Change it and get rid of ext4_mark_super_dirty() as it's not needed anymore. Artem: small amendments. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-22 20:31:31 -04:00
Jan Kara	97a7406880	ext4: remove useless marking of superblock dirty Commit `a0375156` properly notes that superblock doesn't need to be marked as dirty when only number of free inodes / blocks / number of directories changes since that is recomputed on each mount anyway. However that comment leaves some unnecessary markings as dirty in place. Remove these. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-22 20:29:31 -04:00
Al Viro	254706056b	ext4: fix ext4 mismerge back in January Duplicate caused, AFAICS, by mismerge in ff9cb1c4eead5e4c292e75cd3170a82d66944101> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-07-22 20:27:31 -04:00
Theodore Ts'o	3108b54bce	ext4: remove dynamic array size in ext4_chksum() The ext4_checksum() inline function was using a dynamic array size, which is not legal C. (It is a gcc extension). Remove it. Cc: "Darrick J. Wong" <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:25:31 -04:00
Theodore Ts'o	8a9918497b	ext4: remove unused variable in ext4_update_super() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:23:31 -04:00
Aditya Kali	7c319d3285	ext4: make quota as first class supported feature This patch adds support for quotas as a first class feature in ext4; which is to say, the quota files are stored in hidden inodes as file system metadata, instead of as separate files visible in the file system directory hierarchy. It is based on the proposal at: https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4 This patch introduces a new feature - EXT4_FEATURE_RO_COMPAT_QUOTA which, when turned on, enables quota accounting at mount time iteself. Also, the quota inodes are stored in two additional superblock fields. Some changes introduced by this patch that should be pointed out are: 1) Two new ext4-superblock fields - s_usr_quota_inum and s_grp_quota_inum for storing the quota inodes in use. 2) Default quota inodes are: inode#3 for tracking userquota and inode#4 for tracking group quota. The superblock fields can be set to use other inodes as well. 3) If the QUOTA feature and corresponding quota inodes are set in superblock, the quota usage tracking is turned on at mount time. On 'quotaon' ioctl, the quota limits enforcement is turned on. 'quotaoff' ioctl turns off only the limits enforcement in this case. 4) When QUOTA feature is in use, the quota mount options 'quota', 'usrquota', 'grpquota' are ignored by the kernel. 5) mke2fs or tune2fs can be used to set the QUOTA feature and initialize quota inodes. The default reserved inodes will not be visible to user as regular files. 6) The quota-tools will need to be modified to support hidden quota files on ext4. E2fsprogs will also include support for creating and fixing quota files. 7) Support is only for the new V2 quota file format. Tested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-22 20:21:31 -04:00
Zheng Liu	4bd809dbbf	ext4: don't take the i_mutex lock when doing DIO overwrites Aligned and overwrite direct I/O can be parallelized. In ext4_file_dio_write, we first check whether these conditions are satisfied or not. If so, we take i_data_sem and release i_mutex lock directly. Meanwhile iocb->private is set to indicate that this is a dio overwrite, and it will be handled in ext4_ext_direct_IO. [ Added fix from Dan Carpenter to fix locking bug on the error path. ] CC: Tao Ma <tm@tao.ma> CC: Eric Sandeen <sandeen@redhat.com> CC: Robin Dong <hao.bigrat@gmail.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>	2012-07-22 20:19:31 -04:00
Al Viro	8cae6f7158	ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:55 +04:00
Al Viro	11e62a8fab	btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:43 +04:00
Al Viro	765927b2d5	switch dentry_open() to struct path, make it grab references itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:29 +04:00
Al Viro	3b8b487114	ecryptfs: don't reinvent the wheels, please - use struct completion ... and keep the sodding requests on stack - they are small enough. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:01:02 +04:00
Al Viro	8fc37ec54c	don't expose I_NEW inodes via dentry->d_inode d_instantiate(dentry, inode); unlock_new_inode(inode); is a bad idea; do it the other way round... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:58 +04:00
Al Viro	32a7991b6a	tidy up namei.c a bit locking/unlocking for rcu walk taken to a couple of inline helpers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:55 +04:00
Al Viro	3c0a616368	unobfuscate follow_up() a bit really convoluted test in there has grown up during struct mount introduction; what it checks is that we'd reached the root of mount tree.	2012-07-23 00:00:45 +04:00
Eric Sandeen	de9b942202	ext3: pass custom EOF to generic_file_llseek_size() Use the new custom EOF argument to generic_file_llseek_size so that SEEK_END will go to the max hash value for htree dirs in ext3 rather than to i_size_read() Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:30 +04:00
Eric Sandeen	ec7268ce21	ext4: use core vfs llseek code for dir seeks Use the new functionality in generic_file_llseek_size() to accept a custom EOF position, and un-cut-and-paste all the vfs llseek code from ext4. Also fix up comments on ext4_llseek() to reflect reality. Signed-off-by: Eric Sandeen <sandeen@redaht.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:28 +04:00
Eric Sandeen	e8b96eb503	vfs: allow custom EOF in generic_file_llseek code For ext3/4 htree directories, using the vfs llseek function with SEEK_END goes to i_size like for any other file, but in reality we want the maximum possible hash value. Recent changes in ext4 have cut & pasted generic_file_llseek() back into fs/ext4/dir.c, but replicating this core code seems like a bad idea, especially since the copy has already diverged from the vfs. This patch updates generic_file_llseek_size to accept both a custom maximum offset, and a custom EOF position. With this in place, ext4_dir_llseek can pass in the appropriate maximum hash position for both maxsize and eof, and get what it wants. As far as I know, this does not fix any bugs - nfs in the kernel doesn't use SEEK_END, and I don't know of any user who does. But some ext4 folks seem keen on doing the right thing here, and I can't really argue. (Patch also fixes up some comments slightly) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-23 00:00:15 +04:00
Jan Kara	4ea425b63a	vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes wakeup_flusher_threads(0) will queue work doing complete writeback for each flusher thread. Thus there is not much point in submitting another work doing full inode WB_SYNC_NONE writeback by writeback_inodes_sb(). After this change it does not make sense to call nonblocking ->sync_fs and block device flush before calling sync_inodes_sb() because wakeup_flusher_threads() is completely asynchronous and thus these functions would be called in parallel with inode writeback running which will effectively void any work they do. So we move sync_inodes_sb() call before these two functions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:59:01 +04:00
Jan Kara	d0e91b13eb	vfs: Remove unnecessary flushing of block devices It is not necessary to write block devices twice. The reason why we first did flush and then proper sync is that for_each_bdev() { write_bdev() wait_for_completion() } is much slower than for_each_bdev() write_bdev() for_each_bdev() wait_for_completion() when there is bigger amount of data. But as is seen in the above, there's no real need to scan pages and submit them twice. We just need to separate the submission and waiting part. This patch does that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:53 +04:00
Jan Kara	a8c7176b6d	vfs: Make sys_sync writeout also block device inodes In case block device does not have filesystem mounted on it, sys_sync will just ignore it and doesn't writeout its dirty pages. This is because writeback code avoids writing inodes from superblock without backing device and blockdev_superblock is such a superblock. Since it's unexpected that sync doesn't writeout dirty data for block devices be nice to users and change the behavior to do so. So now we iterate over all block devices on blockdev_super instead of iterating over all superblocks when syncing block devices. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:49 +04:00
Jan Kara	5c0d6b60a0	vfs: Create function for iterating over block devices Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:45 +04:00
Jan Kara	b3de653105	vfs: Reorder operations during sys_sync Change the order of operations during sync from for_each_sb { writeback_inodes_sb(); sync_fs(nowait); __sync_blockdev(nowait); } for_each_sb { sync_inodes_sb(); sync_fs(wait); __sync_blockdev(wait); } to for_each_sb writeback_inodes_sb(); for_each_sb sync_fs(nowait); for_each_sb __sync_blockdev(nowait); for_each_sb sync_inodes_sb(); for_each_sb sync_fs(wait); for_each_sb __sync_blockdev(wait); This is a preparation for the following patches in this series. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:41 +04:00
Jan Kara	a117782571	quota: Move quota syncing to ->sync_fs method Since the moment writes to quota files are using block device page cache and space for quota structures is reserved at the moment they are first accessed we have no reason to sync quota before inode writeback. In fact this order is now only harmful since quota information can easily change during inode writeback (either because conversion of delayed-allocated extents or simply because of allocation of new blocks for simple filesystems not using page_mkwrite). So move syncing of quota information after writeback of inodes into ->sync_fs method. This way we do not have to use ->quota_sync callback which is primarily intended for use by quotactl syscall anyway and we get rid of calling ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which also support legacy non-journalled quotas) and thus there are no dirty quota structures. CC: "Theodore Ts'o" <tytso@mit.edu> CC: Joel Becker <jlbec@evilplan.org> CC: reiserfs-devel@vger.kernel.org Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Dave Kleikamp <shaggy@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:34 +04:00
Jan Kara	ceed17236a	quota: Split dquot_quota_sync() to writeback and cache flushing part Split off part of dquot_quota_sync() which writes dquots into a quota file to a separate function. In the next patch we will use the function from filesystems and we do not want to abuse ->quota_sync quotactl callback more than necessary. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:19 +04:00
Jan Kara	6eedc70150	vfs: Move noop_backing_dev_info check from sync into writeback In principle, a filesystem may want to have ->sync_fs() called during sync(1) although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info). Only writeback code really needs bdi set to something reasonable. So move the checks where they are more logical. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:18 +04:00
Artem Bityutskiy	9e9ad5f408	fs/ufs: get rid of write_super This patch makes UFS stop using the VFS '->write_super()' method along with the 's_dirt' superblock flag, because they are on their way out. The way we implement this is that we schedule a delay job instead relying on 's_dirt' and '->write_super()'. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblocks using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds, even if there are no diry superblocks, or there are no client file-systems which would need this (e.g., btrfs does not use '->write_super()'). So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super()' VFS service, and then remove it together with the kernel thread. Tested using fsstress from the LTP project. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:16 +04:00
Artem Bityutskiy	7bd54ef722	fs/ufs: re-arrange the code a bit This patch does not do any functional changes. It only moves 3 functions in fs/ufs/super.c a little bit up in order to prepare for further changes where I'll need this new arrangement to avoid forward declarations. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:14 +04:00
Artem Bityutskiy	65e5e83f7d	fs/ufs: remove extra superblock write on unmount UFS calls 'ufs_write_super()' from 'ufs_put_super()' in order to write the superblocks to the media. However, it is not needed because VFS calls '->sync_fs()' before calling '->put_super()' - so by the time we are in 'ufs_write_super()', the superblocks are already synchronized. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:14 +04:00
Artem Bityutskiy	9d46be294d	fs/sysv: stop using write_super and s_dirt It does not look like sysv FS needs 'write_super()' at all, because all it does is a timestamp update. I cannot test this patch, because this file-system is so old and probably has not been used by anyone for years, so there are no tools to create it in Linux. But from the code I see that marking the superblock as dirty is basically marking the superblock buffers as drity and then setting the s_dirt flag. And when 'write_super()' is executed to handle the s_dirt flag, we just update the timestamp and again mark the superblock buffer as dirty. Seems pointless. It looks like we can update the timestamp more opprtunistically - on unmount or remount of sync, and nothing should change. Thus, this patch removes 'sysv_write_super()' and 's_dirt'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:12 +04:00
Artem Bityutskiy	eee458936b	fs/sysv: remove another useless write_super call We do not need to call 'sysv_write_super()' from 'sysv_remount()', because VFS has called 'sysv_sync_fs()' before calling '->remount()'. So remove it. Remove also '(un)lock_super()' which obvioulsy is becoming useless in this function. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:11 +04:00
Artem Bityutskiy	a4d05d315a	fs/sysv: remove useless write_super call We do not need to call 'sysv_write_super()' from 'sysv_put_super()', because VFS has called 'sysv_sync_fs()' before calling '->put_super()'. So remove it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:10 +04:00
Artem Bityutskiy	5687b5780e	hfs: get rid of hfs_sync_super This patch makes hfs stop using the VFS '->write_super()' method along with the 's_dirt' superblock flag, because they are on their way out. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblocks using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds, even if there are no diry superblocks, or there are no client file-systems which would need this (e.g., btrfs does not use '->write_super()'). So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super()' VFS service, and then remove it together with the kernel thread. Tested using fsstress from the LTP project. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:09 +04:00
Artem Bityutskiy	b16ca62635	hfs: introduce VFS superblock object back-reference Add an 'sb' VFS superblock back-reference to the 'struct hfs_sb_info' data structure - we will need to find the VFS superblock from a 'struct hfs_sb_info' object in the next patch, so this change is jut a preparation. Remove few useless newlines while on it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:08 +04:00
Artem Bityutskiy	4527440d5d	hfs: simplify a bit checking for R/O We have the following pattern in 2 places in HFS if (!RDONLY) hfs_mdb_commit(); This patch pushes the RDONLY check down to 'hfs_mdb_commit()'. This will make the following patches a bit simpler. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:07 +04:00
Artem Bityutskiy	a3742d4828	hfs: remove extra mdb write on unmount HFS calls 'hfs_write_super()' from 'hfs_put_super()' in order to write the MDB to the media. However, it is not needed because VFS calls '->sync_fs()' before calling '->put_super()' - so by the time we are in 'hfs_write_super()', the MDB is already synchronized. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:07 +04:00
Artem Bityutskiy	b59352359d	hfs: get rid of lock_super Stop using lock_super for serializing the MDB changes - use the buffer-head own lock instead. Tested with fsstress. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:06 +04:00
Artem Bityutskiy	715189d836	hfs: push lock_super down HFS uses 'lock_super()'/'unlock_super()' around 'hfs_mdb_commit()' in order to serialize MDB (Master Directory Block) changes. Push it down to 'hfs_mdb_commit()' in order to simplify the code a bit. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:05 +04:00
Artem Bityutskiy	9e6c5829b0	hfsplus: get rid of write_super This patch makes hfsplus stop using the VFS '->write_super()' method along with the 's_dirt' superblock flag, because they are on their way out. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblocks using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds, even if there are no diry superblocks, or there are no client file-systems which would need this (e.g., btrfs does not use '->write_super()'). So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super()' VFS service, and then remove it together with the kernel thread. Tested using fsstress from the LTP project. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:04 +04:00
Artem Bityutskiy	58770d7e83	hfsplus: remove useless check This check is useless because we always have 'sb->s_fs_info' to be non-NULL. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:03 +04:00
Artem Bityutskiy	b7a90e8043	hfsplus: amend debugging print Print correct function name in the debugging print of the 'hfsplus_sync_fs()' function. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:02 +04:00
Artem Bityutskiy	0a81861978	hfsplus: make hfsplus_sync_fs static ... because it is used only in fs/hfsplus/super.c. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:58:01 +04:00
Al Viro	3ffa3c0e3f	aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:57:59 +04:00
Al Viro	4a9d4b024a	switch fput to task_work_add ... and schedule_work() for interrupt/kernel_thread callers (and yes, now it is OK to call from interrupt). We are guaranteed that __fput() will be done before we return to userland (or exit). Note that for fput() from a kernel thread we get an async behaviour; it's almost always OK, but sometimes you might need to have __fput() completed before you do anything else. There are two mechanisms for that - a general barrier (flush_delayed_fput()) and explicit __fput_sync(). Both should be used with care (as was the case for fput() from kernel threads all along). See comments in fs/file_table.c for details. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:57:58 +04:00
Al Viro	1e0ea00144	use __lookup_hash() in kern_path_parent() No need to bother with lookup_one_len() here - it's an overkill Signed-off-by Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:57:53 +04:00
Christoph Hellwig	824c313139	xfs: remove xfs_ialloc_find_free This function is entirely trivial and only has one caller, so remove it to simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 13:40:10 -05:00
Alain Renaud	0d882a360b	Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision. Add a XFS_ prefix to IO_DIRECT,XFS_IO_DELALLOC, XFS_IO_UNWRITTEN and XFS_IO_OVERWRITE. This to avoid namespace conflict with other modules. Signed-off-by: Alain Renaud <arenaud@sgi.com> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 11:00:55 -05:00
Christoph Hellwig	129dbc9a2d	xfs: remove xfs_inotobp There is no need to keep this helper around, opencoding it in the only caller is just as clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 10:55:36 -05:00
Christoph Hellwig	475ee413f3	xfs: merge xfs_itobp into xfs_imap_to_bp All callers of xfs_imap_to_bp want the dinode pointer, so let's calculate it inside xfs_imap_to_bp. Once that is done xfs_itobp becomes a fairly pointless wrapper which can be replaced with direct calls to xfs_imap_to_bp. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 10:46:56 -05:00
Christoph Hellwig	6b7a03f03a	xfs: handle EOF correctly in xfs_vm_writepage We need to zero out part of a page which beyond EOF before setting uptodate, otherwise, mapread or write will see non-zero data beyond EOF. Based on the code in fs/buffer.c and the following ext4 commit: ext4: handle EOF correctly in ext4_bio_write_page() And yes, I wish we had a good test case for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 10:42:56 -05:00
Christoph Hellwig	69ff282611	xfs: implement ->update_time Use this new method to replace our hacky use of ->dirty_inode. An additional benefit is that we can now propagate errors up the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 10:38:32 -05:00
Chen Baozi	96ee34be7a	xfs: fix comment typo of struct xfs_da_blkinfo. Fix trivial typo error that has written "It" to "Is". Signed-off-by: Chen Baozi <baozich@gmail.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-22 10:34:42 -05:00
Linus Torvalds	ce9f8d6b39	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd Pull pnfs/ore fixes from Boaz Harrosh: "These are catastrophic fixes to the pnfs objects-layout that were just discovered. They are also destined for @stable. I have found these and worked on them at around RC1 time but unfortunately went to the hospital for kidney stones and had a very slow recovery. I refrained from sending them as is, before proper testing, and surly I have found a bug just yesterday. So now they are all well tested, and have my sign-off. Other then fixing the problem at hand, and assuming there are no bugs at the new code, there is low risk to any surrounding code. And in anyway they affect only these paths that are now broken. That is RAID5 in pnfs objects-layout code. It does also affect exofs (which was not broken) but I have tested exofs and it is lower priority then objects-layout because no one is using exofs, but objects-layout has lots of users." * 'for-linus' of git://git.open-osd.org/linux-open-osd: pnfs-obj: Fix __r4w_get_page when offset is beyond i_size pnfs-obj: don't leak objio_state if ore_write/read fails ore: Unlock r4w pages in exact reverse order of locking ore: Remove support of partial IO request (NFS crash) ore: Fix NFS crash by supporting any unaligned RAID IO	2012-07-20 11:43:53 -07:00
Linus Torvalds	1793416287	Fix a bug in UBIFS free space fix-up reported already twice recently: http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html and we finally have the fix. I am quite confident the fix is correct because I could reproduce the problem with nandsim and verify the fix. It was also verified by Iwo (the reporter). I am also confident that this is OK to merge the fix so late because this patch affects only the fixup functionality, which is not used by most users. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQCQZ1AAoJECmIfjd9wqK0fosP/RD3Ruo5ILvTtBThKJPUoeld kihD9w3rk26cILlpGA3Cs/kaoOj/wPtjMVKGVkw50cWKRQemFLMh4ZcbCepfae+b g+YsH+ihkINgjdpKM351lgSCS+NEPJ695zmxNJ+/zjM5+ewfP6vK0qivnjF7w81k jLAVt80a1nhjNPyDMeQVr69HxBegYuX927LL4onJULYqvmrSiX/5tXzI+02emjDf 9gA99fyc4pLNAJzzQyr44pogNaSME+Q90p4PAd11tlaVfn1kXgCXA3Ybv2cy7cer ipQfHQzfMjiCMO7Kpt5Ja3necuTarZsHV4UtmXhc4uIOr5p57dJX7RfBzA3j4RmV 2ZFynqjl7n6ZT0pAM/0F9h9FyjZrCcgg1BGcEsqfJv2Yu7txOX1Qo2gkEvYJl8Sx Q2G6xNdzyib8MXClm4L2Zix16WqAF7CyUZo+szUTpdO8PPzgJ/vNpAk+3yqoVeep 0Dr0HmTMRuP6tJGa9TH58QlvhClkXGSb7ukk1UlV4RVXtvvYtjVBwUXoHSUHNDJO HB9B+7ViTIjm9fdILqCX5wtrnZZQgFd1hBiQ/13/ZFrtB1hz5WfOdgfLRIBifjbq hGkwQyb5zsWTm7KGTOV0Yncmbnkut4zSJpMCbjZvcPJ2r5zwNwImKdvLBJ7oCKmd nPZ2dJmJYdKw2L00SzGZ =bUPG -----END PGP SIGNATURE----- Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs Pull UBIFS free space fix-up bugfix from Artem Bityutskiy: "It's been reported already twice recently: http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html and we finally have the fix. I am quite confident the fix is correct because I could reproduce the problem with nandsim and verify the fix. It was also verified by Iwo (the reporter). I am also confident that this is OK to merge the fix so late because this patch affects only the fixup functionality, which is not used by most users." * tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs: UBIFS: fix a bug in empty space fix-up	2012-07-20 11:42:30 -07:00
Bob Peterson	15e1c96022	GFS2: Eliminate 64-bit divides This patch removes the 64-bit divides introduced in the previous patch in favor of shifting, so that it will compile properly on 32-bit machines. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-20 19:15:09 +01:00
Boaz Harrosh	c999ff6802	pnfs-obj: Fix __r4w_get_page when offset is beyond i_size It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global zero_page, if offset is beyond i_size. TODO: Change the API to ->__r4w_get_page() so a NULL can be returned without being considered as error, since XOR API treats NULL entries as zero_pages. [Bug since 3.2. Should apply the same way to all Kernels since] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-07-20 11:50:31 +03:00
Boaz Harrosh	9909d45a85	pnfs-obj: don't leak objio_state if ore_write/read fails [Bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-07-20 11:50:30 +03:00
Boaz Harrosh	537632e0a5	ore: Unlock r4w pages in exact reverse order of locking The read-4-write pages are locked in address ascending order. But where unlocked in a way easiest for coding. Fix that, locks should be released in opposite order of locking, .i.e descending address order. I have not hit this dead-lock. It was found by inspecting the dbug print-outs. I suspect there is an higher lock at caller that protects us, but fix it regardless. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-07-20 11:49:25 +03:00
Boaz Harrosh	62b62ad873	ore: Remove support of partial IO request (NFS crash) Do to OOM situations the ore might fail to allocate all resources needed for IO of the full request. If some progress was possible it would proceed with a partial/short request, for the sake of forward progress. Since this crashes NFS-core and exofs is just fine without it just remove this contraption, and fail. TODO: Support real forward progress with some reserved allocations of resources, such as mem pools and/or bio_sets [Bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> CC: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-07-20 11:47:43 +03:00
Boaz Harrosh	9ff19309a9	ore: Fix NFS crash by supporting any unaligned RAID IO In RAID_5/6 We used to not permit an IO that it's end byte is not stripe_size aligned and spans more than one stripe. .i.e the caller must check if after submission the actual transferred bytes is shorter, and would need to resubmit a new IO with the remainder. Exofs supports this, and NFS was supposed to support this as well with it's short write mechanism. But late testing has exposed a CRASH when this is used with none-RPC layout-drivers. The change at NFS is deep and risky, in it's place the fix at ORE to lift the limitation is actually clean and simple. So here it is below. The principal here is that in the case of unaligned IO on both ends, beginning and end, we will send two read requests one like old code, before the calculation of the first stripe, and also a new site, before the calculation of the last stripe. If any "boundary" is aligned or the complete IO is within a single stripe. we do a single read like before. The code is clean and simple by splitting the old _read_4_write into 3 even parts: 1._read_4_write_first_stripe 2. _read_4_write_last_stripe 3. _read_4_write_execute And calling 1+3 at the same place as before. 2+3 before last stripe, and in the case of all in a single stripe then 1+2+3 is preformed additively. Why did I not think of it before. Well I had a strike of genius because I have stared at this code for 2 years, and did not find this simple solution, til today. Not that I did not try. This solution is much better for NFS than the previous supposedly solution because the short write was dealt with out-of-band after IO_done, which would cause for a seeky IO pattern where as in here we execute in order. At both solutions we do 2 separate reads, only here we do it within a single IO request. (And actually combine two writes into a single submission) NFS/exofs code need not change since the ORE API communicates the new shorter length on return, what will happen is that this case would not occur anymore. hurray!! [Stable this is an NFS bug since 3.2 Kernel should apply cleanly] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>	2012-07-20 11:45:28 +03:00
Julia Lawall	7074e5eb23	UBIFS: remove invalid reference to list iterator variable If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. Replace a field access from orphan by NULL in two places. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier c; expression E; iterator name list_for_each_entry; statement S; @@ list_for_each_entry(c,...) { ... when != break; when forall when strict } ... ( c = E \| *c ) // </smpl> Artem: fortunately, this did not cause any issues because we iterate the orphan list using the elements count, so we never dereferenced the corrupted pointer. This is why I do not send this patch to -stable. But otherwise - well spotted! Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>	2012-07-20 10:27:25 +03:00
Artem Bityutskiy	d51f17ea0a	UBIFS: simplify reply code a bit In the log reply code we assume that 'c->lhead_offs' is known and may be non-zero, which is not the case because we do not store it in the master node and have to find out by scanning on every mount. Knowing this fact allows us to simplify the log scanning loop a bit and remove a couple of unneeded local variables. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>	2012-07-20 10:27:25 +03:00
Artem Bityutskiy	06bef9451a	UBIFS: add debugfs knob to switch to R/O mode This patch adds another debugfs knob which switches UBIFS to R/O mode. I needed it while trying to reproduce the 'first log node is not CS node' bug. Without this debugfs knob you have to perform a power cut to repruduce the bug. The knob is named 'ro_error' and all it does is it sets the 'ro_error' UBIFS flag which makes UBIFS disallow any further writes - even write-back will fail with -EROFS. Useful for debugging. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>	2012-07-20 10:27:25 +03:00
Alexandre Pereira da Silva	782759b9f5	UBIFS: fix compilation warning Fix the following compilation warning: fs/ubifs/dir.c: In function 'ubifs_rename': fs/ubifs/dir.c:972:15: warning: 'saved_nlink' may be used uninitialized in this function Use the 'uninitialized_var()' macro to get rid of this false-positive. Artem: massaged the patch a bit. Signed-off-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-20 10:27:25 +03:00
Artem Bityutskiy	c6727932cf	UBIFS: fix a bug in empty space fix-up UBIFS has a feature called "empty space fix-up" which is a quirk to work-around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks to be unusable for UBIFS. This feature is relatively new (introduced in v3.0). The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal. There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom: UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0 The root-cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store in it the master node and finds out by scanning the log on every mount. The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com> Cc: stable@vger.kernel.org [v3.0+] Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Reported-by: James Nute <newten82@gmail.com>	2012-07-20 10:13:27 +03:00
Bob Peterson	8e2e004735	GFS2: Reduce file fragmentation This patch reduces GFS2 file fragmentation by pre-reserving blocks. The resulting improved on disk layout greatly speeds up operations in cases which would have resulted in interlaced allocation of blocks previously. A typical example of this is 10 parallel dd processes, each writing to a file in a common dirctory. The implementation uses an rbtree of reservations attached to each resource group (and each inode). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-19 14:51:08 +01:00
Linus Torvalds	a9866ba47c	Merge git://git.samba.org/sfrench/cifs-2.6 Pull CIFS fixes from Steve French. * git://git.samba.org/sfrench/cifs-2.6: cifs: always update the inode cache with the results from a FIND_* cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space Initialise mid_q_entry before putting it on the pending queue	2012-07-18 09:28:11 -07:00
Al Viro	331ae4962b	ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT Caused, AFAICS, by mismerge in commit `ff9cb1c4ee` ("Merge branch 'for_linus' into for_linus_merged") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-18 08:59:46 -07:00
Abhijith Das	294f2ad5a5	GFS2: kernel panic with small gfs2 filesystems - 1 RG In the unlikely setup where there's only one resource group in the gfs2 filesystem, gfs2_rgrpd_get_next() returns a NULL rgd that is not dealt with properly, causing a kernel NULL ptr dereference. This patch fixes this issue. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-07-18 16:45:13 +01:00
Miklos Szeredi	69fe05c90e	fuse: add missing INIT flags Add missing flags that userspace derived from the protocol version number. This makes the protocol more flexible. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-07-18 16:09:40 +02:00
Brian Foster	a8894274a3	fuse: update attributes on aio_read A fuse-based network filesystem might allow for the inode and/or file data to change unexpectedly. A local client that opens and repeatedly reads a file might never pick up on such changes and indefinitely return stale data. Always invoke fuse_update_attributes() in the read path to cause an attr revalidation when the attributes expire. This leads to a page cache invalidation if necessary and ensures fuse issues new read requests to the fuse client. The original logic (reval only on reads beyond EOF) is preserved unless the client specifies FUSE_AUTO_INVAL_DATA on init. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-07-18 16:09:40 +02:00
Brian Foster	eed2179efe	fuse: invalidate inode mapping if mtime changes We currently invalidate the inode address space mapping if the file size changes unexpectedly. In the case of a fuse network filesystem, a portion of a file could be overwritten remotely without changing the file size. Compare the old mtime as well to detect this condition and invalidate the mapping if the file has been updated. The original logic (to ignore changes in mtime) is preserved unless the client specifies FUSE_AUTO_INVAL_DATA on init. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-07-18 16:09:40 +02:00
Brian Foster	72d0d248ca	fuse: add FUSE_AUTO_INVAL_DATA init flag FUSE_AUTO_INVAL_DATA is provided to enable updated/auto cache invalidation logic. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2012-07-18 16:09:40 +02:00
Anton Vorontsov	cbe7cbf5a6	pstore/ram: Make tracing log versioned Decoding the binary trace w/ a different kernel might be troublesome since we convert addresses to symbols. For kernels with minimal changes, the mappings would probably match, but it's not guaranteed at all. (But still we could convert the addresses by hand, since we do print raw addresses.) If we use modules, the symbols could be loaded at different addresses from the previously booted kernel, and so this would also fail, but there's nothing we can do about it. Also, the binary data format that pstore/ram is using in its ringbuffer may change between the kernels, so here we too must ensure that we're running the same kernel. So, there are two questions really: 1. How to compute the unique kernel tag; 2. Where to store it. In this patch we're using LINUX_VERSION_CODE, just as hibernation (suspend-to-disk) does. This way we are protecting from the kernel version mismatch, making sure that we're running the same kernel version and patch level. We could use CRC of a symbol table (as suggested by Tony Luck), but for now let's not be that strict. And as for storing, we are using a small trick here. Instead of allocating a dedicated buffer for the tag (i.e. another prz), or hacking ram_core routines to "reserve" some control data in the buffer, we are just encoding the tag into the buffer signature (and XOR'ing it with the actual signature value, so that buffers not needing a tag can just pass zero, which will result into the plain old PRZ signature). Suggested-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Tony Luck <tony.luck@intel.com> Suggested-by: Colin Cross <ccross@android.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 16:48:09 -07:00
Linus Torvalds	a5e135122c	Last-minute PM update for 3.5 This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future reuse of the capability in question in related cases. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQBcRhAAoJEKhOf7ml8uNsnIoP/2XhSul9N/AWC5jfEAh4Af07 QdhfJmYXnXC1Irndh/IoAITu+vHQecm0XjbvAy/9QOBn9oSkM7kNilvOLrCrdzzQ j9/BRMRCJRcu/vMyJmt37z0OIgfiktgDoOBaE6nC5t+1nHotcByAMWdy/AGwqqaL q3lbYcoRtDDQpDr9XPm68cyRdddvWnq81gXb90gNovvfgCjNFVvscshXmMGv3Luy Dx29zROJHJNOWG3kV1Xq7PdNffZj1ChCgIsBRKkzKWROcVEGPEuH5O0wjf4I4rCV PW6nRV9WOykqJI5CAnrWzr9bf8AvpclXtGYWFiwPvUF0kMggSoNFb5xQyRy45SBC nC+daLZNO123yU8xKb3qXaotsKPJ0qRTKAWUqWaGkRkQ0Mg90VmanyYkmP5PkeUX ZABNS4QlxnLGDtZuhSBioUO5pf0iDdzSrYkIOuYD81DGM8yKWWmUyxupOoVW5Kmu QD0d34+ZgEndv9znZzBF8DdGxkwjwljJW6sIBw7PGDq3qXcYdzd4awgtPlnGEOh/ oi6iG24r8oysB8w5IJpwj20/zCvJyYVR+m+eHXxEs373xIGpbAfJbHYRKHqkYgTo nYkZyLgE0g46Izqbb42yrN7y5dUhSsrbImTI8L5xaLVkBYhspEuSO/eSLgoklWiw VgbmreU3R0apj0hwPcA5 =oZrz -----END PGP SIGNATURE----- Merge tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull a last-minute PM update from Rafael J. Wysocki: "This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future reuse of the capability in question in related cases." * tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND	2012-07-17 14:15:43 -07:00
Bryan Schumaker	bb6e071f84	NFS: exit_nfs_v4() shouldn't be an __exit function ... yet. Right now, init_nfs() is calling this function if an error is encountered when loading the nfs module. An __exit function can't be called from one declared as __init. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 17:02:57 -04:00
Michael Kerrisk	d9914cf661	PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND As discussed in http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990, the capability introduced in `4d7e30d989` to govern EPOLLWAKEUP seems misnamed: this capability is about governing the ability to suspend the system, not using a particular API flag (EPOLLWAKEUP). We should make the name of the capability more general to encourage reuse in related cases. (Whether or not this capability should also be used to govern the use of /sys/power/wake_lock is a question that needs to be separately resolved.) This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure that the old capability name doesn't make it out into the wild, could you please apply and push up the tree to ensure that it is incorporated for the 3.5 release. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-17 21:37:27 +02:00
Anton Vorontsov	67a101f573	pstore: Headers should include all stuff they use Headers should really include all the needed prototypes, types, defines etc. to be self-contained. This is a long-standing issue, but apparently the new tracing code unearthed it (SMP=n is also a prerequisite): In file included from fs/pstore/internal.h:4:0, from fs/pstore/ftrace.c:21: include/linux/pstore.h:43:15: error: field ‘read_mutex’ has incomplete type While at it, I also added the following: linux/types.h -> size_t, phys_addr_t, uXX and friends linux/spinlock.h -> spinlock_t linux/errno.h -> Exxxx linux/time.h -> struct timespec (struct passed by value) struct module and rs_control forward declaration (passed via pointers). Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 12:15:30 -07:00
Bryan Schumaker	ec409897e7	NFS: Split out NFS v4 client functions These functions are only needed by NFS v4, so they can be moved into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:56 -04:00
Bryan Schumaker	fbdefd6442	NFS: Split out the NFS v4 filesystem types This allows me to move the v4 mounting and unmounting functions out of the generic client and into a file that is only compiled when CONFIG_NFS_V4 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:55 -04:00
Bryan Schumaker	3cadf4b864	NFS: Create a single nfs_clone_super() function v2 and v3 shared a function for this, but v4 implemented something only slightly different. Might as well share code whenever possible... Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:54 -04:00
Bryan Schumaker	fcf10398f6	NFS: Split out NFS v4 server creating code These functions are specific to NFS v4 and can be moved to nfs4client.c to keep them out of the generic client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:53 -04:00
Bryan Schumaker	428360d77c	NFS: Initialize the NFS v4 client from init_nfs_v4() And split these functions out of the generic client into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:52 -04:00
Bryan Schumaker	a38a9eac75	NFS: Move the v4 getroot code to nfs4getroot.c Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:51 -04:00
Bryan Schumaker	ce4ef7c0a8	NFS: Split out NFS v4 file operations This patch moves the NFS v4 file functions into a new file that is only compiled when CONFIG_NFS_V4 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:50 -04:00
Bryan Schumaker	466bfe7f4a	NFS: Initialize v4 sysctls from nfs_init_v4() And split them out of the generic client into their own file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:18 -04:00
Bryan Schumaker	129d1977ed	NFS: Create an init_nfs_v4() function I want to initialize all of NFS v4 in a single function that will eventually be used as the v4 module init function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:13 -04:00
Bryan Schumaker	73a79706d7	NFS: Split out NFS v4 inode operations The NFS v4 file inode operations are already already in nfs4proc.c, so this patch just needs to move the directory operations to the same file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:05 -04:00
Bryan Schumaker	ab96291ea1	NFS: Split out NFS v3 inode operations This patch moves the NFS v3 file and directory inode functions into files that are only compiled whet CONFIG_NFS_V3 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:33:03 -04:00
Bryan Schumaker	597d92891b	NFS: Split out NFS v2 inode operations This patch moves the NFS v2 file and directory inode functions into files that are only compiled whet CONFIG_NFS_V2 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-17 13:32:55 -04:00
Anton Vorontsov	a694d1b591	pstore/ram: Add ftrace messages handling The ftrace log size is configurable via ramoops.ftrace_size module option, and the log itself is available via <pstore-mount>/ftrace-ramoops file. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 10:14:17 -07:00
Anton Vorontsov	c2b7113261	pstore/ram: Convert to write_buf callback Don't use pstore.buf directly, instead convert the code to write_buf callback which passes a pointer to a buffer as an argument. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 10:07:09 -07:00
Anton Vorontsov	060287b8c4	pstore: Add persistent function tracing With this support kernel can save function call chain log into a persistent ram buffer that can be decoded and dumped after reboot through pstore filesystem. It can be used to determine what function was last called before a reset or panic. We store the log in a binary format and then decode it at read time. p.s. Mostly the code comes from trace_persistent.c driver found in the Android git tree, written by Colin Cross <ccross@android.com> (according to sign-off history). I reworked the driver a little bit, and ported it to pstore. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 10:05:52 -07:00
Anton Vorontsov	897dba0274	pstore: Introduce write_buf backend callback For function tracing we need to stop using pstore.buf directly, since in a tracing callback we can't use spinlocks, and thus we can't safely use the global buffer. With write_buf callback, backends no longer need to access pstore.buf directly, and thus we can pass any buffers (e.g. allocated on stack). Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:51:38 -07:00
Anton Vorontsov	c1743cbc8d	pstore/ram_core: Get rid of prz->ecc enable/disable flag Nowadays we can use prz->ecc_size as a flag, no need for the special member in the prz struct. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:46:52 -07:00
Anton Vorontsov	5ca5d4e61d	pstore/ram: Make ECC size configurable This is now pretty straightforward: instead of using bool, just pass an integer. For backwards compatibility ramoops.ecc=1 means 16 bytes ECC (using 1 byte for ECC isn't much of use anyway). Suggested-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:46:52 -07:00
Anton Vorontsov	4a53ffae6a	pstore/ram_core: Get rid of prz->ecc_symsize and prz->ecc_poly The struct members were never used anywhere outside of persistent_ram_init_ecc(), so there's actually no need for them to be in the struct. If we ever want to make polynomial or symbol size configurable, it would make more sense to just pass initialized rs_decoder to the persistent_ram init functions. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:46:52 -07:00
Andrew Morton	17f79be93d	sysfs: fail dentry revalidation after namespace change fix don't assume that KOBJ_NS_TYPE_NONE==0. Also save a test-n-branch. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:43:55 -07:00
Glauber Costa	e5bcac6147	sysfs: fail dentry revalidation after namespace change When we change the namespace tag of a sysfs entry, the associated dentry is still kept around. readdir() will work correctly and not display the old entries, but open() will still succeed, so will reads and writes. This will no longer happen if sysfs is remounted, hinting that this is a cache-related problem. I am using the following sequence to demonstrate that: shell1: ip link add type veth unshare -nm shell2: ip link set veth1 <pid_of_shell_1> cat /sys/devices/virtual/net/veth1/ifindex Before that patch, this will succeed (fail to fail). After it, it will correctly return an error. Differently from a normal rename, which we handle fine, changing the object namespace will keep it's path intact. So this check seems necessary as well. [ v2: get type from parent, as suggested by Eric Biederman ] Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Tejun Heo <tj@kernel.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-17 09:43:55 -07:00
Jeff Layton	cd60042cc1	cifs: always update the inode cache with the results from a FIND_* When we get back a FIND_FIRST/NEXT result, we have some info about the dentry that we use to instantiate a new inode. We were ignoring and discarding that info when we had an existing dentry in the cache. Fix this by updating the inode in place when we find an existing dentry and the uniqueid is the same. Cc: <stable@vger.kernel.org> # .31.x Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org> Reported-by: Bill Robertson <bill_robertson@debortoli.com.au> Reported-by: Dion Edwards <dion_edwards@debortoli.com.au> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-16 23:57:23 -05:00
Jeff Layton	3cf003c08b	cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps Jian found that when he ran fsx on a 32 bit arch with a large wsize the process and one of the bdi writeback kthreads would sometimes deadlock with a stack trace like this: crash> bt PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx" #0 [eed63cbc] schedule at c083c5b3 #1 [eed63d80] kmap_high at c0500ec8 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs] #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs] #4 [eed63e50] do_writepages at c04f3e32 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a #6 [eed63ea4] filemap_fdatawrite at c04e1b3e #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs] #8 [eed63ecc] do_sync_write at c052d202 #9 [eed63f74] vfs_write at c052d4ee #10 [eed63f94] sys_write at c052df4c #11 [eed63fb0] ia32_sysenter_target at c0409a98 EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6 DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000 SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033 CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246 Each task would kmap part of its address array before getting stuck, but not enough to actually issue the write. This patch fixes this by serializing the marshal_iov operations for async reads and writes. The idea here is to ensure that cifs aggressively tries to populate a request before attempting to fulfill another one. As soon as all of the pages are kmapped for a request, then we can unlock and allow another one to proceed. There's no need to do this serialization on non-CONFIG_HIGHMEM arches however, so optimize all of this out when CONFIG_HIGHMEM isn't set. Cc: <stable@vger.kernel.org> Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-16 23:57:14 -05:00
Jeff Layton	3ae629d98b	cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space We currently rely on being able to kmap all of the pages in an async read or write request. If you're on a machine that has CONFIG_HIGHMEM set then that kmap space is limited, sometimes to as low as 512 slots. With 512 slots, we can only support up to a 2M r/wsize, and that's assuming that we can get our greedy little hands on all of them. There are other users however, so it's possible we'll end up stuck with a size that large. Since we can't handle a rsize or wsize larger than that currently, cap those options at the number of kmap slots we have. We could consider capping it even lower, but we currently default to a max of 1M. Might as well allow those luddites on 32 bit arches enough rope to hang themselves. A more robust fix would be to teach the send and receive routines how to contend with an array of pages so we don't need to marshal up a kvec array at all. That's a fairly significant overhaul though, so we'll need this limit in place until that's ready. Cc: <stable@vger.kernel.org> Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-16 23:57:09 -05:00
Sachin Prabhu	ffc61ccbb9	Initialise mid_q_entry before putting it on the pending queue A user reported a crash in cifs_demultiplex_thread() caused by an incorrectly set mid_q_entry->callback() function. It appears that the callback assignment made in cifs_call_async() was not flushed back to memory suggesting that a memory barrier was required here. Changing the code to make sure that the mid_q_entry structure was completely initialised before it was added to the pending queue fixes the problem. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-07-16 23:57:02 -05:00
Greg Kroah-Hartman	28a78e46f0	Merge 3.5-rc7 into driver-core-next This pulls in the printk fixes to the driver-core-next branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-16 18:19:55 -07:00
David Teigland	96006ea6d4	dlm: fix missing dir remove I don't know exactly how, but in some cases, a dir record is not removed, or a new one is created when it shouldn't be. The result is that the dir node lookup returns a master node where the rsb does not exist. In this case, The master node will repeatedly return -EBADR for requests, and the lock requests will be stuck. Until all possible ways for this to happen can be eliminated, a simple and effective way to recover from this situation is for the supposed master node to send a standard remove message to the dir node when it receives a request for a resource it has no rsb for. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:24:43 -05:00
David Teigland	c503a62103	dlm: fix conversion deadlock from recovery The process of rebuilding locks on a new master during recovery could re-order the locks on the convert queue, creating an "in place" conversion deadlock that would not be resolved. Fix this by not considering queue order when granting conversions after recovery. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:18:22 -05:00
David Teigland	6d768177c2	dlm: use wait_event_timeout Use wait_event_timeout to avoid using a timer directly. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:18:12 -05:00
David Teigland	05c32f47bf	dlm: fix race between remove and lookup It was possible for a remove message on an old rsb to be sent after a lookup message on a new rsb, where the rsbs were for the same resource name. This could lead to a missing directory entry for the new rsb. It is fixed by keeping a copy of the resource name being removed until after the remove has been sent. A lookup checks if this in-progress remove matches the name it is looking up. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:18:01 -05:00
David Teigland	1d7c484eeb	dlm: use idr instead of list for recovered rsbs When a large number of resources are being recovered, a linear search of the recover_list takes a long time. Use an idr in place of a list. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:17:52 -05:00
David Teigland	c04fecb4d9	dlm: use rsbtbl as resource directory Remove the dir hash table (dirtbl), and use the rsb hash table (rsbtbl) as the resource directory. It has always been an unnecessary duplication of information. This improves efficiency by using a single rsbtbl lookup in many cases where both rsbtbl and dirtbl lookups were needed previously. This eliminates the need to handle cases of rsbtbl and dirtbl being out of sync. In many cases there will be memory savings because the dir hash table no longer exists. Signed-off-by: David Teigland <teigland@redhat.com>	2012-07-16 14:16:19 -05:00
Chuck Lever	6bbb4ae8ff	NFS: Clean up nfs4_proc_setclientid() and friends Add documenting comments and appropriate debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 15:12:16 -04:00
Chuck Lever	de73483122	NFS: Treat NFS4ERR_CLID_INUSE as a fatal error For NFSv4 minor version 0, currently the cl_id_uniquifier allows the Linux client to generate a unique nfs_client_id4 string whenever a server replies with NFS4ERR_CLID_INUSE. This implementation seems to be based on a flawed reading of RFC 3530. NFS4ERR_CLID_INUSE actually means that the client has presented this nfs_client_id4 string with a different principal at some time in the past, and that lease is still in use on the server. For a Linux client this might be rather difficult to achieve: the authentication flavor is named right in the nfs_client_id4.id string. If we change flavors, we change strings automatically. So, practically speaking, NFS4ERR_CLID_INUSE means there is some other client using our string. There is not much that can be done to recover automatically. Let's make it a permanent error. Remove the recovery logic in nfs4_proc_setclientid(), and remove the cl_id_uniquifier field from the nfs_client data structure. And, remove the authentication flavor from the nfs_client_id4 string. Keeping the authentication flavor in the nfs_client_id4.id string means that we could have a separate lease for each authentication flavor used by mounts on the client. But we want just one lease for all the mounts on this client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 15:12:16 -04:00
Chuck Lever	46a87b8a7b	NFS: When state recovery fails, waiting tasks should exit NFSv4 state recovery is not always successful. Failure is signalled by setting the nfs_client.cl_cons_state to a negative (errno) value, then waking waiters. Currently this can happen only during mount processing. I'm about to add an explicit case where state recovery failure during normal operation should force all NFS requests waiting on that state recovery to exit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 15:12:15 -04:00
Chuck Lever	6a1a1e34dc	SUNRPC: Add rpcauth_list_flavors() The gss_mech_list_pseudoflavors() function provides a list of currently registered GSS pseudoflavors. This list does not include any non-GSS flavors that have been registered with the RPC client. nfs4_find_root_sec() currently adds these extra flavors by hand. Instead, nfs4_find_root_sec() should be looking at the set of flavors that have been explicitly registered via rpcauth_register(). And, other areas of code will soon need the same kind of list that contains all flavors the kernel currently knows about (see below). Rather than cloning the open-coded logic in nfs4_find_root_sec() to those new places, introduce a generic RPC function that generates a full list of registered auth flavors and pseudoflavors. A new rpc_authops method is added that lists a flavor's pseudoflavors, if it has any. I encountered an interesting module loader loop when I tried to get the RPC client to invoke gss_mech_list_pseudoflavors() by name. This patch is a pre-requisite for server trunking discovery, and a pre-requisite for fixing up the in-kernel mount client to do better automatic security flavor selection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 15:12:15 -04:00
Chuck Lever	56d08fef23	NFS: nfs_getaclargs.acl_len is a size_t Squelch compiler warnings: fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’: fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Introduced by commit `bf118a34` "NFSv4: include bitmap in nfsv4 get acl data", Dec 7, 2011. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:53:43 -04:00
Chuck Lever	38527b153a	NFS: Clean up TEST_STATEID and FREE_STATEID error reporting As a finishing touch, add appropriate documenting comments and some debugging printk's. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:53:34 -04:00
Chuck Lever	3e60ffdd36	NFS: Clean up nfs41_check_expired_stateid() Clean up: Instead of open-coded flag manipulation, use test_bit() and clear_bit() just like all other accessors of the state->flag field. This also eliminates several unnecessary implicit integer type conversions. To make it absolutely clear what is going on, a number of comments are introduced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:49:40 -04:00
Chuck Lever	eb64cf964d	NFS: State reclaim clears OPEN and LOCK state The "state->flags & flags" test in nfs41_check_expired_stateid() allows the state manager to squelch a TEST_STATEID operation when it is known for sure that a state ID is no longer valid. If the lease was purged, for example, the client already knows that state ID is now defunct. But open recovery is still needed for that inode. To force a call to nfs4_open_expired(), change the default return value for nfs41_check_expired_stateid() to force open recovery, and the default return value for nfs41_check_locks() to force lock recovery, if the requested flags are clear. Fix suggested by Bryan Schumaker. Also, the presence of a delegation state ID must not prevent normal open recovery. The delegation state ID must be cleared if it was revoked, but once cleared I don't think it's presence or absence has any bearing on whether open recovery is still needed. So the logic is adjusted to ignore the TEST_STATEID result for the delegation state ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:48:53 -04:00
Chuck Lever	89af273958	NFS: Don't free a state ID the server does not recognize The result of a TEST_STATEID operation can indicate a few different things: o If NFS_OK is returned, then the client can continue using the state ID under test, and skip recovery. o RFC 5661 says that if the state ID was revoked, then the client must perform an explicit FREE_STATEID before trying to re-open. o If the server doesn't recognize the state ID at all, then no FREE_STATEID is needed, and the client can immediately continue with open recovery. Let's err on the side of caution: if the server clearly tells us the state ID is unknown, we skip the FREE_STATEID. For any other error, we issue a FREE_STATEID. Sometimes that FREE_STATEID will be unnecessary, but leaving unused state IDs on the server needlessly ties up resources. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:48:10 -04:00
Chuck Lever	377e507d15	NFS: Fix up TEST_STATEID and FREE_STATEID return code handling The TEST_STATEID and FREE_STATEID operations can return -NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION. nfs41_{test,free}_stateid() should not pass these errors to nfs4_handle_exception() during state recovery, since that will recursively kick off state recovery again, resulting in a deadlock. In particular, when the TEST_STATEID operation returns NFS4_OK, res.status can contain one of these errors. _nfs41_test_stateid() replaces NFS4_OK with the value in res.status, which is then returned to callers. But res.status is not passed through nfs4_stat_to_errno(), and thus is a positive NFS4ERR value. Currently callers are only interested in !NFS4_OK, and nfs4_handle_exception() ignores positive values. Thus the res.status values are currently ignored by nfs4_handle_exception() and won't cause the deadlock above. Thanks to this missing negative, it is only when these operations fail (which is very rare) that a deadlock can occur. Bryan agrees the original intent was to return res.status as a negative NFS4ERR value to callers of nfs41_test_stateid(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:47:52 -04:00
Andy Adamson	293b3b065c	NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list mark_matching_lsegs_invalid() resets the mds_threshold counters and can dereference the layout hdr on an initial empty plh_segs list. It returns 0 both in the case of an initial empty list and in a non-emtpy list that was cleared by calls to mark_lseg_invalid. Don't send a LAYOUTRETURN if the list was initially empty. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:39:00 -04:00
Andy Adamson	366d50521c	NFSv4.1 mark layout when already returned When the file layout driver is fencing a DS, _pnfs_return_layout can be called mulitple times per inode due to in-flight i/o referencing lsegs on it's plh_segs list. Remember that LAYOUTRETURN has been called, and do not call it again. Allow LAYOUTRETURNs after a subsequent LAYOUTGET. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:37:25 -04:00
Andy Adamson	baf6c2a44a	NFSv4.1 don't send LAYOUTCOMMIT if data resent through MDS Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:37:00 -04:00
Andy Adamson	82c7c7a5a9	NFSv4.1 return the LAYOUT for each file with failed DS connection I/O First mark the deviceid invalid to prevent any future use. Then fence all files involved in I/O to a DS with a connection error by sending a LAYOUTRETURN. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-16 14:36:52 -04:00
Linus Torvalds	fce667c574	xfs: regression fixes for 3.5-rc7 - Really fix a cursor leak in xfs_alloc_ag_vextent_near - Fix a performance regression related to doing allocation in workqueues - Prevent recursion in xfs_buf_iorequest which is causing stack overflows - Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQAGW0AAoJENaLyazVq6ZOOyEP/0xLuQKFF71eB/VL8BFylEHY H22/aDEWTo9pWDxxwBirVioBeCU07ByEv7zeQM1nqEm9pXESTSzsBUYKX2tWSRzY YclXYA0rpdCK2cdXpuz+0kWBFr9Y1Q1BIYNll6C3ZqhADgubAMHa13rKVUQlQqpD EZhvGrh42ujRhckwmi1E3+g3Ll79fty47WzyzEOa18ij3LI3q5Dm7WZpZlhv/MVW Fj975Q+LdJVchoQZ7gTiddMkZ936TwjLxM4EtWZd46CMG2/YRcPHx2YaItI0k6Xa Q34pbUHidZjqzng28iO6Y6BaB/rPLX/f7KcoZib+rc85zt8sShoDFXS9eq+DdTeH f5AgkPzpFfr3QpK3Fv5ZjICj5SkC6KhI14qnxLZhVGIWgJLGD8lb0fYDwN2y5BHq HmA7G4hALBaZ+TOpXQ2+XfxFyrffkQcM/Ja57yfDj37aOKYxMubAco4JlADBzBkg m1gF2QebQrbZ49k9x85vpIvoNFXcAg6FeuesYXciMcORCpMXp4f0oLjGlfwf13Fo upZ2AGfcpIcMl3fZzliw64VuJpX4QswTzUZFqXH8+Vn9eB6cwFqA8PJpLxBSshCk Y6E+LE6oYbsN+T6R+6Xj0DsPl4OIE0eWlGPaZZu9eWku02e0vZMj57RxUWpNcOmH 6lq8TJaS/N5Qri51NPVs =TP1A -----END PGP SIGNATURE----- Merge tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs Pull xfs regression fixes from Ben Myers: - Really fix a cursor leak in xfs_alloc_ag_vextent_near - Fix a performance regression related to doing allocation in workqueues - Prevent recursion in xfs_buf_iorequest which is causing stack overflows - Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks * tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs: xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks xfs: prevent recursion in xfs_buf_iorequest xfs: don't defer metadata allocation to the workqueue xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near	2012-07-16 10:02:36 -07:00
Trond Myklebust	8626e4a426	Merge commit '9249e17fe094d853d1ef7475dd559a2cc7e23d42' into nfs-for-3.6 Resolve conflicts with the VFS atomic open and sget changes. Conflicts: fs/nfs/nfs4proc.c	2012-07-16 12:01:42 -04:00
Anders Kaseorg	05d290d66b	fifo: Do not restart open() if it already found a partner If a parent and child process open the two ends of a fifo, and the child immediately exits, the parent may receive a SIGCHLD before its open() returns. In that case, we need to make sure that open() will return successfully after the SIGCHLD handler returns, instead of throwing EINTR or being restarted. Otherwise, the restarted open() would incorrectly wait for a second partner on the other end. The following test demonstrates the EINTR that was wrongly thrown from the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART to see a deadlock instead, in which the restarted open() waits for a second reader that will never come. (On my systems, this happens pretty reliably within about 5 to 500 iterations. Others report that it manages to loop ~forever sometimes; YMMV.) #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0) void handler(int signum) {} int main() { struct sigaction act = {.sa_handler = handler, .sa_flags = 0}; CHECK(sigaction(SIGCHLD, &act, NULL)); CHECK(mknod("fifo", S_IFIFO \| S_IRWXU, 0)); for (;;) { int fd; pid_t pid; putc('.', stderr); CHECK(pid = fork()); if (pid == 0) { CHECK(fd = open("fifo", O_RDONLY)); _exit(0); } CHECK(fd = open("fifo", O_WRONLY)); CHECK(close(fd)); CHECK(waitpid(pid, NULL, 0)); } } This is what I suspect was causing the Git test suite to fail in t9010-svn-fe.sh: http://bugs.debian.org/678852 Signed-off-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-16 08:33:14 -07:00
David Howells	0bdaea9017	VFS: Split inode_permission() Split inode_permission() into inode- and superblock-dependent parts. This is aimed at unionmounts where the superblock from the upper layer has to be checked rather than the superblock from the lower layer as the upper layer may be writable, thus allowing an unwritable file from the lower layer to be copied up and modified. Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> (Further development) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:38:36 +04:00
David Howells	9249e17fe0	VFS: Pass mount flags to sget() Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:38:34 +04:00
David Howells	f015f1267b	VFS: Comment mount following code Add comments describing what the directions "up" and "down" mean and ref count handling to the VFS mount following family of functions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> (Original author) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:38:32 +04:00
David Howells	be34d1a3bc	VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:37:27 +04:00
David Howells	55e4def0a6	VFS: Make chown() and lchown() call fchownat() Make the chown() and lchown() syscalls jump to the fchownat() syscall with the appropriate extra arguments. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:54 +04:00
Al Viro	c3c4f69424	do_dentry_open(): close the race with mark_files_ro() in failure exit we want to take it out of mark_files_ro() reach before we start checking if we ought to drop write access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:50 +04:00
Al Viro	85d7d618c1	mark_files_ro(): don't bother with mntget/mntput mnt_drop_write_file() is safe under any lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:46 +04:00
Andrew Morton	c4107b3097	notify_change(): check that i_mutex is held Cc: Djalal Harouni <tixxdz@opendz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:42 +04:00
Christoph Hellwig	b5fb63c183	fs: add nd_jump_link Add a helper that abstracts out the jump to an already parsed struct path from ->follow_link operation from procfs. Not only does this clean up the code by moving the two sides of this game into a single helper, but it also prepares for making struct nameidata private to namei.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:40 +04:00
Christoph Hellwig	408ef013cc	fs: move path_put on failure out of ->follow_link Currently the non-nd_set_link based versions of ->follow_link are expected to do a path_put(&nd->path) on failure. This calling convention is unexpected, undocumented and doesn't match what the nd_set_link-based instances do. Move the path_put out of the only non-nd_set_link based ->follow_link instance into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:35 +04:00
Al Viro	ac481d6ca4	debugfs: get rid of useless arguments to debugfs_{mkdir,symlink} Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:30 +04:00
Al Viro	cfa57c11b0	debugfs: fold debugfs_create_by_name() into the only caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:25 +04:00
Al Viro	c3b1a35084	debugfs: make sure that debugfs_create_file() gets used only for regulars It, debugfs_create_dir() and debugfs_create_link() use the common helper now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:19 +04:00
Al Viro	ee3efa91e2	__d_unalias() should refuse to move mountpoints Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:15 +04:00
Al Viro	e77fb7cef8	sysfs: just use d_materialise_unique() same as for nfs et.al. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:12 +04:00
Al Viro	469796d105	sysfs: switch to ->s_d_op and ->d_release() a) ->d_iput() is wrong here - what we do to inode is completely usual, it's dentry->d_fsdata that we want to drop. Just use ->d_release(). b) switch to ->s_d_op - no need to play with d_set_d_op() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:06 +04:00
Al Viro	79714f72d3	get rid of kern_path_parent() all callers want the same thing, actually - a kinda-sorta analog of kern_path_create(). I.e. they want parent vfsmount/dentry (with ->i_mutex held, to make sure the child dentry is still their child) + the child dentry. Signed-off-by Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:02 +04:00
David Howells	1acf0af9b9	VFS: Fix the banner comment on lookup_open() Since commit 197e37d9, the banner comment on lookup_open() no longer matches what the function returns. It used to return a struct file pointer or NULL and now it returns an integer and is passed the struct file pointer it is to use amongst its arguments. Update the comment to reflect this. Also add a banner comment to atomic_open(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:57 +04:00
Al Viro	312b63fba9	don't pass nameidata * to vfs_create() all we want is a boolean flag, same as the method gets now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:50 +04:00
Al Viro	ebfc3b49a7	don't pass nameidata to ->create() boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:47 +04:00
Al Viro	72bd866a01	fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:40 +04:00
Al Viro	00cd8dd3bf	stop passing nameidata to ->lookup() Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:32 +04:00
Al Viro	201f956e43	fs/namei.c: don't pass namedata to lookup_dcache() just the flags... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:25 +04:00
Al Viro	4ce16ef3fe	fs/namei.c: don't pass nameidata to d_revalidate() since the method wrapped by it doesn't need that anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:21 +04:00
Al Viro	0b728e1911	stop passing nameidata * to ->d_revalidate() Just the lookup flags. Die, bastard, die... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:14 +04:00
Al Viro	fa3c56bbda	fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:07 +04:00
Al Viro	facc3530fb	nfs_lookup_verify_inode() - nd is always non-NULL here Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:34:02 +04:00
Al Viro	93420b40bb	switch nfs_lookup_check_intent() away from nameidata just pass the flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:57 +04:00
Al Viro	02e5180d99	do_dentry_open(): take initialization of file->f_path to caller ... and get rid of a couple of arguments and a pointless reassignment in finish_open() case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:54 +04:00
Al Viro	2a027e7a18	fold __dentry_open() into its sole caller Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:52 +04:00
Al Viro	96b7e579ad	switch do_dentry_open() to returning int Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:49 +04:00
Al Viro	e45198a6ac	make finish_no_open() return int namely, 1 ;-) That's what we want to return from ->atomic_open() instances after finish_no_open(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:45 +04:00
Al Viro	2675a4eb6a	fs/namei.c: get do_last() and friends return int Same conventions as for ->atomic_open(). Trimmed the forest of labels a bit, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:43 +04:00
Al Viro	30d9049474	kill struct opendata Just pass struct file . Methods are happier that way... There's no need to return struct file from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:39 +04:00
Al Viro	a4a3bdd778	kill opendata->{mnt,dentry} ->filp->f_path is there for purpose... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:37 +04:00
Al Viro	d95852777b	make ->atomic_open() return int Change of calling conventions: old new NULL 1 file 0 ERR_PTR(-ve) -ve Caller knows that struct file *; no need to return it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:35 +04:00
Al Viro	3d8a00d209	don't modify od->filp at all make put_filp() conditional on flag set by finish_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:33 +04:00
Al Viro	47237687d7	->atomic_open() prototype change - pass int * instead of bool * ... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:31 +04:00
Miklos Szeredi	a8277b9baa	vfs: move O_DIRECT check to common code Perform open_check_o_direct() in a common place in do_last after opening the file. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:28 +04:00
Miklos Szeredi	f60dc3db6e	vfs: do_last(): clean up retry Move the lookup retry logic to the bottom of the function to make the normal case simpler to read. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:26 +04:00
Miklos Szeredi	77d660a8a8	vfs: do_last(): clean up bool Consistently use bool for boolean values in do_last(). Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:24 +04:00
Miklos Szeredi	e83db16722	vfs: do_last(): clean up labels Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:21 +04:00
Miklos Szeredi	aa4caadb70	vfs: do_last(): clean up error handling Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:20 +04:00
Miklos Szeredi	015c3bbcd8	vfs: remove open intents from nameidata All users of open intents have been converted to use ->atomic_{open,create}. This patch gets rid of nd->intent.open and related infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:18 +04:00
Miklos Szeredi	e43ae79c54	9p: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic open+create operation implemented via ->create. No functionality is changed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:17 +04:00
Miklos Szeredi	2d83bde9a1	ceph: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic lookup+open+create operation implemented via ->lookup and ->create operations. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:16 +04:00
Miklos Szeredi	3819219b59	ceph: remove unused arg from ceph_lookup_open() What was the purpose of this? Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:16 +04:00
Miklos Szeredi	d2c127197d	cifs: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic lookup+open+create operation implemented via ->lookup and ->create operations. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Steve French <sfrench@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:15 +04:00
Miklos Szeredi	c8ccbe032f	fuse: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic open+create operation implemented via ->create. No functionality is changed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:14 +04:00
Miklos Szeredi	eda72afb9e	nfs: don't use intents for checking atomic open is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether it's okay to skip normal revalidation. It does a racy check for mount read-onlyness and falls back to normal revalidation if the open would fail. This makes little sense now that this function isn't used for determining whether to actually open the file or not. The d_mountpoint() check still makes sense since it is an indication that we might be following a mount and so open may not revalidate the dentry. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:12 +04:00
Miklos Szeredi	50de348c36	nfs: don't use nd->intent.open.flags Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent flags were used for. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:10 +04:00
Miklos Szeredi	8867fe5899	nfs: clean up ->create in nfs_rpc_ops Don't pass nfs_open_context() to ->create(). Only the NFS4 implementation needed that and only because it wanted to return an open file using open intents. That task has been replaced by ->atomic_open so it is not necessary anymore to pass the context to the create rpc operation. Despite nfs4_proc_create apparently being okay with a NULL context it Oopses somewhere down the call chain. So allocate a context here. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:08 +04:00
Miklos Szeredi	0dd2b474d0	nfs: implement i_op->atomic_open() Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation and use the generic nfs_lookup for other lookups. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:06 +04:00
Miklos Szeredi	d18e9008c3	vfs: add i_op->atomic_open() Add a new inode operation which is called on the last component of an open. Using this the filesystem can look up, possibly create and open the file in one atomic operation. If it cannot perform this (e.g. the file type turned out to be wrong) it may signal this by returning NULL instead of an open struct file pointer. i_op->atomic_open() is only called if the last component is negative or needs lookup. Handling cached positive dentries here doesn't add much value: these can be opened using f_op->open(). If the cached file turns out to be invalid, the open can be retried, this time using ->atomic_open() with a fresh dentry. For now leave the old way of using open intents in lookup and revalidate in place. This will be removed once all the users are converted. David Howells noticed that if ->atomic_open() opens the file but does not create it, handle_truncate() will be called on it even if it is not a regular file. Fix this by checking the file type in this case too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:04 +04:00
Miklos Szeredi	54ef487241	vfs: lookup_open(): expand lookup_hash() Copy __lookup_hash() into lookup_open(). The next patch will insert the atomic open call just before the real lookup. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:02 +04:00
Miklos Szeredi	d58ffd35c1	vfs: add lookup_open() Split out lookup + maybe create from do_last(). This is the part under i_mutex protection. The function is called lookup_open() and returns a filp even though the open part is not used yet. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:01 +04:00
Miklos Szeredi	7157486541	vfs: do_last(): common slow lookup Make the slow lookup part of O_CREAT and non-O_CREAT opens common. This allows atomic_open to be hooked into the slow lookup part. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:33:00 +04:00
Miklos Szeredi	b6183df7b2	vfs: do_last(): separate O_CREAT specific code Check O_CREAT on the slow lookup paths where necessary. This allows the rest to be shared with plain open. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:59 +04:00
Miklos Szeredi	37d7fffc9c	vfs: do_last(): inline lookup_slow() Copy lookup_slow() into do_last(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:58 +04:00
Al Viro	6d7b5aaed7	namei.c: let follow_link() do put_link() on failure no need for kludgy "set cookie to ERR_PTR(...) because we failed before we did actual ->follow_link() and want to suppress put_link()", no pointless check in put_link() itself. Callers checked if follow_link() has failed anyway; might as well break out of their loops if that happened, without bothering to call put_link() first. [AV: folded fixes from hch] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:56 +04:00
Al Viro	1d674107ea	coda: use list_for_each_entry Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:56 +04:00
Al Viro	b3d9b7a3c7	vfs: switch i_dentry/d_alias to hlist Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:55 +04:00
Al Viro	9f713878f2	ext4: get rid of open-coded d_find_any_alias() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:54 +04:00
Al Viro	a614a092bf	ocfs2: use list_for_each_entry in ocfs2_find_local_alias() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:53 +04:00
Al Viro	12447c4039	affs: unobfuscate affs_fix_dcache() and add a comment on what it's doing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:52 +04:00
Al Viro	3084ee95f0	affs: get rid of open-coded list_for_each_entry() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:52 +04:00
Al Viro	7968ce12e9	adfs: don't bother with ->i_dentry in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:50 +04:00
Al Viro	e6f9f8d029	cifs: don't bother with ->i_dentry in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:49 +04:00
Al Viro	63a44583f3	qnx6: don't bother with ->i_dentry in inode-freeing callback we'll initialize it in inode_init_always() when we allocate that object again. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:48 +04:00
Al Viro	6ce6e24e72	get rid of magic in proc_namespace.c don't rely on proc_mounts->m being the first field; container_of() is there for purpose. No need to bother with ->private, while we are at it - the same container_of will do nicely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:48 +04:00
Al Viro	f7a99c5b7c	get rid of ->mnt_longterm it's enough to set ->mnt_ns of internal vfsmounts to something distinct from all struct mnt_namespace out there; then we can just use the check for ->mnt_ns != NULL in the fast path of mntput_no_expire() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:47 +04:00
Julia Lawall	d187663ef2	fs/direct-io.c: adjust suspicious bit operation READ is 0, so the result of the bit-and operation is 0. Rewrite with == as done elsewhere in the same file. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:46 +04:00
Artem Bityutskiy	3dd847820d	affs: get rid of affs_sync_super This patch makes affs stop using the VFS '->write_super()' method along with the 's_dirt' superblock flag, because they are on their way out. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblocks using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds, even if there are no diry superblocks, or there are no client file-systems which would need this (e.g., btrfs does not use '->write_super()'). So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super()' VFS service, and then remove it together with the kernel thread. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:45 +04:00
Artem Bityutskiy	a215fef7ed	affs: introduce VFS superblock object back-reference Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data structure - we will need to find the VFS superblock from a 'struct affs_sb_info' object in the next patch, so this change is jut a preparation. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:45 +04:00
Artem Bityutskiy	a837107439	affs: stop using lock_super The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted and just wait for a brave knight who'd kill them. This patch makes AFFS stop using them and use the buffer-head's own lock instead. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:44 +04:00
Artem Bityutskiy	e0471c8d8a	affs: re-structure superblock locking a bit AFFS wants to serialize the superblock (the root block in AFFS terms) updates and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the locking down to the 'affs_commit_super()' from the callers. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:43 +04:00
Artem Bityutskiy	0164b1a32e	affs: remove useless superblock writeout on remount We do not need to write out the superblock from '->remount_fs()' because VFS has already called '->sync_fs()' by this time and the superblock has already been written out. Thus, remove the 'affs_write_super()' infocation from 'affs_remount()'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:42 +04:00
Artem Bityutskiy	c9753b1d20	affs: remove useless superblock writeout on unmount We do not need to write out the superblock from '->put_super()' because VFS has already called '->sync_fs()' by this time and the superblock has already been written out. Thus, remove the 'affs_commit_super()' infocation from 'affs_put_super()'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:42 +04:00
Artem Bityutskiy	bc86256d2e	affs: stop setting bm_flags AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when it prefers one or another. AFFS writes '1' only from '->put_super()', while '->sync_fs()' and '->write_super()' store value '2'. So on the first glance, it looks like we want to have '1' if we unmount. However, this does not really happen in these cases: 1. superblock is written via 'write_super()' then we unmount; 2. we re-mount R/O, then unmount. which are quite typical. I could not find good documentation describing this field, except of one random piece of documentation in the internet which says that -1 means that the root block is valid, which is not consistent with what we have in the Linux AFFS driver. Jan Kara commented on this: "I have some vague recollection that on Amiga boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been ages..." Thus, my conclusion is that value of '1' is as good as value of '2' and we can just always use '2'. An Jan Kara suggested to go further: "generally bm_flags handling looks strange. If they are 0, we mount fs read only and thus cannot change them. If they are != 0, we write 2 there. So IMHO if you just removed bm_flags setting, nothing will really happen." So this patch removes the bm_flags setting completely. This makes the "clean" argument of the 'affs_commit_super()' function unneeded, so it is also removed. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:32:41 +04:00
Tim Sally	5f5b331d5c	eCryptfs: check for eCryptfs cipher support at mount The issue occurs when eCryptfs is mounted with a cipher supported by the crypto subsystem but not by eCryptfs. The mount succeeds and an error does not occur until a write. This change checks for eCryptfs cipher support at mount time. Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009. https://bugs.launchpad.net/ecryptfs/+bug/338914 Signed-off-by: Tim Sally <tsally@atomicpeace.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>	2012-07-13 17:20:34 -07:00
Tyler Hicks	821f7494a7	eCryptfs: Revert to a writethrough cache model A change was made about a year ago to get eCryptfs to better utilize its page cache during writes. The idea was to do the page encryption operations during page writeback, rather than doing them when initially writing into the page cache, to reduce the number of page encryption operations during sequential writes. This meant that the encrypted page would only be written to the lower filesystem during page writeback, which was a change from how eCryptfs had previously wrote to the lower filesystem in ecryptfs_write_end(). The change caused a few eCryptfs-internal bugs that were shook out. Unfortunately, more grave side effects have been identified that will force changes outside of eCryptfs. Because the lower filesystem isn't consulted until page writeback, eCryptfs has no way to pass lower write errors (ENOSPC, mainly) back to userspace. Additionaly, it was reported that quotas could be bypassed because of the way eCryptfs may sometimes open the lower filesystem using a privileged kthread. It would be nice to resolve the latest issues, but it is best if the eCryptfs commits be reverted to the old behavior in the meantime. This reverts: `32001d6f` "eCryptfs: Flush file in vma close" `5be79de2` "eCryptfs: Flush dirty pages in setattr" `57db4e8d` "ecryptfs: modify write path to encrypt page in writepage" Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Colin King <colin.king@canonical.com> Cc: Colin King <colin.king@canonical.com> Cc: Thieu Le <thieule@google.com>	2012-07-13 16:46:06 -07:00
Christoph Hellwig	1632dcc93f	xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks xfs_bdstrat_cb only adds a check for a shutdown filesystem over xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down filesystem a little earlier. In addition the shutdown handling in xfs_bdstrat_cb is not very suitable for this caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 13:09:49 -05:00
Christoph Hellwig	40a9b7963d	xfs: prevent recursion in xfs_buf_iorequest If the b_iodone handler is run in calling context in xfs_buf_iorequest we can run into a recursion where xfs_buf_iodone_callbacks keeps calling back into xfs_buf_iorequest because an I/O error happened, which keeps calling back into xfs_buf_iorequest. This chain will usually not take long because the filesystem gets shut down because of log I/O errors, but even over a short time it can cause stack overflows if run on the same context. As a short term workaround make sure we always call the iodone handler in workqueue context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 13:09:39 -05:00
Dave Chinner	aa292847b9	xfs: don't defer metadata allocation to the workqueue Almost all metadata allocations come from shallow stack usage situations. Avoid the overhead of switching the allocation to a workqueue as we are not in danger of running out of stack when making these allocations. Metadata allocations are already marked through the args that are passed down, so this is trivial to do. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reported-by: Mel Gorman <mgorman@suse.de> Tested-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 13:09:27 -05:00
Dave Chinner	e3a746f5aa	xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near The current cursor is reallocated when retrying the allocation, so the existing cursor needs to be destroyed in both the restart and the failure cases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 13:09:18 -05:00
Linus Torvalds	4264e6a263	NFS client bugfixes for Linux 3.5 - Fix an NFSv4 mount regression - Fix O_DIRECT list manipulation snafus -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQADo8AAoJEGcL54qWCgDyu8wP/2entLv4jYgM7zp+/kOBhps6 RpAv57YYv86pLk5gxoZg/JJs3z4QzS3CjMRXK11CEzRMt1NUqJfI6587TIl11DYW 0UJcrAQo4X3EgNR7CCKAJaHUmYLayjUgquL8OjdBTuXCrpJzGdHcnBa71oF3xaEb jJhsA0W+qqBpa0295HH8sdHkURX1+46OdSd47+Dg+PZurkr0CRhjZo+DX+AkaSsU blM+pYUgu20NFaNUEfBphib3XMnnCGXc84g3gqjeOldRMijJGv1VcmhfsmJwmyLA 1mImwZ2HDzySVLrbA/D7yeddZfXuTaf8krFmyP07U6I7hIEXCAEr16DmqCQCF1UB ppzZM9lu8f6df9tIlmjdA/hGOfs5OSQzD1L9Irn8xpRIWYmg+mrxXSzIG6wZE8zu n1EURW5CrLrlz2d7rdhqA+Y/GIkAvIO0/iBbJnvkMUdMPsd9/ikFPkHGzMhsK1KN AxEi2r+n0e8Hwaueu0EP685QjrEjOyLDdKIsAzNy/UEGN4v297UoW8ZyYnrzEIJG mQg6l4ke34zuw3AxoR3X4SuW329rGpG7x0pXNwqJG292T9jki334dPzJCy1LbaGl o1g+Z0BcyMe9VC7tRwOFEiGsbcd/OYPRbVVhu/3RlrWuVvj46QDvVpQ7+1pOr2s/ 4Xu70AlKw3B02ge621+p =bX2K -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: - Fix an NFSv4 mount regression - Fix O_DIRECT list manipulation snafus * tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix an NFSv4 mount regression NFS: Fix list manipulation snafus in fs/nfs/direct.c	2012-07-13 10:58:45 -07:00
Christoph Hellwig	a2dcf5df5f	xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks xfs_bdstrat_cb only adds a check for a shutdown filesystem over xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down filesystem a little earlier. In addition the shutdown handling in xfs_bdstrat_cb is not very suitable for this caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 12:50:54 -05:00
Christoph Hellwig	08023d6dbe	xfs: prevent recursion in xfs_buf_iorequest If the b_iodone handler is run in calling context in xfs_buf_iorequest we can run into a recursion where xfs_buf_iodone_callbacks keeps calling back into xfs_buf_iorequest because an I/O error happened, which keeps calling back into xfs_buf_iorequest. This chain will usually not take long because the filesystem gets shut down because of log I/O errors, but even over a short time it can cause stack overflows if run on the same context. As a short term workaround make sure we always call the iodone handler in workqueue context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 12:50:42 -05:00
Dave Chinner	eb71a12e41	xfs: don't defer metadata allocation to the workqueue Almost all metadata allocations come from shallow stack usage situations. Avoid the overhead of switching the allocation to a workqueue as we are not in danger of running out of stack when making these allocations. Metadata allocations are already marked through the args that are passed down, so this is trivial to do. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reported-by: Mel Gorman <mgorman@suse.de> Tested-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 12:50:24 -05:00
Dave Jones	8d657eb3b4	Remove easily user-triggerable BUG from generic_setlease This can be trivially triggered from userspace by passing in something unexpected. kernel BUG at fs/locks.c:1468! invalid opcode: 0000 [#1] SMP RIP: 0010:generic_setlease+0xc2/0x100 Call Trace: __vfs_setlease+0x35/0x40 fcntl_setlease+0x76/0x150 sys_fcntl+0x1c6/0x810 system_call_fastpath+0x1a/0x1f Signed-off-by: Dave Jones <davej@redhat.com> Cc: stable@kernel.org # 3.2+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-13 10:50:23 -07:00
Dave Chinner	1f432a887e	xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near The current cursor is reallocated when retrying the allocation, so the existing cursor needs to be destroyed in both the restart and the failure cases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-07-13 12:47:58 -05:00
Jeff Moyer	91f68c89d8	block: fix infinite loop in __getblk_slow Commit `080399aaaf` ("block: don't mark buffers beyond end of disk as mapped") exposed a bug in __getblk_slow that causes mount to hang as it loops infinitely waiting for a buffer that lies beyond the end of the disk to become uptodate. The problem was initially reported by Torsten Hilbrich here: https://lkml.org/lkml/2012/6/18/54 and also reported independently here: http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511 and then Richard W.M. Jones and Marcos Mello noted a few separate bugzillas also associated with the same issue. This patch has been confirmed to fix: https://bugzilla.redhat.com/show_bug.cgi?id=835019 The main problem is here, in __getblk_slow: for (;;) { struct buffer_head * bh; int ret; bh = __find_get_block(bdev, block, size); if (bh) return bh; ret = grow_buffers(bdev, block, size); if (ret < 0) return NULL; if (ret == 0) free_more_memory(); } __find_get_block does not find the block, since it will not be marked as mapped, and so grow_buffers is called to fill in the buffers for the associated page. I believe the for (;;) loop is there primarily to retry in the case of memory pressure keeping grow_buffers from succeeding. However, we also continue to loop for other cases, like the block lying beond the end of the disk. So, the fix I came up with is to only loop when grow_buffers fails due to memory allocation issues (return value of 0). The attached patch was tested by myself, Torsten, and Rich, and was found to resolve the problem in call cases. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Josh Boyer <jwboyer@redhat.com> Cc: Stable <stable@vger.kernel.org> # 3.0+ [ Jens is on vacation, taking this directly - Linus ] -- Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-13 08:36:35 -07:00
Mathias Krause	0143fc5e9f	udf: avoid info leak on export For type 0x51 the udf.parent_partref member in struct fid gets copied uninitialized to userland. Fix this by initializing it to 0. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-13 11:21:21 +02:00
Mathias Krause	fe685aabf7	isofs: avoid info leak on export For type 1 the parent_offset member in struct isofs_fid gets copied uninitialized to userland. Fix this by initializing it to 0. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-13 11:21:21 +02:00
Liu Bo	10983f2e8d	Btrfs: fix typo in convert_extent_bit It should be convert_extent_bit. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-07-12 11:27:34 +02:00
Arne Jansen	6f72c7e20d	Btrfs: add qgroup inheritance When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:40 +02:00
Arne Jansen	5d13a37bd5	Btrfs: add qgroup ioctls Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:39 +02:00
Arne Jansen	c556723794	Btrfs: hooks to reserve qgroup space Like block reserves, reserve a small piece of space on each transaction start and for delalloc. These are the hooks that can actually return EDQUOT to the user. The amount of space reserved is tracked in the transaction handle. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:39 +02:00
Jan Schmidt	546adb0d81	Btrfs: hooks for qgroup to record delayed refs Hooks into qgroup code to record refs and into transaction commit. This is the main entry point for qgroup. Basically every change in extent backrefs got accounted to the appropriate qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-07-12 10:54:38 +02:00
Arne Jansen	bcef60f249	Btrfs: quota tree support and startup Init the quota tree along with the others on open_ctree and close_ctree. Add the quota tree to the list of well known trees in btrfs_read_fs_root_no_name. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-12 10:54:38 +02:00
Jan Schmidt	edf39272db	Btrfs: call the qgroup accounting functions Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-07-12 10:54:37 +02:00
Arne Jansen	bed92eae26	Btrfs: qgroup implementation and prototypes Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-07-12 10:54:21 +02:00
Steven J. Magnani	5d8ecbbc28	fat: fix non-atomic NFS i_pos read fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit accesses are not atomic. Make it use the same accessor as the rest of the FAT code. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-11 16:04:47 -07:00
Bob Liu	fea9f718b3	fs: ramfs: file-nommu: add SetPageUptodate() There is a bug in the below scenario for !CONFIG_MMU: 1. create a new file 2. mmap the file and write to it 3. read the file can't get the correct value Because sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page() which causes the page to be zeroed. Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that generic_file_aio_read() do not call simple_readpage(). Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-11 16:04:47 -07:00
Luis Henriques	a4e08d001f	ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space() As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL as the first parameter (file), it may trigger a NULL pointer dereferrence due to a missing check. Addresses http://bugs.launchpad.net/bugs/1006012 Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Reported-by: Bret Towe <magnade@gmail.com> Tested-by: Bret Towe <magnade@gmail.com> Cc: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-11 16:04:43 -07:00
Dong Aisheng	475d009429	of: Improve prom_update_property() function prom_update_property() currently fails if the property doesn't actually exist yet which isn't what we want. Change to add-or-update instead of update-only, then we can remove a lot duplicated lines. Suggested-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-11 15:26:51 +10:00
J. Bruce Fields	7f2e7dc0fd	nfsd: share some function prototypes Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-10 16:41:35 -04:00
J. Bruce Fields	d91d0b5690	nfsd: allow owner_override only for regular files We normally allow the owner of a file to override permissions checks on IO operations, since: - the client will take responsibility for doing an access check on open; - the permission checks offer no protection against malicious clients--if they can authenticate as the file's owner then they can always just change its permissions; - checking permission on each IO operation breaks the usual posix rule that permission is checked only on open. However, we've never allowed the owner to override permissions on readdir operations, even though the above logic would also apply to directories. I've never heard of this causing a problem, probably because a) simultaneously opening and creating a directory (with restricted mode) isn't possible, and b) opening a directory, then chmod'ing it, is rare. Our disallowal of owner-override on directories appears to be an accident, though--the readdir itself succeeds, and then we fail just because lookup_one_len() calls in our filldir methods fail. I'm not sure what the easiest fix for that would be. For now, just make this behavior obvious by denying the override right at the start. This also fixes some odd v4 behavior: with the rdattr_error attribute requested, it would perform the readdir but return an ACCES error with each entry. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-10 16:41:35 -04:00
J. Bruce Fields	74dbafaf5d	nfsd4: release openowners on free in >=4.1 case We don't need to keep openowners around in the >=4.1 case, because they aren't needed to handle CLOSE replays any more (that's a problem for sessions). And doing so causes unexpected failures on a subsequent destroy_clientid to fail. We probably also need something comparable for lock owners on last unlock. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-10 16:41:34 -04:00
J. Bruce Fields	2930d381d2	nfsd4: our filesystems are normally case sensitive Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-07-10 15:20:57 -04:00
Trond Myklebust	f1daf666dd	NFSv4: Fix an NFSv4 mount regression The helper nfs_fs_mount() will always call nfs4_try_mount with the mount_info->fill_super argument pointing to nfs_fill_super, which is NFSv2/v3 only. Fix is to have nfs4_try_mount replace it with nfs4_fill_super. The regression was introduced by commit `c40f8d1d` (NFS: Create a common fs_mount() function) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-10 13:25:39 -04:00
Jan Kara	57b9655d01	udf: Improve table length check to avoid possible overflow When a partition table length is corrupted to be close to 1 << 32, the check for its length may overflow on 32-bit systems and we will think the length is valid. Later on the kernel can crash trying to read beyond end of buffer. Fix the check to avoid possible overflow. CC: stable@vger.kernel.org Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-10 18:02:17 +02:00
Arne Jansen	709c0486b9	Btrfs: Test code to change the order of delayed-ref processing Normally delayed refs get processed in ascending bytenr order. This correlates in most cases to the order added. To expose dependencies on this order, we start to process the tree in the middle instead of the beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is defined. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:44 +02:00
Arne Jansen	416ac51da9	Btrfs: qgroup state and initialization Add state to fs_info. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:44 +02:00
Arne Jansen	20897f5c86	Btrfs: added helper to create new trees This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:43 +02:00
Arne Jansen	d13603ef6e	Btrfs: check the root passed to btrfs_end_transaction This patch only add a consistancy check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:43 +02:00
Arne Jansen	2f38b3e190	Btrfs: add helper for tree enumeration Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:42 +02:00
Arne Jansen	630dc772ea	Btrfs: qgroup on-disk format Not all features are in use by the current version and thus may change in the future. Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-07-10 15:14:42 +02:00
Jan Schmidt	097b8a7c9e	Btrfs: join tree mod log code with the code holding back delayed refs We've got two mechanisms both required for reliable backref resolving (tree mod log and holding back delayed refs). You cannot make use of one without the other. So instead of requiring the user of this mechanism to setup both correctly, we join them into a single interface. Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list as we did before, which was of no value. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-07-10 15:14:41 +02:00
Jan Schmidt	cf5388307a	Btrfs: fix buffer leak in btrfs_next_old_leaf When calling btrfs_next_old_leaf, we were leaking an extent buffer in the rare case of using the deadlock avoidance code needed for the tree mod log. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-07-10 15:14:41 +02:00
Jan Kara	44f4f729e7	ext3: Check return value of blkdev_issue_flush() blkdev_issue_flush() can fail. Make sure the error gets properly propagated. Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 23:40:46 +02:00
Jan Kara	349ecd6a3c	jbd: Check return value of blkdev_issue_flush() blkdev_issue_flush() can fail. Make sure the error gets properly propagated. Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 23:38:36 +02:00
Zheng Liu	729f52c6be	ext4: add a new nolock flag in ext4_map_blocks EXT4_GET_BLOCKS_NO_LOCK flag is added to indicate that we don't need to acquire i_data_sem lock in ext4_map_blocks. Meanwhile, it changes ext4_get_block() to not start a new journal because when we do a overwrite dio, there is no any metadata that needs to be modified. We define a new function called ext4_get_block_write_nolock, which is used in dio overwrite nolock. In this function, it doesn't try to acquire i_data_sem lock and doesn't start a new journal as it does a lookup. CC: Tao Ma <tm@tao.ma> CC: Eric Sandeen <sandeen@redhat.com> CC: Robin Dong <hao.bigrat@gmail.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-09 16:29:29 -04:00
Zheng Liu	fbe104942d	ext4: split ext4_file_write into buffered IO and direct IO ext4_file_dio_write is defined in order to split buffered IO and direct IO in ext4. This patch just refactor some stuff in write path. CC: Tao Ma <tm@tao.ma> CC: Eric Sandeen <sandeen@redhat.com> CC: Robin Dong <hao.bigrat@gmail.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-09 16:29:29 -04:00
Haibo Liu	62a1391ddd	ext4: remove an unused statement in ext4_mb_get_buddy_page_lock() In this patch, the statement "poff = block % blocks_per_page" in ext4_mb_get_buddy_page_lock has no effect. It will be optimized out by the compiler, but it's better to remove it. Signed-off-by: Haibo Liu <HaiboLiu6@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-09 16:29:28 -04:00
HaiboLiu	e7bcf82304	ext4: fix out-of-date comments in extents.c In this patch, ext4_ext_try_to_merge has been change to merge an extent both left and right. So we need to update the comment in here. Signed-off-by: HaiboLiu <HaiboLiu6@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-07-09 16:29:28 -04:00
Tao Ma	41eb70dde4	ext4: use s_csum_seed instead of i_csum_seed for xattr block In xattr block operation, we use h_refcount to indicate whether the xattr block is shared among many inodes. And xattr block csum uses s_csum_seed if it is shared and i_csum_seed if it belongs to one inode. But this has a problem. So consider the block is shared first bewteen inode A and B, and B has some xattr update and CoW the xattr block. When it updates the old xattr block(because of the h_refcount change) and calls ext4_xattr_release_block, we has no idea that inode A is the real owner of the old xattr block and we can't use the i_csum_seed of inode A either in xattr block csum calculation. And I don't think we have an easy way to find inode A. So this patch just removes the tricky i_csum_seed and we now uses s_csum_seed every time for the xattr block csum. The corresponding patch for the e2fsprogs will be sent in another patch. This is spotted by xfstests 117. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <djwong@us.ibm.com>	2012-07-09 16:29:27 -04:00
Tao Ma	ef58f69c3c	ext4: use proper csum calculation in ext4_rename In ext4_rename, when the old name is a dir, we need to change ".." to its new parent and journal the change, so with metadata_csum enabled, we have to re-calc the csum. As the first block of the dir can be either a htree root or a normal directory block and we have different csum calculation for these 2 types, we have to choose the right one in ext4_rename. btw, it is found by xfstests 013. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <djwong@us.ibm.com>	2012-07-09 16:29:05 -04:00
Theodore Ts'o	952fc18ef9	ext4: fix overhead calculation used by ext4_statfs() Commit `f975d6bcc7` introduced bug which caused ext4_statfs() to miscalculate the number of file system overhead blocks. This causes the f_blocks field in the statfs structure to be larger than it should be. This would in turn cause the "df" output to show the number of data blocks in the file system and the number of data blocks used to be larger than they should be. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-07-09 16:27:05 -04:00
Jan Kara	17dc59ba41	udf: Do not decrement i_blocks when freeing indirect extent block Indirect extent block is not accounted in i_blocks during allocation thus we should not decrement i_blocks when we are freeing such block during truncation. Reported-by: Steve Nickel <snickel58@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 13:24:21 +02:00
Jan Kara	bff943af6f	udf: Fix memory leak when mounting When we are mounting filesystem, we can load one partition table before finding out that we cannot complete processing of logical volume descriptor and trying the reserve descriptor. Free the table properly before trying the reserve descriptor. Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:12 +02:00
Wanlong Gao	e124a32043	ext2: cleanup the confused goto label Cleanup the confused goto label, since the big lock has been removed. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:12 +02:00
Ashish Sangwan	a0e589b485	UDF: Remove unnecessary variable "offset" from udf_fill_inode The variable "offset" is not needed. Remove it. Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:12 +02:00
Artem Bityutskiy	db8109ef98	udf: stop using s_dirt The UDF file-system does not need the 's_dirt' superblock flag because it does not define the 'write_super()' method. This flag was set to 1 in few places and set to 0 in '->sync_fs()' and was basically useless. Stop using it because it is on its way out. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:11 +02:00
Eric Sandeen	f007dbf8e5	ext3: force ro mount if ext3_setup_super() fails If ext3_setup_super() fails i.e. due to a too-high revision, the error is logged in dmesg but the fs is not mounted RO as indicated. Tested by: [164152.114551] EXT3-fs (sdb6): error: revision level too high, forcing read-only mode /dev/sdb6 /mnt/test2 ext3 rw,seclabel,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0 ^^ Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:11 +02:00
Jeff Liu	f3da93105b	quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h> checkpatch.pl warns: "WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>" Below patch fixes it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2012-07-09 12:03:11 +02:00
Tyler Hicks	e3ccaa9761	eCryptfs: Initialize empty lower files when opening them Historically, eCryptfs has only initialized lower files in the ecryptfs_create() path. Lower file initialization is the act of writing the cryptographic metadata from the inode's crypt_stat to the header of the file. The ecryptfs_open() path already expects that metadata to be in the header of the file. A number of users have reported empty lower files in beneath their eCryptfs mounts. Most of the causes for those empty files being left around have been addressed, but the presence of empty files causes problems due to the lack of proper cryptographic metadata. To transparently solve this problem, this patch initializes empty lower files in the ecryptfs_open() error path. If the metadata is unreadable due to the lower inode size being 0, plaintext passthrough support is not in use, and the metadata is stored in the header of the file (as opposed to the user.ecryptfs extended attribute), the lower file will be initialized. The number of nested conditionals in ecryptfs_open() was getting out of hand, so a helper function was created. To avoid the same nested conditional problem, the conditional logic was reversed inside of the helper function. https://launchpad.net/bugs/911507 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Colin Ian King <colin.king@canonical.com>	2012-07-08 12:51:45 -05:00
Tyler Hicks	8bc2d3cf61	eCryptfs: Unlink lower inode when ecryptfs_create() fails ecryptfs_create() creates a lower inode, allocates an eCryptfs inode, initializes the eCryptfs inode and cryptographic metadata attached to the inode, and then writes the metadata to the header of the file. If an error was to occur after the lower inode was created, an empty lower file would be left in the lower filesystem. This is a problem because ecryptfs_open() refuses to open any lower files which do not have the appropriate metadata in the file header. This patch properly unlinks the lower inode when an error occurs in the later stages of ecryptfs_create(), reducing the chance that an empty lower file will be left in the lower filesystem. https://launchpad.net/bugs/872905 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Colin Ian King <colin.king@canonical.com>	2012-07-08 12:51:44 -05:00
Tyler Hicks	2ecaf55db6	eCryptfs: Make all miscdev functions use daemon ptr in file private_data Now that a pointer to a valid struct ecryptfs_daemon is stored in the private_data of an opened /dev/ecryptfs file, the remaining miscdev functions can utilize the pointer rather than looking up the ecryptfs_daemon at the beginning of each operation. The security model of /dev/ecryptfs is simplified a little bit with this patch. Upon opening /dev/ecryptfs, a per-user ecryptfs_daemon is registered. Another daemon cannot be registered for that user until the last file reference is released. During the lifetime of the ecryptfs_daemon, access checks are not performed on the /dev/ecryptfs operations because it is assumed that the application securely handles the opened file descriptor and does not unintentionally leak it to processes that are not trusted. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Sasha Levin <levinsasha928@gmail.com>	2012-07-08 12:51:44 -05:00
Tyler Hicks	5669688665	eCryptfs: Remove unused messaging declarations and function These are no longer needed. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Sasha Levin <levinsasha928@gmail.com>	2012-07-08 12:51:43 -05:00

... 5 6 7 8 9 ...

28276 Commits