linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Coly Li	3a05d7961e	ocfs2: explicit declare uninitialized var in user_cluster_connect() This patch explicitly declares an uninitialized local variable in user_cluster_connect(), to remove a compiling warning. Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-17 20:55:52 -08:00
Jeff Liu	f5befbbbf9	ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock() osb->s_mount_opt has already been checked against OCFS2_MOUNT_POSIX_ACL_CHECK before calling ocfs2_get_acl_nolock() in ocfs2_init_acl() && ocfs2_get_acl(), so remove it. Signed-off-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-17 03:19:44 -08:00
Christoph Hellwig	2cfd30adf6	ocfs: stop using do_sync_mapping_range do_sync_mapping_range(..., SYNC_FILE_RANGE_WRITE) is a very awkward way to perform a filemap_fdatawrite_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:49 -05:00
Christoph Hellwig	1e431f5ce7	cleanup blockdev_direct_IO locking Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behavour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:49 -05:00
Christoph Hellwig	431547b3c4	sanitize xattr handler prototypes Add a flags argument to struct xattr_handler and pass it to all xattr handler methods. This allows using the same methods for multiple handlers, e.g. for the ACL methods which perform exactly the same action for the access and default ACLs, just using a different underlying attribute. With a little more groundwork it'll also allow sharing the methods for the regular user/trusted/secure handlers in extN, ocfs2 and jffs2 like it's already done for xfs in this patch. Also change the inode argument to the handlers to a dentry to allow using the handlers mechnism for filesystems that require it later, e.g. cifs. [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:49 -05:00
Jan Kara	3067393005	quota: Move definition of QFMT_OCFS2 to linux/quota.h Move definition of this constant to linux/quota.h so that it cannot clash with other format IDs. CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:53 +01:00
Alexey Dobriyan	1472da5fdc	const: struct quota_format_ops Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:51 +01:00
Christoph Hellwig	6b2f3d1f76	vfs: Implement proper O_SYNC semantics While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Linus Torvalds	d7fc02c7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c	2009-12-08 07:55:01 -08:00
Linus Torvalds	1557d33007	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...	2009-12-08 07:38:50 -08:00
Jiri Kosina	d014d04386	Merge branch 'for-next' into for-linus Conflicts: kernel/irq/chip.c	2009-12-07 18:36:35 +01:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Uwe Kleine-König	bf48aabb89	tree-wide: fix typos "offest" -> "offset" This patch was generated by git grep -E -i -l 'offest' \| xargs -r perl -p -i -e 's/offest/offset/' Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:50 +01:00
Tiger Yang	aad1b15310	ocfs2: return -EAGAIN instead of EAGAIN in dlm We used to return positive EAGAIN to indicate a retry action is needed in dlm_begin_reco_handler(). Now we return negative -EAGAIN to erase the confusion caused by this error code. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-02 16:49:28 -08:00
Sunil Mushran	f6656d26d1	ocfs2/cluster: Make fence method configurable - v2 By default, o2cb fences the box by calling emergency_restart(). While this scheme works well in production, it comes in the way during testing as it does not let the tester take stack/core dumps for analysis. This patch allows user to dynamically change the fence method to panic() by: # echo "panic" > /sys/kernel/config/cluster/<clustername>/fence_method Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-02 16:49:26 -08:00
Tao Ma	12d4cec988	ocfs2: refcounttree.c cleanup. sparse check finds some endian problem and some other minor issues. There is an obsolete function which should be removed. So this patch resolve all these. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-02 16:15:01 -08:00
Tao Ma	38a04e4327	ocfs2: Find proper end cpos for a leaf refcount block. ocfs2 refcount tree is stored as an extent tree while the leaf ocfs2_refcount_rec points to a refcount block. The following step can trip a kernel panic. mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE mount -t ocfs2 $DEVICE $MNT_DIR FILE_NAME=$RANDOM FILE_NAME_1=$RANDOM FILE_REF="${FILE_NAME}_ref" FILE_REF_1="${FILE_NAME}_ref_1" for((i=0;i<305;i++)) do # /mnt/1048576 is a file with 1048576 sizes. cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done for((i=0;i<3;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME done for((i=0;i<2;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME for((i=0;i<11;i++)) do cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1 done reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF # write_f is a program which will write some bytes to a file at offset. # write_f -f file_name -l offset -w write_bytes. ./write_f -f $MNT_DIR/$FILE_REF -l $[3101048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_REF -l $[3061048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_REF -l $[3111048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_NAME -l $[3101048576] -w 4096 ./write_f -f $MNT_DIR/$FILE_NAME -l $[3111048576] -w 4096 reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1 ./write_f -f $MNT_DIR/$FILE_NAME -l $[3111048576] -w 4096 #kernel panic here. The reason is that if the ocfs2_extent_rec is the last record in a leaf extent block, the old solution fails to find the suitable end cpos. So this patch try to walk through the b-tree, find the next sub root and get the c_pos the next sub-tree starts from. btw, I have runned tristan's test case against the patched kernel for several days and this type of kernel panic never happens again. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-12-02 16:14:57 -08:00
David S. Miller	ff9c38bba3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c	2009-12-01 22:13:38 -08:00
Eric W. Biederman	6d4561110a	sysctl: Drop & in front of every proc_handler. For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-18 08:37:40 -08:00
Sunil Mushran	7aee47b0bb	ocfs2: Trivial cleanup of jbd compatibility layer removal Mainline commit `53ef99cad9` removed the JBD compatibility layer from OCFS2. This patch removes the last remaining remnants of that. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-11-13 15:45:05 -08:00
Eric W. Biederman	ab09203e30	sysctl fs: Remove dead binary sysctl support Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name and .strategy members of sysctl tables are dead code. Remove them. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-12 02:04:55 -08:00
Coly Li	837711f862	ocfs2: return f_fsid info in ocfs2_statfs() Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is undefined (vfs layer fills in 0 as default). Since in some conditions, f_fsid value might be used in a (f_fsid, ino) pair to uniquely identify a file, ocfs2 should return a unique defined f_fsid value from ocfs2_statfs(). Because uuid_str is the same on big or litlle endian machine, it's endian consistent to use osb->uuid_str to generate f_fsid value. Signed-off-by: Coly Li <coly.li@suse.de> Cc: Sunil Mushran <sunil.mushran@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-10-29 15:02:20 -07:00
Jan Kara	57b09bb5e4	ocfs2: Set MS_POSIXACL on remount We have to set MS_POSIXACL on remount as well. Otherwise VFS would not know we started supporting ACLs after remount and thus ACLs would not work. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-10-28 23:06:37 -07:00
Jan Kara	5297aad80c	ocfs2: Make acl use the default Change acl mount options handling to match the one of XFS and BTRFS and hopefully it is also easier to use now. When admin does not specify any acl mount option, acls are enabled if and only if the filesystem has xattr feature enabled. If admin specifies 'acl' mount option, we fail the mount if the filesystem does not have xattr feature and thus acls cannot be enabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-10-28 23:06:32 -07:00
Jan Kara	e6aabe0cac	ocfs2: Always include ACL support To become consistent with filesystems such as XFS or BTRFS, make posix ACLs always available. This also reduces possibility of misconfiguration on admin's side. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-10-28 23:05:57 -07:00
Tao Ma	2f48d593b6	ocfs2: duplicate inline data properly during reflink. The old reflink fails to handle inodes with inline data and will oops if it encounters them. This patch copies inline data to the new inode. Extended attributes may still be refcounted. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>	2009-10-28 22:48:23 -07:00
Tao Ma	87f4b1bb98	ocfs2: Move ocfs2_complete_reflink to the right place. As its name ocfs2_complete_reflink indicates, it should be called after all the work for reflink is done, so it really should be called after we reflink xattr successfully. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com>	2009-10-28 22:44:19 -07:00
Joel Becker	fb5cbe9efd	ocfs2: Return -EINVAL when a device is not ocfs2. In case of non-modular kernels the root filesystem is mounted by trying several filesystems. If ocfs2 was tried before the actual filesystem type, the mount would fail because ocfs2_sb_probe() returns -EAGAIN instead of -EINVAL. ocfs2 will now return -EINVAL properly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Reported-by: Laszlo Attila Toth <panther@balabit.hu>	2009-10-28 22:28:24 -07:00
Eric Dumazet	c720c7e838	inet: rename some inet_sock fields In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-18 18:52:53 -07:00
Alexey Dobriyan	828c09509b	const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-01 16:11:11 -07:00
Alexey Dobriyan	f0f37e2f77	const: mark struct vm_struct_operations * mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-27 11:39:25 -07:00
Linus Torvalds	db16826367	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...	2009-09-24 07:53:22 -07:00
Alexey Dobriyan	2bcd57ab61	headers: utsname.h redux * remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 18:13:10 -07:00
Linus Torvalds	b64ada6b23	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (85 commits) ocfs2: Use buffer IO if we are appending a file. ocfs2: add spinlock protection when dealing with lockres->purge. dlmglue.c: add missed mlog lines ocfs2: __ocfs2_abort() should not enable panic for local mounts ocfs2: Add ioctl for reflink. ocfs2: Enable refcount tree support. ocfs2: Implement ocfs2_reflink. ocfs2: Add preserve to reflink. ocfs2: Create reflinked file in orphan dir. ocfs2: Use proper parameter for some inode operation. ocfs2: Make transaction extend more efficient. ocfs2: Don't merge in 1st refcount ops of reflink. ocfs2: Modify removing xattr process for refcount. ocfs2: Add reflink support for xattr. ocfs2: Create an xattr indexed block if needed. ocfs2: Call refcount tree remove process properly. ocfs2: Attach xattr clusters to refcount tree. ocfs2: Abstract ocfs2 xattr tree extend rec iteration process. ocfs2: Abstract the creation of xattr block. ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value. ...	2009-09-23 09:29:20 -07:00
James Morris	88e9d34c72	seq_file: constify seq_operations Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:29 -07:00
Tao Ma	b80474b432	ocfs2: Use buffer IO if we are appending a file. In ocfs2_file_aio_write, we will prevent direct io if we find that we are appending(changing i_size) and call generic_file_aio_write_nolock. But actually O_DIRECT flag is there and this function will call generic_file_direct_write eventually which will update i_size and leave di->i_size alone. The bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1173. So this patch let ocfs2_direct_IO returns 0 directly if we are appending so that buffered write will be called and di->i_size get updated successfully. And this is also what we want in ocfs2_file_aio_write. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-23 01:54:49 -07:00
Wengang Wang	83e32d9044	ocfs2: add spinlock protection when dealing with lockres->purge. when we check/modify lockres->purge, we should with the protection of lockres->spinlock. in dlm_purge_lockres(), the checking/modifying is not with the protectin. this patch fixes it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-23 01:54:48 -07:00
Coly Li	d92bc5127b	dlmglue.c: add missed mlog lines This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines return. Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-23 01:54:47 -07:00
Sunil Mushran	a2f2ddbf2b	ocfs2: __ocfs2_abort() should not enable panic for local mounts In a clustered setup, we have to panic the box on journal abort. This is because we don't have the facility to go hard readonly. With hard ro, another node would detect node failure and initiate recovery. Having said that, we shouldn't force panic if the volume is mounted locally. This patch defers the handling to the mount option, errors. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-23 01:54:46 -07:00
Tao Ma	bd50873dc7	ocfs2: Add ioctl for reflink. The ioctl will take 3 parameters: old_path, new_path and preserve and call vfs_reflink. It is useful when we backport reflink features to old kernels. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:51 -07:00
Tao Ma	64871b8d62	ocfs2: Enable refcount tree support. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:50 -07:00
Tao Ma	09bf27a000	ocfs2: Implement ocfs2_reflink. Implement ocfs2_reflink. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:49 -07:00
Tao Ma	0fe9b66c65	ocfs2: Add preserve to reflink. reflink has 2 options for the destination file: 1. snapshot: reflink will attempt to preserve ownership, permissions, and all other security state in order to create a full snapshot. 2. new file: it will acquire the data extent sharing but will see the file's security state and attributes initialized as a new file. So add the option to ocfs2. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:49 -07:00
Tao Ma	bc13d34757	ocfs2: Create reflinked file in orphan dir. reflink is a very complicated process, so it can't be integrated into one transaction. So if the system panic in the operation, we may leave a unfinished inode in the destication directory. So we will try to create an inode in orphan_dir first, reflink it to the src file and then move it to the destication file in the end. In that way we won't be afraid of any corruption during the reflink. This patch adds 2 functions for orphan_dir operation: 1. Create a new inode in orphand dir. 2. Move an inode to a target dir. Note: fsck.ocfs2 should work for us to remove the unfinished file in the orphan_dir. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:48 -07:00
Tao Ma	19bd341f6a	ocfs2: Use proper parameter for some inode operation. In order to make the original function more suitable for reflink, we modify the following inode operations. Both are tiny. 1. ocfs2_mknod_locked only use dentry for mlog, so move it to the caller so that reflink can use it without dentry. 2. ocfs2_prepare_orphan_dir only want inode to get its ip_blkno. So use ip_blkno instead. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:47 -07:00
Tao Ma	c18b812d12	ocfs2: Make transaction extend more efficient. In ocfs2_extend_rotate_transaction, op_credits is the orignal credits in the handle and we only want to extend the credits for the rotation, but the old solution always double it. It is harmless for some minor operations, but for actions like reflink we may rotate tree many times and cause the credits increase dramatically. So this patch try to only increase the desired credits. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:46 -07:00
Tao Ma	7540c1a77b	ocfs2: Don't merge in 1st refcount ops of reflink. Actually the whole reflink will touch refcount tree 2 times: 1. It will add the clusters in the extent record to the tree if it isn't refcounted before. 2. It will add 1 refcount to these clusters when it add these extent records to the tree. So actually we shouldn't do merge in the 1st operation since the 2nd one will soon be called and we may have to split it again. Do a merge first and split soon is a waste of time. So we only merge in the 2nd round. This is done by adding a new internal __ocfs2_increase_refcount and call it with "not-merge" for 1st refcount operation in reflink. This also has a side-effect that we don't need to worry too much about the metadata allocation in the 2nd round since it will only merge and no split will happen for those records. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:46 -07:00
Tao Ma	ce9c5a54c0	ocfs2: Modify removing xattr process for refcount. The old xattr value remove is quite simple, it just erase the tree and free the clusters. But as we have added refcount support, The process is a little complicated. We have to lock the refcount tree at the beginning, what's more, we may split the refcount tree in some cases, so meta/credits are needed. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:45 -07:00
Tao Ma	2999d12f4d	ocfs2: Add reflink support for xattr. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:45 -07:00
Tao Ma	a7fe7a3a1a	ocfs2: Create an xattr indexed block if needed. With reflink, there is a need that we create a new xattr indexed block from the very beginning. So add a new parameter for ocfs2_create_xattr_block. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:44 -07:00
Tao Ma	8b2c0dba51	ocfs2: Call refcount tree remove process properly. Now with xattr refcount support, we need to check whether we have xattr refcounted before we remove the refcount tree. Now the mechanism is: 1) Check whether i_clusters == 0, if no, exit. 2) check whether we have i_xattr_loc in dinode. if yes, exit. 2) Check whether we have inline xattr stored outside, if yes, exit. 4) Remove the tree. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:44 -07:00
Tao Ma	0129241e2b	ocfs2: Attach xattr clusters to refcount tree. In ocfs2, when xattr's value is larger than OCFS2_XATTR_INLINE_SIZE, it will be kept outside of the blocks we store xattr entry. And they are stored in a b-tree also. So this patch try to attach all these clusters to refcount tree also. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:43 -07:00
Tao Ma	47bca4950b	ocfs2: Abstract ocfs2 xattr tree extend rec iteration process. Currently we have ocfs2_iterate_xattr_buckets which can receive a para and a callback to iterate a series of bucket. It is good. But actually the 2 callers ocfs2_xattr_tree_list_index_block and ocfs2_delete_xattr_index_block are almost the same. The only difference is that the latter need to handle the extent record also. So add a new function named ocfs2_iterate_xattr_index_block. It can be given func callback which are used for exten record. So now we only have one iteration function for the xattr index block. Ane what's more, it is useful for our future reflink operations. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:43 -07:00
Tao Ma	5aea1f0ef4	ocfs2: Abstract the creation of xattr block. In xattr reflink, we also need to create xattr block, so abstract the process out. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:42 -07:00
Tao Ma	fd68a894fc	ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value. In ocfs2_xattr_bucket_get_name_value, actually we only use super_block. So use it. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:41 -07:00
Tao Ma	492a8a33e1	ocfs2: Add CoW support for xattr. In order to make 2 transcation(xattr and cow) independent with each other, we CoW the whole xattr out in case we are setting them. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:41 -07:00
Tao Ma	913580b4cd	ocfs2: Abstract duplicate clusters process in CoW. We currently use pagecache to duplicate clusters in CoW, but it isn't suitable for xattr case. So abstract it out so that the caller can decide which method it use. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:40 -07:00
Tao Ma	1061f9c1c9	ocfs2: Return extent flags for xattr value tree. With the new refcount tree, xattr value can also be refcounted among multiple files. So return the appropriate extent flags so that CoW can used it later. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:39 -07:00
Tao Ma	a9063ab9a3	ocfs2: handle file attributes issue for reflink. A reflink creates a snapshot of a file, that means the attributes must be identical except for three exceptions - nlink, ino, and ctime. As for time changes, Here is a brief description: 1. Source file: 1) atime: Ignore. Let the lazy atime code handle that. 2) mtime: don't touch. 3) ctime: If we change the tree (adding REFCOUNTED to at least one extent), update it. 2. Destination file: 1) atime: ignore. 2) mtime: we want it to appear identical to the source. 3) ctime: update. The idea here is that an ls -l will show the same time for the src and target - it shows mtime. Backup software like rsync and tar will treat the new file correctly too. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:39 -07:00
Tao Ma	110a045aca	ocfs2: Add normal functions for reflink a normal file's extents. 2 major functions are added in this patch. ocfs2_attach_refcount_tree will create a new refcount tree to the old file if it doesn't have one and insert all the extent records to the tree if they are not refcounted. ocfs2_create_reflink_node will: 1. set the refcount tree to the new file. 2. call ocfs2_duplicate_extent_list which will iterate all the extents for the old file, insert it to the new file and increase the corresponding referennce count. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:38 -07:00
Tao Ma	37f8a2bfaa	ocfs2: CoW a reflinked cluster when it is truncated. When we truncate a file to a specific size which resides in a reflinked cluster, we need to CoW it since ocfs2_zero_range_for_truncate will zero the space after the size(just another type of write). So we add a "max_cpos" in ocfs2_refcount_cow so that it will stop when it hit the max cluster offset. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:38 -07:00
Tao Ma	293b2f70b4	ocfs2: Integrate CoW in file write. When we use mmap, we CoW the refcountd clusters in ocfs2_write_begin_nolock. While for normal file io(including directio), we do CoW in ocfs2_prepare_inode_for_write. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:37 -07:00
Tao Ma	6ae23c5555	ocfs2: CoW refcount tree improvement. During CoW, if the old extent record is refcounted, we allocate som new clusters and do CoW. Actually we can have some improvement here. If the old extent has refcount=1, that means now it is only used by this file. So we don't need to allocate new clusters, just remove the refcounted flag and it is OK. We also have to remove it from the refcount tree while not deleting it. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:36 -07:00
Tao Ma	6f70fa5199	ocfs2: Add CoW support. This patch try CoW support for a refcounted record. the whole process will be: 1. Calculate how many clusters we need to CoW and where we start. Extents that are not completely encompassed by the write will be broken on 1MB boundaries. 2. Do CoW for the clusters with the help of page cache. 3. Change the b-tree structure with the new allocated clusters. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:36 -07:00
Tao Ma	bcbbb24a6a	ocfs2: Decrement refcount when truncating refcounted extents. Add 'Decrement refcount for delete' in to the normal truncate process. So for a refcounted extent record, call refcount rec decrementation instead of cluster free. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:35 -07:00
Tao Ma	1aa75fea64	ocfs2: Add functions for extents refcounted. Add function ocfs2_mark_extent_refcounted which can mark an extent refcounted. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:34 -07:00
Tao Ma	1823cb0b9f	ocfs2: Add support of decrementing refcount for delete. Given a physical cpos and length, decrement the refcount in the tree. If the refcount for any portion of the extent goes to zero, that portion is queued for freeing. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:33 -07:00
Tao Ma	e73a819db9	ocfs2: Add support for incrementing refcount in the tree. Given a physical cpos and length, increment the refcount in the tree. If the extent has not been seen before, a refcount record is created for it. Refcount records may be merged or split by this operation. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:33 -07:00
Tao Ma	e2e9f6082b	ocfs2: move tree path functions to alloc.h. Now fs/ocfs2/alloc.c has more than 7000 lines. It contains our basic b-tree operation. Although we have already make our b-tree operation generic, the basic structrue ocfs2_path which is used to iterate one b-tree branch is still static and limited to only used in alloc.c. As refcount tree need them and I don't want to add any more b-tree unrelated code to alloc.c, export them out. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:32 -07:00
Tao Ma	fe92441595	ocfs2: Add refcount b-tree as a new extent tree. Add refcount b-tree as a new extent tree so that it can use the b-tree to store and maniuplate ocfs2_refcount_rec. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:31 -07:00
Tao Ma	555936bfcb	ocfs2: Abstract extent split process. ocfs2_mark_extent_written actually does the following things: 1. check the parameters. 2. initialize the left_path and split_rec. 3. call __ocfs2_mark_extent_written. it will do: 1) check the flags of unwritten 2) do the real split work. The whole process is packed tightly somehow. So this patch will abstract 2 different functions so that future b-tree operation can work with it. 1. __ocfs2_split_extent will accept path and split_rec and do the real split work. 2. ocfs2_change_extent_flag will accept a new flag and initialize path and split_rec. So now ocfs2_mark_extent_written will do: 1. check the parameters. 2. call ocfs2_change_extent_flag. 1) initalize the left_path and split_rec. 2) check whether the new flags conflict with the old one. 3) call __ocfs2_split_extent to do the split. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:31 -07:00
Tao Ma	853a3a1439	ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree. Add a new operation eo_ocfs2_extent_contig int the extent tree's operations vector. So that with the new refcount tree, We want this so that refcount trees can always return CONTIG_NONE and prevent extent merging. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:30 -07:00
Tao Ma	8bf396de98	ocfs2: Basic tree root operation. Add basic refcount tree root operation. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:30 -07:00
Tao Ma	374a263e79	ocfs2: Add refcount tree lock mechanism. Implement locking around struct ocfs2_refcount_tree. This protects all read/write operations on refcount trees. ocfs2_refcount_tree has its own lock and its own caching_info, protecting buffers among multiple nodes. User must call ocfs2_lock_refcount_tree before his operation on the tree and unlock it after that. ocfs2_refcount_trees are referenced by the block number of the refcount tree root block, So we create an rb-tree on the ocfs2_super to look them up. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:29 -07:00
Tao Ma	c732eb16bf	ocfs2: Add caching info for refcount tree. refcount tree should use its own caching info so that when we downconvert the refcount tree lock, we can drop all the cached buffer head. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:28 -07:00
Tao Ma	8dec98edfe	ocfs2: Add new refcount tree lock resource in dlmglue. refcount tree lock resource is used to protect refcount tree read/write among multiple nodes. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:28 -07:00
Tao Ma	a433848132	ocfs2: Abstract caching info checkpoint. In meta downconvert, we need to checkpoint the metadata in an inode. For refcount tree, we also need it. So abstract the process out. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:27 -07:00
Tao Ma	f2c870e3b1	ocfs2: Add ocfs2_read_refcount_block. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:26 -07:00
Tao Ma	93c97087a6	ocfs2: Add metaecc for ocfs2_refcount_block. Add metaecc and journal trigger for ocfs2_refcount_block. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:26 -07:00
Tao Ma	721f69c404	ocfs2: Define refcount tree structure. Signed-off-by: Tao Ma <tao.ma@oracle.com>	2009-09-22 20:09:25 -07:00
Linus Torvalds	342ff1a1b5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) trivial: fix typo in aic7xxx comment trivial: fix comment typo in drivers/ata/pata_hpt37x.c trivial: typo in kernel-parameters.txt trivial: fix typo in tracing documentation trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c trivial: remove unnecessary semicolons trivial: Fix duplicated word "options" in comment trivial: kbuild: remove extraneous blank line after declaration of usage() trivial: improve help text for mm debug config options trivial: doc: hpfall: accept disk device to unload as argument trivial: doc: hpfall: reduce risk that hpfall can do harm trivial: SubmittingPatches: Fix reference to renumbered step trivial: fix typos "man[ae]g?ment" -> "management" trivial: media/video/cx88: add __init/__exit macros to cx88 drivers trivial: fix typo in CONFIG_DEBUG_FS in gcov doc trivial: fix missing printk space in amd_k7_smp_check trivial: fix typo s/ketymap/keymap/ in comment trivial: fix typo "to to" in multiple files trivial: fix typos in comments s/DGBU/DBGU/ ...	2009-09-22 07:51:45 -07:00
Alexey Dobriyan	0d54b217a2	const: make struct super_block::s_qcop const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	61e225dc34	const: make struct super_block::dq_op const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Joe Perches	a419aef8b8	trivial: remove unnecessary semicolons Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-09-21 15:14:58 +02:00
Andi Kleen	aa261f549d	HWPOISON: Enable .remove_error_page for migration aware file systems Enable removing of corrupted pages through truncation for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs These should cover most server needs. I chose the set of migration aware file systems for this for now, assuming they have been especially audited. But in general it should be safe for all file systems on the data area that support read/write and truncate. Caveat: the hardware error handler does not take i_mutex for now before calling the truncate function. Is that ok? Cc: tytso@mit.edu Cc: hch@infradead.org Cc: mfasheh@suse.com Cc: aia21@cantab.net Cc: hugh.dickins@tiscali.co.uk Cc: swhiteho@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com>	2009-09-16 11:50:16 +02:00
Jan Kara	d23c937b0f	ocfs2: Update syncing after splicing to match generic version Update ocfs2 specific splicing code to use generic syncing helper. The sync now does not happen under rw_lock because generic_write_sync() acquires i_mutex which ranks above rw_lock. That should not matter because standard fsync path does not hold it either. Acked-by: Joel Becker <Joel.Becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Jan Kara	918941a3f3	ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock Use the new helper. We have to submit data pages ourselves in case of O_SYNC write because __generic_file_aio_write does not do it for us. OCFS2 developpers might think about moving the sync out of i_mutex which seems to be easily possible but that's out of scope of this patch. CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:15 +02:00
Jens Axboe	d993831fa7	writeback: add name to backing_dev_info This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 09:20:26 +02:00
Linus Torvalds	ac7ac9f2b9	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: ocfs2_write_begin_nolock() should handle len=0 ocfs2: invalidate dentry if its dentry_lock isn't initialized.	2009-09-05 13:38:37 -07:00
Joel Becker	5e404e9ed1	ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree(). With this commit, extent tree operations are divorced from inodes and rely on ocfs2_caching_info. Phew! Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:13 -07:00
Joel Becker	a1cf076ba9	ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode. We only allow unwritten extents on data, so the toplevel ocfs2_mark_extent_written() can use an inode all it wants. But the subfunction isn't even using the inode argument. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:12 -07:00
Joel Becker	f3868d0fa2	ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree. Don't use a struct inode anymore. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:11 -07:00
Joel Becker	d231129f44	ocfs2: ocfs2_split_and_insert() no longer needs struct inode. It already has an extent_tree. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:11 -07:00
Joel Becker	dbdcf6a48a	ocfs2: ocfs2_remove_extent() no longer needs struct inode. One more generic btree function that is isolated from struct inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:10 -07:00
Joel Becker	cbee7e1a6a	ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode. One more function that doesn't need a struct inode to pass to its children. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:09 -07:00
Joel Becker	cc79d8c19e	ocfs2: ocfs2_insert_extent() no longer needs struct inode. One more function down, no inode in the entire insert-extent chain. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:09 -07:00
Joel Becker	92ba470c44	ocfs2: Make extent map insertion an extent_tree_operation. ocfs2_insert_extent() wants to insert a record into the extent map if it's an inode data extent. But since many btrees can call that function, let's make it an op on ocfs2_extent_tree. Other tree types can leave it empty. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:08 -07:00
Joel Becker	627961b77e	ocfs2: ocfs2_figure_insert_type() no longer needs struct inode. It's not using it, so remove it from the parameter list. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:08 -07:00
Joel Becker	1ef61b3314	ocfs2: Remove inode from ocfs2_figure_extent_contig(). It already has an ocfs2_extent_tree and doesn't need the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:07 -07:00
Joel Becker	a29702914a	ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type(). We don't want struct inode in generic btree operations. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:07 -07:00
Joel Becker	b4a176515c	ocfs2: ocfs2_extent_contig() only requires the superblock. Don't pass the inode in. We don't want it around for generic btree operations. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:05 -07:00
Joel Becker	3505bec018	ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode. They aren't using it, so remove it from their parameter lists. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:05 -07:00
Joel Becker	c38e52bb1c	ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode. Another on the way to generic btree functions. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:05 -07:00
Joel Becker	d562862314	ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode. Give it an ocfs2_extent_tree and it is happy. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:04 -07:00
Joel Becker	4c911eefca	ocfs2: Make truncating the extent map an extent_tree_operation. ocfs2_remove_extent() wants to truncate the extent map if it's truncating an inode data extent. But since many btrees can call that function, let's make it an op on ocfs2_extent_tree. Other tree types can leave it empty. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:03 -07:00
Joel Becker	043beebb6c	ocfs2: ocfs2_truncate_rec() doesn't need struct inode. It's not using it anymore. Remove it from the parameter list. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:03 -07:00
Joel Becker	d401dc12fc	ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode. ocfs2_grow_branch() not really using it other than to pass it to the subfunctions ocfs2_shift_tree_depth(), ocfs2_find_branch_target(), and ocfs2_add_branch(). The first two weren't it either, so they drop the argument. ocfs2_add_branch() only passed it to ocfs2_adjust_rightmost_branch(), which drops the inode argument and uses the ocfs2_extent_tree as well. ocfs2_append_rec_to_path() can be take an ocfs2_extent_tree instead of the inode. The function ocfs2_adjust_rightmost_records() goes along for the ride. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:02 -07:00
Joel Becker	c495dd24ac	ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode. It's not using it, so remove it from the parameter list. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:02 -07:00
Joel Becker	4fe82c312a	ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode. Drop it from the parameters - they already have ocfs2_extent_list. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:01 -07:00
Joel Becker	70f18c08b4	ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode. It already gets ocfs2_extent_tree, so we can just use that. This chains to the same modification for ocfs2_remove_rightmost_path() and ocfs2_rotate_rightmost_leaf_left(). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:08:00 -07:00
Joel Becker	e46f74dc35	ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode. It already has struct ocfs2_extent_tree, which has the caching info. So we don't need to pass it struct inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:59 -07:00
Joel Becker	1e2dd63fe0	ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode. It already has struct ocfs2_extent_tree, which has the caching info. So we don't need to pass it struct inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:59 -07:00
Joel Becker	09106bae05	ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode. Pass in the extent tree, which is all we need. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:58 -07:00
Joel Becker	1bbf0b8d60	ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode. We don't need struct inode in ocfs2_rotate_tree_right() anymore. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:58 -07:00
Joel Becker	6136ca5f5f	ocfs2: Drop struct inode from ocfs2_extent_tree_operations. We can get to the inode from the caching information. Other parent types don't need it. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:57 -07:00
Joel Becker	7dc0280567	ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root() Get rid of the inode argument. Use extent_tree instead. This means a few more functions have to pass an extent_tree around. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:55 -07:00
Joel Becker	5c601aba8c	ocfs2: Get inode out of ocfs2_rotate_subtree_root_right(). Pass the ocfs2_extent_list down through ocfs2_rotate_tree_right() and get rid of struct inode in ocfs2_rotate_subtree_root_right(). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:55 -07:00
Joel Becker	4619c73e7c	ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all. Completely unused argument. Get rid of it. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:54 -07:00
Joel Becker	6641b0ce32	ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path() ocfs2_unlink_path() doesn't need struct inode, so let's pass it struct ocfs2_extent_tree. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:54 -07:00
Joel Becker	42a5a7a9a5	ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode. Pass struct ocfs2_extent_tree into ocfs2_create_new_meta_bhs(). It no longer needs struct inode or ocfs2_super. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:53 -07:00
Joel Becker	facdb77f54	ocfs2: ocfs2_find_path() only needs the caching info ocfs2_find_path and ocfs2_find_leaf() walk our btrees, reading extent blocks. They need struct ocfs2_caching_info for that, but not struct inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:53 -07:00
Joel Becker	3d03a305de	ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block(). extent blocks belong to btrees on more than just inodes, so we want to pass the ocfs2_caching_info structure directly to ocfs2_read_extent_block(). A number of places in alloc.c can now drop struct inode from their argument list. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:52 -07:00
Joel Becker	d9a0a1f83b	ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree. What do we cache? Metadata blocks. What are most of our non-inode metadata blocks? Extent blocks for our btrees. struct ocfs2_extent_tree is the main structure for managing those. So let's store the associated ocfs2_caching_info there. This means that ocfs2_et_root_journal_access() doesn't need struct inode anymore, and any place that has an et can refer to et->et_ci instead of INODE_CACHE(inode). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:51 -07:00
Joel Becker	0cf2f7632b	ocfs2: Pass struct ocfs2_caching_info to the journal functions. The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:50 -07:00
Joel Becker	292dd27ec7	ocfs2: move ip_created_trans to struct ocfs2_caching_info Similar ip_last_trans, ip_created_trans tracks the creation of a journal managed inode. This specifically tracks what transaction created the inode. This is so the code can know if the inode has ever been written to disk. This behavior is desirable for any journal managed object. We move it to struct ocfs2_caching_info as ci_created_trans so that any object using ocfs2_caching_info can rely on this behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:49 -07:00
Joel Becker	66fb345ddd	ocfs2: move ip_last_trans to struct ocfs2_caching_info We have the read side of metadata caching isolated to struct ocfs2_caching_info, now we need the write side. This means the journal functions. The journal only does a couple of things with struct inode. This change moves the ip_last_trans field onto struct ocfs2_caching_info as ci_last_trans. This field tells the journal whether a pending journal flush is required. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:49 -07:00
Joel Becker	8cb471e8f8	ocfs2: Take the inode out of the metadata read/write paths. We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:48 -07:00
Joel Becker	6e5a3d7538	ocfs2: Change metadata caching locks to an operations structure. We don't really want to cart around too many new fields on the ocfs2_caching_info structure. So let's wrap all our access of the parent object in a set of operations. One pointer on caching_info, and more flexibility to boot. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:48 -07:00
Joel Becker	47460d65a4	ocfs2: Make the ocfs2_caching_info structure self-contained. We want to use the ocfs2_caching_info structure in places that are not inodes. To do that, it can no longer rely on referencing the inode directly. This patch moves the flags to ocfs2_caching_info->ci_flags, stores pointers to the parent's locks on the ocfs2_caching_info, and renames the constants and flags to reflect its independant state. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 16:07:47 -07:00
Sunil Mushran	8379e7c46c	ocfs2: ocfs2_write_begin_nolock() should handle len=0 Bug introduced by mainline commit `e7432675f8` The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-09-04 14:28:31 -07:00
Tao Ma	a1b08e75df	ocfs2: invalidate dentry if its dentry_lock isn't initialized. In commit `a5a0a63092`, when ocfs2_attch_dentry_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolve a bug, but it isn't completed and the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate doesn't catch it and return true. That make future ocfs2_dentry_lock panic out. One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in revalidate process and return false if dentry->d_fsdata is NULL, so that a new ocfs2_lookup will be called again. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-27 18:10:54 -07:00
Linus Torvalds	2584e7986f	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/dlm: Wait on lockres instead of erroring cancel requests ocfs2: Add missing lock name ocfs2: Don't oops in ocfs2_kill_sb on a failed mount ocfs2: release the buffer head in ocfs2_do_truncate. ocfs2: Handle quota file corruption more gracefully	2009-08-24 14:41:28 -07:00
Goldwyn Rodrigues	c795b33ba1	ocfs2/dlm: Wait on lockres instead of erroring cancel requests In case a downconvert is queued, and a flock receives a signal, BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered because a lock cancel triggers a dlmunlock while an AST is scheduled. To avoid this, allow a LKM_CANCEL to pass through, and let it wait on __dlm_wait_on_lockres(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Acked-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-20 18:42:34 -07:00
Jan Kara	a8b88d3d49	ocfs2: Add missing lock name There is missing name for NFSSync cluster lock. This makes lockdep unhappy because we end up passing NULL to lockdep when initializing lock key. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-20 16:41:53 -07:00
Jan Kara	5fd1318937	ocfs2: Don't oops in ocfs2_kill_sb on a failed mount If we fail to mount the filesystem, we have to be careful not to dereference uninitialized structures in ocfs2_kill_sb. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 14:32:24 -07:00
Tao Ma	60e2ec4866	ocfs2: release the buffer head in ocfs2_do_truncate. In ocfs2_do_truncate, we forget to release last_eb_bh which will cause memleak. So call brelse in the end. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 12:50:35 -07:00
Jan Kara	ada508274b	ocfs2: Handle quota file corruption more gracefully ocfs2_read_virt_blocks() does BUG when we try to read a block from a file beyond its end. Since this can happen due to filesystem corruption, it is not really an appropriate answer. Make ocfs2_read_quota_block() check the condition and handle it by calling ocfs2_error() and returning EIO. [ Modified to print ip_blkno in the error - Joel ] Reported-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-17 12:50:12 -07:00
Linus Torvalds	bc7af9ba15	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits) ocfs2: Fix possible deadlock when extending quota file ocfs2: keep index within status_map[] ocfs2: Initialize the cluster we're writing to in a non-sparse extend ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2: Define credit counts for quota operations ocfs2: Remove syncjiff field from quota info ocfs2: Fix initialization of blockcheck stats ocfs2: Zero out padding of on disk dquot structure ocfs2: Initialize blocks allocated to local quota file ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() ocfs2: Make global quota files blocksize aligned ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. ocfs2: Fix deadlock on umount ocfs2: Add extra credits and access the modified bh in update_edge_lengths. ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2: Fix error return in ocfs2_write_cluster() ocfs2: Fix compilation warning for fs/ocfs2/xattr.c ocfs2: Initialize count in aio_write before generic_write_checks ocfs2: log the actual return value of ocfs2_file_aio_write() ...	2009-08-13 11:17:40 -07:00
Jan Kara	b409d7a0ab	ocfs2: Fix possible deadlock when extending quota file In OCFS2, allocator locks rank above transaction start. Thus we cannot extend quota file from inside a transaction less we could deadlock. We solve the problem by starting transaction not already in ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and ocfs2_global_read_dquot() and we allocate blocks to quota files before starting the transaction. In case we crash, quota files will just have a few blocks more but that's no problem since we just use them next time we extend the quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-10 12:20:22 -07:00
Roel Kluin	8a57a9dda7	ocfs2: keep index within status_map[] Do not exceed array status_map[] Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-07 13:16:50 -07:00
Sunil Mushran	e7432675f8	ocfs2: Initialize the cluster we're writing to in a non-sparse extend In a non-sparse extend, we correctly allocate (and zero) the clusters between the old_i_size and pos, but we don't zero the portions of the cluster we're writing to outside of pos<->len. It handles clustersize > pagesize and blocksize < pagesize. [Cleaned up by Joel Becker.] Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-08-07 13:16:23 -07:00
Goldwyn Rodrigues	ab57a40827	ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() We BUG_ON() the same thing twice. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-31 13:43:44 -07:00
Tao Ma	e9956fae7d	ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2_quota_write needs to release the lock if it fails to read quota block. So use "goto out" instead of "return err". Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-30 11:06:06 -07:00
Jan Kara	0584974a77	ocfs2: Define credit counts for quota operations Numbers of needed credits for some quota operations were written as raw numbers. Create appropriate defines instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:31 -07:00
Jan Kara	4539f1df25	ocfs2: Remove syncjiff field from quota info syncjiff is just a converted value of syncms. Some places which are updating syncms forgot to update syncjiff as well. Since the conversion is just a simple division / multiplication and it does not happen frequently, just remove the syncjiff field to avoid forgotten conversions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:27 -07:00
Jan Kara	1c1d9793ff	ocfs2: Fix initialization of blockcheck stats We just set blockcheck stats to zeros but we should also properly initialize the spinlock there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:24 -07:00
Jan Kara	7669f54c55	ocfs2: Zero out padding of on disk dquot structure Padding fields of on-disk dquot structure were not zeroed. Zero them so that it's easier to use them later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:17 -07:00
Jan Kara	0e7f387bf3	ocfs2: Initialize blocks allocated to local quota file When we extend local quota file, we should initialize data in newly allocated block. Firstly because on recovery we could parse bogus data, secondly so that block checksums are properly computed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:11 -07:00
Jan Kara	4b3fa1904c	ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() In a code path extending local quota files we marked new header buffer uptodate only after calling ocfs2_journal_access_dq() which triggers a bug. Fix it and also call ocfs2 variant of the function marking buffer uptodate. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:59:05 -07:00
Jan Kara	b57ac2c43e	ocfs2: Make global quota files blocksize aligned Change i_size of global quota files so that it always remains aligned to block size. This is mainly because the end of quota block may contain checksum (if checksumming is enabled) and it's a bit awkward for it to be "outside" of quota file (and it makes life harder for ocfs2-tools). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:58:59 -07:00
Tao Ma	82e12644cf	ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. In ocfs2_adjust_adjacent_records, we will adjust adjacent records according to the extent_list in the lower level. But actually the lower level tree will either be a leaf or a branch. If we only use ocfs2_is_empty_extent we will meet with some problem if the lower tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead. And actually only the leaf record can have holes. So add a BUG_ON for non-leaf branch. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-23 10:58:46 -07:00
Jan Kara	f7b1aa69be	ocfs2: Fix deadlock on umount In commit `ea455f8ab6`, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-21 15:47:55 -07:00
Tao Ma	3c5e10683e	ocfs2: Add extra credits and access the modified bh in update_edge_lengths. In normal tree rotation left process, we will never touch the tree branch above subtree_index and ocfs2_extend_rotate_transaction doesn't reserve the credits for them either. But when we want to delete the rightmost extent block, we have to update the rightmost records for all the rightmost branch(See ocfs2_update_edge_lengths), so we have to allocate extra credits for them. What's more, we have to access them also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-21 14:41:54 -07:00
Wengang Wang	1f4cea3790	ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2_get_block() does no allocation. Hole filling for writes should have happened farther up in the call chain. We detect this case and print an error, but we then continue with the function. We should be exiting immediately. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 17:36:07 -07:00
Wengang Wang	cbfa9639ae	ocfs2: Fix error return in ocfs2_write_cluster() A typo caused ocfs2_write_cluster() to return 0 in some error cases. Fix it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 17:32:55 -07:00
Subrata Modak	44d8e4e1e9	ocfs2: Fix compilation warning for fs/ocfs2/xattr.c gcc 4.4.1 generates the following build warning on i386: CC [M] fs/ocfs2/xattr.o fs/ocfs2/xattr.c: In function ‘ocfs2_xattr_block_get’: fs/ocfs2/xattr.c:1055: warning: ‘block_off’ may be used uninitialized in this function The following fix is based on a similar approach by David Howells few days back: http://lkml.org/lkml/2009/7/9/109, Signed-off-by: Subrata Modak<subrata@linux.vnet.ibm.com>, Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 15:52:57 -07:00
Goldwyn Rodrigues	cefcb800fa	ocfs2: Initialize count in aio_write before generic_write_checks generic_write_checks() expects count to be initialized to the size of the write. Writes to files open with O_DIRECT\|O_LARGEFILE write 0 bytes because count is uninitialized. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-20 15:47:58 -07:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Wengang Wang	812e7a6a43	ocfs2: log the actual return value of ocfs2_file_aio_write() in ocfs2_file_aio_write(), log_exit() could don't log the value which is really returned. this patch fixes it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-10 16:53:52 -07:00
Jeff Liu	17ae26b669	ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c logging in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency by comparing to other lines with the similar log info in the same file. Signed-off-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-08 15:34:40 -07:00
Jeff Mahoney	8b712cd58a	ocfs2: Fixup orphan scan cleanup after failed mount If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-07-08 15:34:02 -07:00
Linus Torvalds	cf5434e894	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. ocfs2: Add lockdep annotations vfs: Set special lockdep map for dirs only if not set by fs ocfs2: Disable orphan scanning for local and hard-ro mounts ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() ocfs2: Stop orphan scan as early as possible during umount ocfs2: Fix ocfs2_osb_dump() ocfs2: Pin journal head before accessing jh->b_committed_data ocfs2: Update atime in splice read if necessary. ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.	2009-06-23 19:36:02 -07:00
Tao Ma	d246ab307d	ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. Actually ocfs2_sysfile_cluster_lock_key is only used if we enable CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a building warning. fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’ defined but not used Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:29 -07:00
Jan Kara	cb25797d45	ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:34:26 -07:00
Sunil Mushran	df152c241d	ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:55 -07:00
Sunil Mushran	3211949f89	ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:53 -07:00
Sunil Mushran	692684e19e	ocfs2: Stop orphan scan as early as possible during umount Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:51 -07:00
Sunil Mushran	c3d38840ab	ocfs2: Fix ocfs2_osb_dump() Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:49 -07:00
Sunil Mushran	94e41ecfe0	ocfs2: Pin journal head before accessing jh->b_committed_data This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:47 -07:00
Tao Ma	1962f39abb	ocfs2: Update atime in splice read if necessary. We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-22 14:24:45 -07:00
Joel Becker	1c520dfbf3	ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>	2009-06-22 14:24:30 -07:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Linus Torvalds	300df7dc89	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/net: Use wait_event() in o2net_send_message_vec() ocfs2: Adjust rightmost path in ocfs2_add_branch. ocfs2: fdatasync should skip unimportant metadata writeout ocfs2: Remove redundant gotos in ocfs2_mount_volume() ocfs2: Add statistics for the checksum and ecc operations. ocfs2 patch to track delayed orphan scan timer statistics ocfs2: timer to queue scan of all orphan slots ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories ocfs2: Fix possible deadlock in quota recovery ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() ocfs2: Fix lock inversion in ocfs2_local_read_info() ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() ocfs2: update comments in masklog.h ocfs2: Don't printk the error when listing too many xattrs.	2009-06-16 12:11:57 -07:00
Sunil Mushran	9af0b38ff3	ocfs2/net: Use wait_event() in o2net_send_message_vec() Replace wait_event_interruptible() with wait_event() in o2net_send_message_vec(). This is because this function is called by the dlm that expects signals to be blocked. Fixes oss bugzilla#1126 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1126 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-15 14:50:14 -07:00
Tao Ma	6b791bcc8b	ocfs2: Adjust rightmost path in ocfs2_add_branch. In ocfs2_add_branch, we use the rightmost rec of the leaf extent block to generate the e_cpos for the newly added branch. In the most case, it is OK but if the parent extent block's rightmost rec covers more clusters than the leaf does, it will cause kernel panic if we insert some clusters in it. The message is something like: (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression: le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count) (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28, next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1 [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2] [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2] [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2] ... The panic can be easily reproduced by the following small test case (with bs=512, cs=4K, and I remove all the error handling so that it looks clear enough for reading). int main(int argc, char *argv) { int fd, i; char buf[5] = "test"; fd = open(argv[1], O_RDWR\|O_CREAT); for (i = 0; i < 30; i++) { lseek(fd, 40960 i, SEEK_SET); write(fd, buf, 5); } ftruncate(fd, 1146880); lseek(fd, 1126400, SEEK_SET); write(fd, buf, 5); close(fd); return 0; } The reason of the panic is that: the 30 writes and the ftruncate makes the file's extent list looks like: Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 280 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 Now another write at 1126400(275 cluster) whiich will write at the gap between 271 and 280 will trigger ocfs2_add_branch, but the result after the function looks like: Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 280 86183 1 271 0 143592 So the extent record is intersected and make the following operation bug out. This patch just try to remove the gap before we add the new branch, so that the root(branch) rightmost rec will cover the same right position. So in the above case, before adding branch the tree will be changed to Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 271 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 And after branch add, the tree looks like Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 271 86183 1 271 0 143592 Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-15 14:49:43 -07:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	94cb993f2e	ocfs2: remove ->write_super and stop maintaining ->s_dirt Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:05 -04:00
Hisashi Hifumi	e04cc15f52	ocfs2: fdatasync should skip unimportant metadata writeout In ocfs2, fdatasync and fsync are identical. I think fdatasync should skip committing transaction when inode->i_state is set just I_DIRTY_SYNC and this indicates only atime or/and mtime updates. Following patch improves fdatasync throughput. #sysbench --num-threads=16 --max-requests=300000 --test=fileio --file-block-size=4K --file-total-size=16G --file-test-mode=rndwr --file-fsync-mode=fdatasync run Results: -2.6.30-rc8 Test execution summary: total time: 107.1445s total number of events: 119559 total time taken by event execution: 116.1050 per-request statistics: min: 0.0000s avg: 0.0010s max: 0.1220s approx. 95 percentile: 0.0016s Threads fairness: events (avg/stddev): 7472.4375/303.60 execution time (avg/stddev): 7.2566/0.64 -2.6.30-rc8-patched Test execution summary: total time: 86.8529s total number of events: 300016 total time taken by event execution: 24.3077 per-request statistics: min: 0.0000s avg: 0.0001s max: 0.0336s approx. 95 percentile: 0.0001s Threads fairness: events (avg/stddev): 18751.0000/718.75 execution time (avg/stddev): 1.5192/0.05 Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-09 10:45:47 -07:00
Tao Ma	06c59bb896	ocfs2: Remove redundant gotos in ocfs2_mount_volume() Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:20:15 -07:00
Joel Becker	73be192b17	ocfs2: Add statistics for the checksum and ecc operations. It would be nice to know how often we get checksum failures. Even better, how many of them we can fix with the single bit ecc. So, we add a statistics structure. The structure can be installed into debugfs wherever the user wants. For ocfs2, we'll put it in the superblock-specific debugfs directory and pass it down from our higher-level functions. The stats are only registered with debugfs when the filesystem supports metadata ecc. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:15:36 -07:00
Srinivas Eeda	15633a220f	ocfs2 patch to track delayed orphan scan timer statistics Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:31 -07:00
Srinivas Eeda	83273932fb	ocfs2: timer to queue scan of all orphan slots When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:31 -07:00
Jan Kara	edd45c0849	ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin(). So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir() to also use this lock ordering. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:30 -07:00
Jan Kara	80d73f15d1	ocfs2: Fix possible deadlock in quota recovery In ocfs2_finish_quota_recovery() we acquired global quota file lock and started recovering local quota file. During this process we need to get quota structures, which calls ocfs2_dquot_acquire() which gets global quota file lock again. This second lock can block in case some other node has requested the quota file lock in the mean time. Fix the problem by moving quota file locking down into the function where it is really needed. Then dqget() or dqput() won't be called with the lock held. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:30 -07:00
Jan Kara	65bac575e3	ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() We called vfs_dq_transfer() with global quota file lock held. This can lead to deadlocks as if vfs_dq_transfer() has to allocate new quota structure, it calls ocfs2_dquot_acquire() which tries to get quota file lock again and this can block if another node requested the lock in the mean time. Since we have to call vfs_dq_transfer() with transaction already started and quota file lock ranks above the transaction start, we cannot just rely on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock if they need it. We fix the problem by acquiring pointers to all quota structures needed by vfs_dq_transfer() already before calling the function. By this we are sure that all quota structures are properly allocated and they can be freed only after we drop references to them. Thus we don't need quota file lock anywhere inside vfs_dq_transfer(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:29 -07:00
Jan Kara	b4c30de39a	ocfs2: Fix lock inversion in ocfs2_local_read_info() This function is called with dqio_mutex held but it has to acquire lock from global quota file which ranks above this lock. This is not deadlockable lock inversion since this code path is take only during mount when noone else can race with us but let's clean this up to silence lockdep. We just drop the dqio_mutex in the beginning of the function and reacquire it in the end since we don't need it - noone can race with us at this moment. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:29 -07:00
Jan Kara	4e8a301929	ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() It is not possible to get a read lock and then try to get the same write lock in one thread as that can block on downconvert being requested by other node leading to deadlock. So first drop the quota lock for reading and only after that get it for writing. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-06-03 19:14:28 -07:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Joel Becker	a731d12d6d	ocfs2: Use nd_set_link(). ocfs2 was hand-calling vfs_follow_link(), but there's no point to that. Let's use page_follow_link_light() and nd_set_link(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:40 -04:00
Coly Li	2b53bc7bff	ocfs2: update comments in masklog.h In the mainline ocfs2 code, the interface for masklog is in files under /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h reference the old /proc interface. They are out of date. This patch modifies the comments in cluster/masklog.h, which also provides a bash script example on how to change the log mask bits. Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-05-05 14:48:11 -07:00
Tao Ma	a46fa684fc	ocfs2: Don't printk the error when listing too many xattrs. Currently the kernel defines XATTR_LIST_MAX as 65536 in include/linux/limits.h. This is the largest buffer that is used for listing xattrs. But with ocfs2 xattr tree, we actually have no limit for the number. If filesystem has more names than can fit in the buffer, the kernel logs will be pollluted with something like this when listing: (27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34 (27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34 So don't print "ERROR" message as this is not an ocfs2 error. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-05-05 14:43:24 -07:00
Linus Torvalds	7e567b44e6	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Change repository in MAINTAINERS. ocfs2: Fix a missing credit when deleting from indexed directories. ocfs2/trivial: Remove unused variable in ocfs2_rename. ocfs2: Add missing iput() during error handling in ocfs2_dentry_attach_lock() ocfs2: Fix some printk() warnings. ocfs2: Fix 2 warning during ocfs2 make. ocfs2: Reserve 1 more cluster in expanding_inline_dir for indexed dir.	2009-05-02 16:30:47 -07:00
Joel Becker	dfa13f39b7	ocfs2: Fix a missing credit when deleting from indexed directories. The ocfs2 directory index updates two blocks when we remove an entry - the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only accounting for the dx leaf. This shows up when ocfs2_delete_inode() runs out of credits in jbd2_journal_dirty_metadata() at "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);". The test that caught this was running dirop_file_racer from the ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B blocksize, it forces the orphan dir index to grow large enough to trigger. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-04-30 13:21:56 -07:00
Tao Ma	7e31a966ad	ocfs2/trivial: Remove unused variable in ocfs2_rename. With indexed dir enabled, now we use ocfs2_dir_lookup_result to wrap all the bh used for dir. So remove the 2 unused variables. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-04-29 10:57:18 -07:00
Sunil Mushran	a5a0a63092	ocfs2: Add missing iput() during error handling in ocfs2_dentry_attach_lock() In ocfs2_dentry_attach_lock(), if unable to get the dentry lock, we need to call iput(inode) because a failure here means no d_instantiate(), which means the normally matching iput() will not be called during dput(dentry). This patch fixes the oops that accompanies the following message: (3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks! kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709! Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-04-23 14:56:13 -07:00
Joel Becker	5b09b507da	ocfs2: Fix some printk() warnings. The old %llu vs u64 battle. Cast them correctly. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-04-21 16:31:20 -07:00
Tao Ma	0fba813748	ocfs2: Fix 2 warning during ocfs2 make. fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’: fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’: fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2009-04-21 16:23:39 -07:00
Miklos Szeredi	328eaaba4e	ocfs2: fix i_mutex locking in ocfs2_splice_to_file() Rearrange locking of i_mutex on destination and call to ocfs2_rw_lock() so locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-04-15 12:10:12 +02:00
Tao Ma	035a571120	ocfs2: Reserve 1 more cluster in expanding_inline_dir for indexed dir. In ocfs2_expand_inline_dir, we calculate whether we need 1 extra cluster if we can't store the dx inline the root and save it in dx_alloc. So add it when we call ocfs2_reserve_clusters. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-07 09:40:17 -07:00
Miklos Szeredi	7bfac9ecf0	splice: fix deadlock in splicing to file There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write(): - task A calls generic_file_splice_write() - this calls inode_double_lock(), which locks i_mutex on both pipe->inode and target inode - ordering depends on inode pointers, can happen that pipe->inode is locked first - __splice_from_pipe() needs more data, calls pipe_wait() - this releases lock on pipe->inode, goes to interruptible sleep - task B calls generic_file_splice_write(), similarly to the first - this locks pipe->inode, then tries to lock inode, but that is already held by task A - task A is interrupted, it tries to lock pipe->inode, but fails, as it is already held by task B - ABBA deadlock Fix this by explicitly ordering locks: the outer lock must be on target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK, pipe inodes and target inodes form two nonoverlapping sets, generic_file_splice_write() and friends are not called with a target which is a pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-07 08:34:46 -07:00
Srinivas Eeda	9140db04ef	ocfs2: recover orphans in offline slots during recovery and mount During recovery, a node recovers orphans in it's slot and the dead node(s). But if the dead nodes were holding orphans in offline slots, they will be left unrecovered. If the dead node is the last one to die and is holding orphans in other slots and is the first one to mount, then it only recovers it's own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:26 -07:00
Hisashi Hifumi	1fca3a05ef	ocfs2: Pagecache usage optimization on ocfs2 A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:26 -07:00
wengang wang	6ca497a83e	ocfs2: fix rare stale inode errors when exporting via nfs For nfs exporting, ocfs2_get_dentry() returns the dentry for fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross cluster lock. this leads to the file system loading a stale inode. This patch fixes above problem. Solution is that in case of inode is not in memory, we get the cluster lock(PR) of alloc inode where the inode in question is allocated from (this causes node on which deletion is done sync the alloc inode) before reading out the inode itsself. then we check the bitmap in the group (the inode in question allcated from) to see if the bit is clear. if it's clear then it's stale. if the bit is set, we then check generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and allot bit. And if its not stale we read it out using ocfs2_iget(). The second read should then be from cache. And also we have to add a per superblock nfs_sync_lock to cover the lock for alloc inode and that for inode in question. this is because ocfs2_get_dentry() and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so that mutliple ocfs2_delete_inode() can run concurrently in normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:25 -07:00
Sunil Mushran	9405dccfd3	ocfs2/dlm: Tweak mle_state output The debugfs file, mle_state, now prints the number of largest number of mles in one hash link. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:25 -07:00
Sunil Mushran	516b7e52ab	ocfs2/dlm: Do not purge lockres that is being migrated dlm_purge_lockres() This patch attempts to fix a fine race between purging and migration. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:24 -07:00
Sunil Mushran	7141514b83	ocfs2/dlm: Remove struct dlm_lock_name in struct dlm_master_list_entry This patch removes struct dlm_lock_name and adds the entries directly to struct dlm_master_list_entry. Under the new scheme, both mles that are backed by a lockres or not, will have the name populated in mle->mname. This allows us to get rid of code that was figuring out the location of the mle name. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:23 -07:00
Sunil Mushran	e64ff14607	ocfs2/dlm: Show the number of lockres/mles in dlm_state This patch shows the number of lockres' and mles in the debugfs file, dlm_state. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:22 -07:00
Sunil Mushran	7d62a978a8	ocfs2/dlm: dlm_set_lockres_owner() and dlm_change_lockres_owner() inlined This patch inlines dlm_set_lockres_owner() and dlm_change_lockres_owner(). Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:21 -07:00
Sunil Mushran	6800791ab7	ocfs2/dlm: Improve lockres counts This patch replaces the lockres counts that tracked the number number of locally and remotely mastered lockres' with a current and total count. The total count is the number of lockres' that have been created since the dlm domain was created. The number of locally and remotely mastered counts can be computed using the locking_state output. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:21 -07:00
Sunil Mushran	2041d8fdce	ocfs2/dlm: Track number of mles The lifetime of a mle is limited to the duration of the lockres mastery process. While typically this lifetime is fairly short, we have noticed the number of mles explode under certain circumstances. This patch tracks the number of each different types of mles and should help us determine how best to speed up the mastery process. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:21 -07:00
Sunil Mushran	67ae1f0604	ocfs2/dlm: Indent dlm_cleanup_master_list() The previous patch explicitly did not indent dlm_cleanup_master_list() so as to make the patch readable. This patch properly indents the function. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:21 -07:00
Sunil Mushran	2ed6c750d6	ocfs2/dlm: Activate dlm->master_hash for master list entries With this patch, the mles are stored in a hash and not a simple list. This should improve the mle lookup time when the number of outstanding masteries is large. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:19 -07:00
Sunil Mushran	e2b66ddcce	ocfs2/dlm: Create and destroy the dlm->master_hash This patch adds code to create and destroy the dlm->master_hash. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:18 -07:00
Sunil Mushran	c2cd4a4433	ocfs2/dlm: Refactor dlm_clean_master_list() This patch refactors dlm_clean_master_list() so as to make it easier to convert the mle list to a hash. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:18 -07:00
Sunil Mushran	f77a9a78c3	ocfs2/dlm: Clean up struct dlm_lock_name For master mle, the name it stored in the attached lockres in struct qstr. For block and migration mle, the name is stored inline in struct dlm_lock_name. This patch attempts to make struct dlm_lock_name look like a struct qstr. While we could use struct qstr, we don't because we want to avoid having to malloc and free the lockname string as the mle's lifetime is fairly short. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:18 -07:00
Sunil Mushran	1c0845773a	ocfs2/dlm: Encapsulate adding and removing of mle from dlm->master_list This patch encapsulates adding and removing of the mle from the dlm->master_list. This patch is part of the series of patches that converts the mle list to a mle hash. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:18 -07:00
Tao Ma	feb473a6e8	ocfs2: Optimize inode group allocation by recording last used group. In ocfs2, the block group search looks for the "emptiest" group to allocate from. So if the allocator has many equally(or almost equally) empty groups, new block group will tend to get spread out amongst them. So we add osb_inode_alloc_group in ocfs2_super to record the last used inode allocation group. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. I have done some basic test and the results are a ten times improvement on some cold-cache stat workloads. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:18 -07:00
Tao Ma	60ca81e82d	ocfs2: Allocate inode groups from global_bitmap. Inode groups used to be allocated from local alloc file, but since we want all inodes to be contiguous enough, we will try to allocate them directly from global_bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Tao Ma	138211515c	ocfs2: Optimize inode allocation by remembering last group In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming new inode, we passed in directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	1d46dc08d3	ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance() ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs rebalancing. Since we rebalance an entire cluster at a time however, this function needs to calculate the beginning of that cluster, in blocks. The calculation was wrong, which would result in a read of non-leaf blocks. Fix the calculation by adding ocfs2_block_to_cluster_start() which is a more straight-forward way of determining this. Reported-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	b80b549c35	ocfs2: re-order ocfs2_empty_dir checks ocfs2_empty_dir() is far more expensive than checking link count. Since both need to be checked at the same time, we can improve performance by checking link count first. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:17 -07:00
Mark Fasheh	3a8df2b9c3	ocfs2: Enable indexed directories Since the disk format is finalized, we can set this feature bit in the supported mask. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <Joel.Becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	e3a93c2db6	ocfs2: Add total entry count to dx_root_block This little bit of extra accounting speeds up ocfs2_empty_dir() dramatically by allowing us to short-circuit the full directory scan. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	198a1ca3b7	ocfs2: Increase max links count Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	e7c17e4309	ocfs2: Introduce dir free space list The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed directory portion to find a free block. This patch provides an improvement in directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	4ed8a6bb08	ocfs2: Store dir index records inline Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond it's capacity. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:16 -07:00
Mark Fasheh	9b7895efac	ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:15 -07:00
Mark Fasheh	4a12ca3a00	ocfs2: Introduce dir lookup helper struct Many directory manipulation calls pass around a tuple of dirent, and it's containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but future patches will add more state. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-04-03 11:39:15 -07:00
Sunil Mushran	59b526a307	ocfs2: Remove debugfs file local_alloc_stats This patch removes the debugfs file local_alloc_stats as that information is now included in the fs_state debugfs file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:15 -07:00
Sunil Mushran	50397507e8	ocfs2: Expose the file system state via debugfs This patch creates a per mount debugfs file, fs_state, which exposes information like, cluster stack in use, states of the downconvert, recovery and commit threads, number of journal txns, some allocation stats, list of all slots, etc. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:15 -07:00
Sunil Mushran	96a6c64b53	ocfs2: Move struct recovery_map to a header file Move the definition of struct recovery_map from journal.c to journal.h. This is preparation for the next patch. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:14 -07:00
Sunil Mushran	87d3d3f393	ocfs2/hb: Expose the list of heartbeating nodes via debugfs This patch creates a debugfs file, o2hb/livesnodes, which exposes the aggregate list of heartbeating node across all heartbeat regions. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-04-03 11:39:14 -07:00
Linus Torvalds	8fe74cf053	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: Remove two unneeded exports and make two symbols static in fs/mpage.c Cleanup after commit `585d3bc06f` Trim includes of fdtable.h Don't crap into descriptor table in binfmt_som Trim includes in binfmt_elf Don't mess with descriptor table in load_elf_binary() Get rid of indirect include of fs_struct.h New helper - current_umask() check_unsafe_exec() doesn't care about signal handlers sharing New locking/refcounting for fs_struct Take fs_struct handling to new file (fs/fs_struct.c) Get rid of bumping fs_struct refcount in pivot_root(2) Kill unsharing fs_struct in __set_personality()	2009-04-02 21:09:10 -07:00
Nick Piggin	c2ec175c39	mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Al Viro	ce3b0f8d5c	New helper - current_umask() current->fs->umask is what most of fs_struct users are doing. Put that into a helper function. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-31 23:00:26 -04:00
Al Viro	d8fba0ffe5	constify dentry_operations: OCFS2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-03-27 14:44:02 -04:00
Tao Ma	712e53e46a	ocfs2: Use xs->bucket to set xattr value outside A long time ago, xs->base is allocated a 4K size and all the contents in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to abstract xattr bucket and xs->base is initialized to the start of the bu_bhs[0]. So xs->base + offset will overflow when the value root is stored outside the first block. Then why we can survive the xattr test by now? It is because we always read the bucket contiguously now and kernel mm allocate continguous memory for us. We are lucky, but we should fix it. So just get the right value root as other callers do. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-03-12 16:46:09 -07:00
Tao Ma	74e77eb30d	ocfs2: Fix a bug found by sparse check. We need to use le32_to_cpu to test rec->e_cpos in ocfs2_dinode_insert_check. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-03-12 16:46:01 -07:00
Tiger Yang	d9ae49d6e2	ocfs2: tweak to get the maximum inline data size with xattr Replace max_inline_data with max_inline_data_with_xattr to ensure it correct when xattr inlined. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-03-12 16:45:46 -07:00
Tiger Yang	6c9fd1dc0a	ocfs2: reserve xattr block for new directory with inline data If this is a new directory with inline data, we choose to reserve the entire inline area for directory contents and force an external xattr block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-03-12 16:45:40 -07:00
wengang wang	28d57d4377	ocfs2: add IO error check in ocfs2_get_sector() Check for IO error in ocfs2_get_sector(). Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:12 -08:00
Tiger Yang	4442f51826	ocfs2: set gap to seperate entry and value when xattr in bucket This patch set a gap (4 bytes) between xattr entry and name/value when xattr in bucket. This gap use to seperate entry and name/value when a bucket is full. It had already been set when xattr in inode/block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Tao Ma	c8b9cf9a7c	ocfs2: lock the metaecc process for xattr bucket For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks with io_mutex held. While for xattr bucket, it is calculated by the whole buckets. So we have to add a spin_lock to prevent multiple processes calculating metaecc. Signed-off-by: Tao Ma <tao.ma@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Tao Ma	89a907afe0	ocfs2: Use the right access_* method in ctime update of xattr. In ctime updating of xattr, it use the wrong type of access for inode, so use ocfs2_journal_access_di instead. Reported-and-Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Sunil Mushran	53ecd25e14	ocfs2/dlm: Make dlm_assert_master_handler() kill itself instead of the asserter In dlm_assert_master_handler(), if we get an incorrect assert master from a node that, we reply with EINVAL asking the asserter to die. The problem is that an assert is sent after so many hoops, it is invariably the node that thinks the asserter is wrong, is actually wrong. So instead of killing the asserter, this patch kills the assertee. This patch papers over a race that is still being addressed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:11 -08:00
Sunil Mushran	dabc47de7a	ocfs2/dlm: Use ast_lock to protect ast_list The code was using dlm->spinlock instead of dlm->ast_lock to protect the ast_list. This patch fixes the issue. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Sunil Mushran	c74ff8bb22	ocfs2: Cleanup the lockname print in dlmglue.c The dentry lock has a different format than other locks. This patch fixes ocfs2_log_dlm_error() macro to make it print the dentry lock correctly. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Sunil Mushran	7dc102b737	ocfs2/dlm: Retract fix for race between purge and migrate Mainline commit `d4f7e650e5` attempts to delay the dlm_thread from sending the drop ref message if the lockres is being migrated. The problem is that we make the dlm_thread wait for the migration to complete. This causes a deadlock as dlm_thread also participates in the lockres migration process. A better fix for the original oss bugzilla#1012 is in testing. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Tao Ma	47be12e4ee	ocfs2: Access and dirty the buffer_head in mark_written. In __ocfs2_mark_extent_written, when we meet with the situation of c_split_covers_rec, the old solution just replace the extent record and forget to access and dirty the buffer_head. This will cause a problem when the unwritten extent is in an extent block. So access and dirty it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-26 11:51:09 -08:00
Jan Kara	7f5aa21508	jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>	2009-02-10 11:15:34 -05:00
Mark Fasheh	fd4ef23196	ocfs2: add quota call to ocfs2_remove_btree_range() We weren't reclaiming the clusters which get free'd from this function, so any user punching holes in a file would still have those bytes accounted against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this. Interestingly enough, the journal credits calculation already took this into account. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jan Kara <jack@suse.cz>	2009-02-02 14:20:20 -08:00
Sunil Mushran	a4b91965d3	ocfs2: Wakeup the downconvert thread after a successful cancel convert When two nodes holding PR locks on a resource concurrently attempt to upconvert the locks to EX, the master sends a BAST to one of the nodes. This message tells that node to first cancel convert the upconvert request, followed by downconvert to a NL. Only when this lock is downconverted to NL, can the master upconvert the first node's lock to EX. While the fs was doing the cancel convert, it was forgetting to wake up the dc thread after a successful cancel, leading to a deadlock. Reported-and-Tested-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-02 14:20:19 -08:00
Tao Ma	554e7f9e04	ocfs2: Access the xattr bucket only before modifying it. In ocfs2_xattr_value_truncate, we may call b-tree codes which will extend the journal transaction. It has a potential problem that it may let the already-accessed-but-not-dirtied buffers gone. So we'd better access the bucket after we call ocfs2_xattr_value_truncate. And as for the root buffer for the xattr value, b-tree code will acess and dirty it, so we don't need to worry about it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-02 14:20:18 -08:00
Jan Kara	f8afead716	ocfs2: Fix possible deadlock in ocfs2_write_dquot() It could happen that some limit has been set via quotactl() and in parallel ->mark_dirty() is called from another thread doing e.g. dquot_alloc_space(). In such case ocfs2_write_dquot() must not try to sync the dquot because that needs global quota lock but that ranks above transaction start. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-02 14:20:17 -08:00
Jan Kara	ea455f8ab6	ocfs2: Push out dropping of dentry lock to ocfs2_wq Dropping of last reference to dentry lock is a complicated operation involving dropping of reference to inode. This can get complicated and quota code in particular needs to obtain some quota locks which leads to potential deadlock. Thus we defer dropping of inode reference to ocfs2_wq. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-02-02 14:20:16 -08:00
Linus Torvalds	cc597bc3d3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: ocfs2: Remove ocfs2_dquot_initialize() and ocfs2_dquot_drop() quota: Improve locking	2009-01-26 10:41:00 -08:00
Alexey Dobriyan	2fe4371dff	fs/Kconfig: move ocfs2 out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2009-01-22 13:15:54 +03:00
Jan Kara	c475146d8f	ocfs2: Remove ocfs2_dquot_initialize() and ocfs2_dquot_drop() Since ->acquire_dquot and ->release_dquot callbacks aren't called under dqptr_sem anymore, we don't have to start a transaction and obtain locks so early. So we can just remove all this complicated stuff. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.de>	2009-01-21 15:25:57 +01:00
Coly Li	73ac36ea14	fix similar typos to successfull When I review ocfs2 code, find there are 2 typos to "successfull". After doing grep "successfull " in kernel tree, 22 typos found totally -- great minds always think alike :) This patch fixes all the similar typos. Thanks for Randy's ack and comments. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:15 -08:00
Fernando Carrijo	c19a28e119	remove lots of double-semicolons Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:14 -08:00
Frederik Schwarzer	025dfdafe7	trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:06 +01:00
Linus Torvalds	10cc04f5a0	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits) ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. ocfs2: use min_t in ocfs2_quota_read() ocfs2: remove unneeded lvb casts ocfs2: Add xattr support checking in init_security ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle ocfs2: calculate and reserve credits for xattr value in mknod ocfs2/xattr: fix credits calculation during index create ocfs2/xattr: Always updating ctime during xattr set. ocfs2/xattr: Remove extend_trans call and add its credits from the beginning ocfs2/dlm: Fix race during lockres mastery ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() ocfs2/dlm: Fix a race between migrate request and exit domain ocfs2: One more hamming code optimization. ocfs2: Another hamming code optimization. ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). ocfs2: Enable metadata checksums. ocfs2: Validate superblock with checksum and ecc. ocfs2: Checksum and ECC for directory blocks. ...	2009-01-05 18:32:43 -08:00
Al Viro	56ff5efad9	zero i_uid/i_gid on inode allocation ... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Tao Ma	9047beabb8	ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*() functions", the wrong buffer_head is accessed. So change it to the right buffer_head. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:37 -08:00
Mark Fasheh	dad7d975e4	ocfs2: use min_t in ocfs2_quota_read() This is preferred to min(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:37 -08:00
Mark Fasheh	a641dc2a5a	ocfs2: remove unneeded lvb casts dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb(). This is pointless however, as ocfs2_dlm_lvb() returns void *. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	38d59ef61c	ocfs2: Add xattr support checking in init_security We must check whether ocfs2 volume support xattr in init_security, if not support xattr and security is enable, would cause failure of mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	008aafaf0b	ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle In extreme situation, may need xattr bucket for setting security entry and acl entries during mknod. This only happens when block size is too small. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	0e445b6fe9	ocfs2: calculate and reserve credits for xattr value in mknod We extend the credits for xattr's large value in set_value_outside before, this can give rise to a credits issue when we set one security entry and two acl entries duing mknod. As we remove extend_trans form set_value_outside, we must calculate and reserve the credits for xattr's large value in mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	90cb546cad	ocfs2/xattr: fix credits calculation during index create When creating a xattr index block, the old calculation forget to add credits for the meta change of the alloc file. So add more credits and more comments to explain it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	4b3f6209bf	ocfs2/xattr: Always updating ctime during xattr set. In xattr set, we should always update ctime if the operation goes sucessfully. The old one mistakenly put it in ocfs2_xattr_set_entry which is only called when we set xattr in inode or xattr block. The side benefit is that it resolve the bug 1052 since in that scenario, ocfs2_calc_xattr_set_need only calc out the xattr set credits while ocfs2_xattr_set_entry update the inode also which isn't concerned with the process of xattr set. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	71d548a6af	ocfs2/xattr: Remove extend_trans call and add its credits from the beginning Actually, when setting a new xattr value, we know it from the very beginning, and it isn't like the extension of bucket in which case we can't figure it out. So remove ocfs2_extend_trans in that function and calculate it before the transaction. It also relieve acl operation from the worry about the side effect of ocfs2_extend_trans. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Sunil Mushran	7b791d6856	ocfs2/dlm: Fix race during lockres mastery dlm_get_lock_resource() is supposed to return a lock resource with a proper master. If multiple concurrent threads attempt to lookup the lockres for the same lockid while the lock mastery in underway, one or more threads are likely to return a lockres without a proper master. This patch makes the threads wait in dlm_get_lock_resource() while the mastery is underway, ensuring all threads return the lockres with a proper master. This issue is known to be limited to users using the flock() syscall. For all other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each lockid. Users encountering this bug will see flock() return EINVAL and dmesg have the following error: ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args Reported-by: Coly Li <coyli@suse.de> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	b0d4f817ba	ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list This patch adds a new lock, dlm->tracking_lock, to protect adding/removing lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock for the same, but that proved inadequate as we could be freeing a lockres from a context that did not hold that lock. As the new lock only protects this list, we can explicitly take it when removing the lockres from the tracking list. This bug was exposed when testing multiple processes concurrently flock() the same file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	d4f7e650e5	ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating During lockres purge, o2dlm sends a drop reference message to the lockres master. This patch delays the message if the lockres is being migrated. Fixes oss bugzilla#1012 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	57dff2676e	ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() Patch cleans printed errors in dlm_proxy_ast_handler(). The errors now includes the node number that sent the (b)ast. Also it reduces the number of endian swaps of the cookie. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	2b83256407	ocfs2/dlm: Fix a race between migrate request and exit domain Patch address a racing migrate request message and an exit domain message. Instead of blocking exit domains for the duration of the migrate, we ignore failure to deliver that message. This is because an exiting domain should not have any active locks and thus has no role to play in the migration. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	58896c4d0e	ocfs2: One more hamming code optimization. The previous optimization used a fast find-highest-bit-set operation to give us a good starting point in calc_code_bit(). This version lets the caller cache the previous code buffer bit offset. Thus, the next call always starts where the last one left off. This reduces the calculation another 39%, for a total 80% reduction from the original, naive implementation. At least, on my machine. This also brings the parity calculation to within an order of magnitude of the crc32 calculation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	7bb458a585	ocfs2: Another hamming code optimization. In the calc_code_bit() function, we must find all powers of two beneath the code bit number, after it's shifted by those powers of two. This requires a loop to see where it ends up. We can optimize it by starting at its most significant bit. This shaves 32% off the time, for a total of 67.6% shaved off of the original, naive implementation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	e798b3f8a9	ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). When I wrote ocfs2_hamming_encode(), I was following documentation of the algorithm and didn't have quite the (possibly still imperfect) grasp of it I do now. As part of this, I literally hand-coded xor. I would test a bit, and then add that bit via xor to the parity word. I can, of course, just do a single xor of the parity word and the source word (the code buffer bit offset). This cuts CPU usage by 53% on a mostly populated buffer (an inode containing utmp.h inline). Joel Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	9d28cfb73f	ocfs2: Enable metadata checksums. Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	d030cc978e	ocfs2: Validate superblock with checksum and ecc. The superblock is read via a raw call. Validate it after we find it from its signature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	c175a518b4	ocfs2: Checksum and ECC for directory blocks. Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the dirblocks. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Mark Fasheh	87d35a74b1	ocfs2: Add directory block trailers. Future ocfs2 features metaecc and indexed directories need to store a little bit of data in each dirblock. For compatibility, we place this in a trailer at the end of the dirblock. The trailer plays itself as an empty dirent, so that if the features are turned off, it can be reused without requiring a tunefs scan. This code adds the trailer and validates it when the block is read in. [ Mark is the original author, but I reinserted this code before his dir index work. -- Joel ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	8400897249	ocfs2: Use proper journal_access function in xattr.c Change the rest of the naked ocfs2_journal_access() calls in fs/ocfs2/xattr.c to use the appropriate ocfs2_journal_access_*() call for their metadata type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	4311901daa	ocfs2: Pass value buf to ocfs2_remove_value_outside(). ocfs2_remove_value_outside() needs to know the type of buffer it is looking at. Pass in an ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	512620f44d	ocfs2: Use ocfs2_xattr_value_buf in ocfs2_xattr_set_entry(). ocfs2_xattr_set_entry is the function that knows what type of block it is setting into. This is what we wanted from ocfs2_xattr_value_buf. Plus, moving the value buf up into ocfs2_xattr_set_entry() allows us to pass it into ocfs2_xattr_set_value_outside() and ocfs2_xattr_cleanup(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	0c748e9532	ocfs2: Pass value buf to ocfs2_xattr_update_entry(). ocfs2_xattr_update_entry() updates the entry portion of an xattr buffer. This can be part of multiple metadata block types, so pass the buffer in via an ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	b3e5d37905	ocfs2: Pass ocfs2_xattr_value_buf into ocfs2_xattr_value_truncate(). The callers of ocfs2_xattr_value_truncate() now pass in ocfs2_xattr_value_bufs. These callers are the ones that calculated the xv location, so they are the right starting point. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	19b801f45f	ocfs2: Pull ocfs2_xattr_value_buf up into ocfs2_xattr_value_truncate(). Place an ocfs2_xattr_value_buf in ocfs2_xattr_value_truncate() and pass it down to ocfs2_xattr_shrink_size(). We can also pass it into ocfs2_xattr_extend_allocation(), replacing its ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	d72cc72d57	ocfs2: Pull ocfs2_xattr_value_buf up from __ocfs2_remove_xattr_range(). Place an ocfs2_xattr_value_buf in __ocfs2_xattr_shrink_size() and pass it down to __ocfs2_remove_xattr_range(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	2a50a743bd	ocfs2: Create ocfs2_xattr_value_buf. When an ocfs2 extended attribute is large enough to require its own allocation tree, we root it with an ocfs2_xattr_value_root. However, these roots can be a part of inodes, xattr blocks, or xattr buckets. Thus, they need a different journal access function for each container. We wrap the bh, its journal access function, and the value root (xv) in a structure called ocfs2_xattr_valu_buf. This is a package that can be passed around. In this first pass, we simply pass it to the extent tree code. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	4d0e214ee8	ocfs2: Add ecc and checksums to ocfs2 xattr buckets. The xattr bucket can span multiple blocks on disk. We have wrappers for this structure in the code. We use the new multi-block ecc calls to calculate and validate the bucket. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	13723d00e3	ocfs2: Use metadata-specific ocfs2_journal_access_() functions. The per-metadata-type ocfs2_journal_access_() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	ffdd7a5463	ocfs2: Wrap up the common use cases of ocfs2_new_path(). The majority of ocfs2_new_path() calls are: ocfs2_new_path(path_root_bh(otherpath), path_root_el(otherpath)); Let's call that ocfs2_new_path_from_path(). The rest do similar things from struct ocfs2_extent_tree. Let's call those ocfs2_new_path_from_et(). This will make the next change easier. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	50655ae9e9	ocfs2: Add journal_access functions with jbd2 triggers. We create wrappers for ocfs2_journal_access() that are specific to the type of metadata block. This allows us to associate jbd2 commit triggers with the block. The triggers will compute metadata ecc in a future commit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	d6b32bbb3e	ocfs2: block read meta ecc. Add block check calls to the read_block validate functions. This is the almost all of the read-side checking of metaecc. xattr buckets are not checked yet. Writes are also unchecked, and so a read-write mount will quickly fail. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	684ef27837	ocfs2: Add a validation hook for quota block reads. Add a currently-returns-success hook for quota block reads. We'll be adding checks to this. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	70ad1ba7b4	ocfs2: Add the underlying blockcheck code. This is the code that computes crc32 and ecc for ocfs2 metadata blocks. There are high-level functions that check whether the filesystem has the ecc feature, mid-level functions that work on a single block or array of buffer_heads, and the low-level ecc hamming code that can handle multiple buffers like crc32_le(). It's not hooked up to the filesystem yet. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	ab552d5467	ocfs2: Add the on-disk structures for metadata checksums. Define struct ocfs2_block_check, an 8-byte structure containing a 32bit crc32_le and a 16bit hamming code ecc. This will be used for metadata checksums. Add the structure to free spaces in the various metadata structures. Add the OCFS2_FEATURE_INCOMPAT_META_ECC bit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Tao Ma	754938c142	ocfs2/quota: Add QUOTA in mlog_attribute. A new mlog mask has to be added into mlog_attribute before it can be really used in mlog. ML_QUOTA is only added in masklog.h, so add it to the array to enable it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	91f2033fa9	ocfs2: Pass xs->bucket into ocfs2_add_new_xattr_bucket(). Pass the actual target bucket for insert through to ocfs2_add_new_xattr_bucket(). Now growing a bucket has no buffer_head knowledge. ocfs2_add_new_xattr_bucket() leavs xs->bucket in the proper state for insert. However, it doesn't update the rest of the search fields in xs, so we still have to relse() and re-find. That's OK, because everything is cached. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	ed29c0ca14	ocfs2: Move buckets up into ocfs2_add_new_xattr_bucket(). Lift the buckets from ocfs2_add_new_xattr_cluster() up into ocfs2_add_new_xattr_bucket(). Now ocfs2_add_new_xattr_cluster() doesn't deal with buffer_heads. In fact, we no longer have to play get_bh() tricks at all. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	012ee91087	ocfs2: Move buckets up into ocfs2_add_new_xattr_cluster(). Lift the buckets from ocfs2_adjust_xattr_cross_cluster() up into ocfs2_add_new_xattr_cluster(). Now ocfs2_adjust_xattr_cross_cluster() doesn't deal with buffer_heads. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	41cb814866	ocfs2: Pass buckets into ocfs2_mv_xattr_bucket_cross_cluster(). Now that ocfs2_adjust_xattr_cross_cluster() has buckets, it can pass them into ocfs2_mv_xattr_bucket_cross_cluster(). It no longer has to care about buffer_heads. The manipulation of first_bh and header_bh moves up to ocfs2_adjust_xattr_cross_cluster(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	92cf3adf48	ocfs2: Start using buckets in ocfs2_adjust_xattr_cross_cluster(). We want to be passing around buckets instead of buffer_heads. Let's get them into ocfs2_adjust_xattr_cross_cluster. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	c58b6032f9	ocfs2: Use ocfs2_mv_xattr_buckets() in ocfs2_mv_xattr_bucket_cross_cluster(). Now that ocfs2_mv_xattr_buckets() can move a partial cluster's worth of buckets, ocfs2_mv_xattr_bucket_cross_cluster() can use it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	54ecb6b6df	ocfs2: ocfs2_mv_xattr_buckets() can handle a partial cluster now. If you look at ocfs2_mv_xattr_bucket_cross_cluster(), you'll notice that two-thirds of the code is almost identical to ocfs2_mv_xattr_buckets(). The only difference is that ocfs2_mv_xattr_buckets() moves a whole cluster's worth, while ocfs2_mv_xattr_bucket_cross_cluster() moves half the cluster. We change ocfs2_mv_xattr_buckets() to allow moving partial clusters. The original caller of ocfs2_mv_xattr_buckets() still moves the whole cluster's worth - it just passes a start_bucket of 0. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	874d65af1c	ocfs2: Rename ocfs2_cp_xattr_cluster() to ocfs2_mv_xattr_buckets(). ocfs2_cp_xattr_cluster() takes the last cluster of an xattr extent, copies its buckets to the front of a new extent, and then shrinks the bucket count of the original extent. So it's really moving the data, not copying it. While we're here, the function doesn't need a buffer_head for the old extent, just the block number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	b5c03e7469	ocfs2: Use ocfs2_cp_xattr_bucket() in ocfs2_mv_xattr_bucket_cross_cluster(). The buffer copy loop of ocfs2_mv_xattr_bucket_cross_cluster() actually looks a lot like ocfs2_cp_xattr_bucket(). Let's just use that instead. We also use bucket operations to update the buckets at the start of each extent. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	2b656c1d6f	ocfs2: Explain t_is_new in ocfs2_cp_xattr_cluster(). I was unsure of the JOURNAL_ACCESS parameters in ocfs2_cp_xattr_cluster(). They're based on the function argument 't_is_new', but I couldn't quite figure out how t_is_new mapped to allocation. ocfs2_cp_xattr_cluster() actually overwrites the target, regardless of t_is_new. Well, I just figured it out. So I'm adding a big fat comment for those who come after me. ocfs2_divide_xattr_cluster() has the same behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	15d609293d	ocfs2: Dirty the entire first bucket in ocfs2_cp_xattr_cluster(). ocfs2_cp_xattr_cluster() takes the last bucket of a full extent and copies it over to a new extent. It then updates the headers of both extents to reflect the new state. It is passed the first bh of the first bucket in order to update that first extent's bucket count. It reads and dirties the first bh of the new extent for the same reason. However, future code wants to always dirty the entire bucket when it is changed. So it is changed to read the entire bucket it is updating for both extents. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	92de109ade	ocfs2: Dirty the entire first bucket in ocfs2_extend_xattr_bucket() ocfs2_extend_xattr_bucket() takes an extent of buckets and shifts some of them down to make room for a new xattr. It is passed the first bh of the first bucket, because that is where we store the number of buckets in the extent. However, future code wants to always dirty the entire bucket when it is changed. So let's pass the entire bucket into this function, skip any block reads (we have them), and add the access/dirty logic. We also can skip passing in the target bucket bh - we only need its block number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	88c3b0622a	ocfs2: Narrow the transaction for deleting xattrs from a bucket. We move the transaction into the loop because in ocfs2_remove_extent, we will double the credits in function ocfs2_extend_rotate_transaction. So if we have a large loop number, we will soon waste much the journal space. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Joel Becker	548b0f22bb	ocfs2: Dirty the entire bucket in ocfs2_bucket_value_truncate() ocfs2_bucket_value_truncate() currently takes the first bh of the bucket, and magically plays around with the value bh - even though the bucket structure in the calling function already has it. In addition, future code wants to always dirty the entire bucket when it is changed. So let's pass the entire bucket into this function, skip any block reads (we have them), and add the access/dirty logic. ocfs2_xattr_update_value_size() is no longer necessary, as it only did one thing other than journal access/dirty. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	df32b3343a	ocfs2/quota: sparse fixes for quota Fix 2 minor things in quota. They are both found by sparse check. 1. an endian bug in ocfs2_local_quota_add_chunk. 2. change olq_alloc_dquot to static. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	e35ff98f7c	ocfs2: fix indendation in ocfs2_dquot_drop_slow Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Jan Kara	9a2f3866c8	ocfs2: Fix build warnings (64-bit types vs long long) fs/ocfs2/quota_local.c: In function 'olq_set_dquot': fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_global.c: In function '__ocfs2_sync_dquot': fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	53a3604610	ocfs2: Make ocfs2_get_quota_block() consistent with ocfs2_read_quota_block() Make function return error status and not buffer pointer so that it's consistent with ocfs2_read_quota_block(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	af09e51b68	ocfs2: Fix oops when extending quota files We have to mark buffer as uptodate before calling ocfs2_journal_access() and ocfs2_set_buffer_uptodate() does not do this for us. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Joel Becker	85eb8b73d6	ocfs2: Fix ocfs2_read_quota_block() error handling. ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to match ocfs2_read_blocks(). The quota code, converting from ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in ocfs2_read_quota_block(). Unfortunately, the prototype of ocfs2_read_quota_block() matches the old prototype of ocfs2_bread(). The problem is that ocfs2_bread() returned the buffer head, and callers assumed that a NULL pointer was indicative of error. It wasn't. This is why ocfs2_bread() took an int*err argument as well. The new prototype of ocfs2_read_virt_blocks() avoids this error handling confusion. Let's change ocfs2_read_quota_block() to match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	57a09a7b3d	ocfs2: Add missing initialization Add missing variable initialization to ocfs2_dquot_drop_slow(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Mark Fasheh	b86c86fa1f	ocfs2: Use BH_JBDPrivateStart instead of BH_Unshadow This is safer. We no longer have to worry about tracking changes to jbd_state_bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	19ece546a4	ocfs2: Enable quota accounting on mount, disable on umount Enable quota usage tracking on mount and disable it on umount. Also add support for quota on and quota off quotactls and usrquota and grpquota mount options. Add quota features among supported ones. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	2205363dce	ocfs2: Implement quota recovery Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Mark Fasheh	171bf93ce1	ocfs2: Periodic quota syncing This patch creates a work queue for periodic syncing of locally cached quota information to the global quota files. We constantly queue a delayed work item, to get the periodic behavior. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jan Kara <jack@suse.cz>	2009-01-05 08:40:24 -08:00
Jan Kara	a90714c150	ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	9e33d69f55	ocfs2: Implementation of local and global quota file handling For each quota type each node has local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to global file (and thus with other nodes) so that limits enforcement at least aproximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	bbbd0eb34b	ocfs2: Mark system files as not subject to quota accounting Mark system files as not subject to quota accounting. This prevents possible recursions into quota code and thus deadlocks. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	1a224ad11e	ocfs2: Assign feature bits and system inodes to quota feature and quota files Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	90e86a63ea	ocfs2: Support nested transactions OCFS2 can easily support nested transactions. We just have to take care and not spoil statistics acquire semaphore unnecessarily. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Tao Ma	9f868f16e4	ocfs2/xattr: Restore not_found in xis During an xattr set, when we move a xattr which was stored in inode to the outside bucket, we have to delete it and it will use the old value of xis->not_found. xis->not_found is removed by ocfs2_calc_xattr_set_need though, so we must restore it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Tao Ma	97aff52ae1	ocfs2/xattr: Fix a bug in xattr allocation estimation When we extend one xattr's value to a large size, the old value size might be smaller than the size of a value root. In those cases, we still need to guess the metadata allocation. Reported-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Mark Fasheh	53ef99cad9	ocfs2: Remove JBD compatibility layer JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Joel Becker	511308d90b	ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks() Now that we've centralized the ocfs2_read_virt_blocks() code, let's use it in ocfs2_read_dir_block(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Joel Becker	a8549fb5ab	ocfs2: Wrap virtual block reads in ocfs2_read_virt_blocks() The ocfs2_read_dir_block() function really maps an inode's virtual blocks to physical ones before calling ocfs2_read_blocks(). Let's extract that to common code, because other places might want to do that. Other than the block number being virtual, ocfs2_read_virt_blocks() takes the same arguments as ocfs2_read_blocks(). It converts those virtual block numbers to physical before calling ocfs2_read_blocks() directly. If the blocks asked for are discontiguous, this can mean multiple calls to ocfs2_read_blocks(), but this is mostly hidden from the caller. Like ocfs2_read_blocks(), the caller can pass in an existing buffer_head. This is usually done to pick up some readahead I/O. ocfs2_read_virt_blocks() checks the buffer_head's block number against the extent map - it must match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:54 -08:00
Joel Becker	970e4936d7	ocfs2: Validate metadata only when it's read from disk. Add an optional validation hook to ocfs2_read_blocks(). Now the validation function is only called when a block was actually read off of disk. It is not called when the buffer was in cache. We add a buffer state bit BH_NeedsValidate to flag these buffers. It must always be one higher than the last JBD2 buffer state bit. The dinode, dirblock, extent_block, and xattr_block validators are lifted to this scheme directly. The group_descriptor validator needs to be split into two pieces. The first part only needs the gd buffer and is passed to ocfs2_read_block(). The second part requires the dinode as well, and is called every time. It's only 3 compares, so it's tiny. This also allows us to clean up the non-fatal gd check used by resize.c. It now has no magic argument. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	4ae1d69bed	ocfs2: Wrap xattr block reads in a dedicated function We weren't consistently checking xattr blocks after we read them. Most places checked the signature, but none checked xb_blkno or xb_fs_signature. Create a toplevel ocfs2_read_xattr_block() that does the read and the validation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	a22305cc69	ocfs2: Wrap dirblock reads in a dedicated function. We have ocfs2_bread() as a vestige of the original ext-based dir code. It's only used by directories, though. Turn it into ocfs2_read_dir_block(), with a prototype matching the other metadata read functions. It's set up to validate dirblocks when the time comes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	5e96581a37	ocfs2: Wrap extent block reads in a dedicated function. We weren't consistently checking extent blocks after we read them. Most places checked the signature, but none checked h_blkno or h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does the read and the validation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	4203530613	ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks. Random places in the code would check a group descriptor bh to see if it was valid. The previous commit unified descriptor block reads, validating all block reads in the same place. Thus, these checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated descriptor read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	68f64d471b	ocfs2: Wrap group descriptor reads in a dedicated function. We have a clean call for validating group descriptors, but every place that wants the always does a read_block()+validate() call pair. Create a toplevel ocfs2_read_group_descriptor() that does the right thing. This allows us to leverage the single call point later for fancier handling. We also add validation of gd->bg_generation against the superblock and gd->bg_blkno against the block we thought we read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	57e3e79711	ocfs2: Consolidate validation of group descriptors. Currently the validation of group descriptors is directly duplicated so that one version can error the filesystem and the other (resize) can just report the problem. Consolidate to one function that takes a boolean. Wrap that function with the old call for the old users. This is in preparation for lifting the read+validate step into a single function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	10995aa245	ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks. Random places in the code would check a dinode bh to see if it was valid. Not only did they do different levels of validation, they handled errors in different ways. The previous commit unified inode block reads, validating all block reads in the same place. Thus, these haphazard checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated inode read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Joel Becker	b657c95c11	ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Tiger Yang	a68979b857	ocfs2: add mount option and Kconfig option for acl This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL" and mount options "acl" to enable acls in Ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Tiger Yang	89c38bd0ad	ocfs2: add ocfs2_init_acl in mknod We need to get the parent directories acls and let the new child inherit it. To this, we add additional calculations for data/metadata allocation. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	060bc66dd5	ocfs2: add ocfs2_acl_chmod This function is used to update acl xattrs during file mode changes. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00

... 5 6 7 8 9 ...

1378 Commits