OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dmitry Monakhov	84061e07c5	ext4: init statistics after journal recovery Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 08:00:00 -04:00
Eric Sandeen	6b0310fbf0	ext4: don't return to userspace after freezing the fs with a mutex held ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 02:00:00 -04:00
Linus Torvalds	39f1cd635c	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fixed inode allocator to correctly track a flex_bg's used_dirs ext4: Don't use delayed allocation by default when used instead of ext3 ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3 ext4: Fix estimate of # of blocks needed to write indirect-mapped files	2010-03-25 14:10:53 -07:00
Jan Kara	ba69f9ab7d	ext4: Don't use delayed allocation by default when used instead of ext3 When ext4 driver is used to mount a filesystem instead of the ext3 file system driver (through CONFIG_EXT4_USE_FOR_EXT23), do not enable delayed allocation by default since some ext3 users and application writers have developed unfortunate expectations about the safety of writing files on systems subject to sudden and violent death without using fsync(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-24 20:18:37 -04:00
Theodore Ts'o	37f328eb60	ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3 Oops. (Blush.) Thanks to Sedat Dilek for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-24 20:06:41 -04:00
Linus Torvalds	c32da02342	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits) doc: fix typo in comment explaining rb_tree usage Remove fs/ntfs/ChangeLog doc: fix console doc typo doc: cpuset: Update the cpuset flag file Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed Remove drivers/parport/ChangeLog Remove drivers/char/ChangeLog doc: typo - Table 1-2 should refer to "status", not "statm" tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h devres/irq: Fix devm_irq_match comment Remove reference to kthread_create_on_cpu tree-wide: Assorted spelling fixes tree-wide: fix 'lenght' typo in comments and code drm/kms: fix spelling in error message doc: capitalization and other minor fixes in pnp doc devres: typo fix s/dev/devm/ Remove redundant trailing semicolons from macros fix typo "definetly" -> "definitely" in comment tree-wide: s/widht/width/g typo in comments ... Fix trivial conflict in Documentation/laptops/00-INDEX	2010-03-12 16:04:50 -08:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Emese Revfy	52cf25d0ab	Driver core: Constify struct sysfs_ops in struct kobj_type Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-03-07 17:04:49 -08:00
Linus Torvalds	e213e26ab3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c	2010-03-05 13:20:53 -08:00
Christoph Hellwig	871a293155	dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	9f75475802	dquot: cleanup dquot drop routine Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	257ba15ced	dquot: move dquot drop responsibility into the filesystem Currently clear_inode calls vfs_dq_drop directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the drop inside the ->clear_inode superblock operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	b43fa8284d	dquot: cleanup dquot transfer routine Get rid of the transfer dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_transfer helper to __dquot_transfer and vfs_dq_transfer to dquot_transfer to have a consistent namespace, and make the new dquot_transfer return a normal negative errno value which all callers expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:29 +01:00
Christoph Hellwig	63936ddaa1	dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Christoph Hellwig	5dd4056db8	dquot: cleanup space allocation / freeing routines Get rid of the alloc_space, free_space, reserve_space, claim_space and release_rsv dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Move shared logic into the common __dquot_alloc_space, dquot_claim_space_nodirty and __dquot_free_space low-level methods, and rationalize the wrappers around it to move as much as possible code into the common block for CONFIG_QUOTA vs not. Also rename all these helpers to be named dquot_* instead of vfs_dq_*. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Dmitry Monakhov	67eeb5685d	ext4: Fix ext4_quota_write cross block boundary behaviour We always assume what dquot update result in changes in one data block But ext4_quota_write() function may handle cross block boundary writes In fact if this ever happen it will result in incorrect journal credits reservation, and later a BUG_ON. As soon this never happen the boundary cross loop is NOOP. In order to make things straight let's remove this loop and assert cross boundary condition. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 08:08:51 -05:00
Frank Mayhar	273df556b6	ext4: Convert BUG_ON checks to use ext4_error() instead Convert a bunch of BUG_ONs to emit a ext4_error() message and return EIO. This is a first pass and most notably does _not_ cover mballoc.c, which is a morass of void functions. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 11:46:09 -05:00
Jiaying Zhang	744692dc05	ext4: use ext4_get_block_write in buffer write Allocate uninitialized extent before ext4 buffer write and convert the extent to initialized after io completes. The purpose is to make sure an extent can only be marked initialized after it has been written with new data so we can safely drop the i_mutex lock in ext4 DIO read without exposing stale data. This helps to improve multi-thread DIO read performance on high-speed disks. Skip the nobh and data=journal mount cases to make things simple for now. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-04 16:14:02 -05:00
Jiaying Zhang	c7064ef13b	ext4: mechanical rename some of the direct I/O get_block's identifiers This commit renames some of the direct I/O's block allocation flags, variables, and functions introduced in Mingming's "Direct IO for holes and fallocate" patches so that they can be used by ext4's buffered write path as well. Also changed the related function comments accordingly to cover both direct write and buffered write cases. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-02 13:28:44 -05:00
Dmitry Monakhov	437ca0fda3	ext4: deprecate obsoleted mount options Declare following list of mount options as deprecated: - bsddf, miniddf - grpid, bsdgroups, nogrpid, sysvgroups Declare following list of default mount options as deprecated: - bsdgroups Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-01 22:29:21 -05:00
Dmitry Monakhov	56c50f11f4	ext4: trivial quota cleanup The patch is aimed to reorganize and simplify quota code a bit. Quota code is itself complex enough, but we can make it more readable in some places: - Move quota option parsing to separate functions. - Simplify old-quota and journaled-quota mix check. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-01 23:28:41 -05:00
Dmitry Monakhov	482a74258f	ext4: mount flags manipulation cleanup Replace intermediate EXT4_MOUNT_XXX flags manipulation to corresponding macro. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-24 11:35:32 -05:00
Eric Sandeen	12062dddda	ext4: move __func__ into a macro for ext4_warning, ext4_error Just a pet peeve of mine; we had a mishash of calls with either __func__ or "function_name" and the latter tends to get out of sync. I think it's easier to just hide the __func__ in a macro, and it'll be consistent from then on. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 14:19:27 -05:00
Thadeu Lima de Souza Cascardo	d6b198bc8a	fix ext3/ext4 comment typo compain -> complain Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:36 +01:00
Theodore Ts'o	9d0be50230	ext4: Calculate metadata requirements more accurately In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit 0637c6f, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-01-01 02:41:30 -05:00
Andrew Morton	a6b43e3825	ext4: fix unsigned long long printk warning in super.c sparc64 allmodconfig: fs/ext4/super.c: In function `lifetime_write_kbytes_show': fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-23 07:48:08 -05:00
Theodore Ts'o	51b7e3c9fb	ext4: add module aliases for ext2 and ext3 Add module aliases for ext2 and ext3 when CONFIG_EXT4_USE_FOR_EXT23 is set. This makes the existing user-space stuff like mkinitrd working as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-21 10:56:09 -05:00
Dmitry Monakhov	a9e7f44720	ext4: Convert to generic reserved quota's space management. This patch also fixes write vs chown race condition. Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-23 13:33:55 +01:00
André Goddard Rosa	e7d2860b69	tree-wide: convert open calls to remove spaces to skip_spaces() lib function Makes use of skip_spaces() defined in lib/string.c for removing leading spaces from strings all over the tree. It decreases lib.a code size by 47 bytes and reuses the function tree-wide: text data bss dec hex filename 64688 584 592 65864 10148 (TOTALS-BEFORE) 64641 584 592 65817 10119 (TOTALS-AFTER) Also, while at it, if we see (str && isspace(str)), we can be sure to remove the first condition (str) as the second one (isspace(str)) also evaluates to 0 whenever str == 0, making it redundant. In other words, "a char equals zero is never a space". Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below, and found occurrences of this pattern on 3 more files: drivers/leds/led-class.c drivers/leds/ledtrig-timer.c drivers/video/output.c @@ expression str; @@ ( // ignore skip_spaces cases while (str && isspace(str)) { \(str++;\\|++str;\) } \| - str && isspace(*str) ) Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Julia Lawall <julia@diku.dk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Neil Brown <neilb@suse.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: David Howells <dhowells@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:32 -08:00
Linus Torvalds	3126c136bc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits) ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() ext3: Fix data / filesystem corruption when write fails to copy data ext4: Support for 64-bit quota format ext3: Support for vfsv1 quota format quota: Implement quota format with 64-bit space and inode limits quota: Move definition of QFMT_OCFS2 to linux/quota.h ext2: fix comment in ext2_find_entry about return values ext3: Unify log messages in ext3 ext2: clear uptodate flag on super block I/O error ext2: Unify log messages in ext2 ext3: make "norecovery" an alias for "noload" ext3: Don't update the superblock in ext3_statfs() ext3: journal all modifications in ext3_xattr_set_handle ext2: Explicitly assign values to on-disk enum of filetypes quota: Fix WARN_ON in lookup_one_len const: struct quota_format_ops ubifs: remove manual O_SYNC handling afs: remove manual O_SYNC handling kill wait_on_page_writeback_range vfs: Implement proper O_SYNC semantics ...	2009-12-11 15:31:13 -08:00
Jan Kara	5a20bdfcdc	ext4: Support for 64-bit quota format Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:54 +01:00
Theodore Ts'o	a214238d3b	ext4: Do not override ext2 or ext3 if built they are built as modules The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-09 21:09:58 -05:00
Eric Sandeen	15121c18a2	ext4: Fix optional-arg mount options We have 2 mount options, "barrier" and "auto_da_alloc" which may or may not take a 1/0 argument. This causes the ext4 superblock mount code to subtract uninitialized pointers and pass the result to kmalloc, which results in very noisy failures. Per Ted's suggestion, initialize the args struct so that we know whether match_token() found an argument for the option, and skip match_int() if not. Also, return error (0) from parse_options if we thought we found an argument, but match_int() Fails. Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 20:17:55 -05:00
Jan Kara	b436b9bef8	ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 23:51:10 -05:00
Josef Bacik	d4edac314e	ext4: wait for log to commit when umounting There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-08 21:48:58 -05:00
Theodore Ts'o	24b584240a	ext4: Use ext4 file system driver for ext2/ext3 file system mounts Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-12-07 14:08:51 -05:00
Eric Sandeen	e3bb52ae2b	ext4: make "norecovery" an alias for "noload" Users on the linux-ext4 list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext4. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-19 14:28:50 -05:00
Eric Sandeen	5328e63531	ext4: make trim/discard optional (and off by default) It is anticipated that when sb_issue_discard starts doing real work on trim-capable devices, we may see issues. Make this mount-time optional, and default it to off until we know that things are working out OK. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-19 14:25:42 -05:00
Theodore Ts'o	3f8fb9490e	ext4: don't update the superblock in ext4_statfs() commit `a71ce8c6c9` updated ext4_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-11-23 07:24:52 -05:00
Theodore Ts'o	cf40db137c	ext4: remove failed journal checksum check Now that we are checking for failed journal checksums in the jbd2 layer, we don't need to check in the ext4 mount path --- since a checksum fail will result in ext4_load_journal() returning an error, causing the file system to refuse to be mounted until e2fsck can deal with the problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-11-22 21:00:01 -05:00
Theodore Ts'o	503358ae01	ext4: avoid divide by zero when trying to mount a corrupted file system If s_log_groups_per_flex is greater than 31, then groups_per_flex will will overflow and cause a divide by zero error. This can cause kernel BUG if such a file system is mounted. Thanks to Nageswara R Sastry for analyzing the failure and providing an initial patch. http://bugzilla.kernel.org/show_bug.cgi?id=14287 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-11-23 07:24:46 -05:00
Linus Torvalds	d4da6c9ccf	Revert "ext4: Remove journal_checksum mount option and enable it by default" This reverts commit `d0646f7b63`, as requested by Eric Sandeen. It can basically cause an ext4 filesystem to miss recovery (and thus get mounted with errors) if the journal checksum does not match. Quoth Eric: "My hand-wavy hunch about what is happening is that we're finding a bad checksum on the last partially-written transaction, which is not surprising, but if we have a wrapped log and we're doing the initial scan for head/tail, and we abort scanning on that bad checksum, then we are essentially running an unrecovered filesystem. But that's hand-wavy and I need to go look at the code. We lived without journal checksums on by default until now, and at this point they're doing more harm than good, so we should revert the default-changing commit until we can fix it and do some good power-fail testing with the fixes in place." See http://bugzilla.kernel.org/show_bug.cgi?id=14354 for all the gory details. Requested-by: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mathias Burén <mathias.buren@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-11-02 10:15:27 -08:00
Eric Sandeen	f0e2dfa7f3	ext4: drop ext4dev compat Kconfig & super.c promised it'd be gone by 2.6.31, so it's about time to drop it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-10-01 02:21:07 -04:00
Theodore Ts'o	296c355cd6	ext4: Use tracepoints for mb_history trace file The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-30 00:32:42 -04:00
Theodore Ts'o	90576c0b9a	ext4, jbd2: Drop unneeded printks at mount and unmount time There are a number of kernel printk's which are printed when an ext4 filesystem is mounted and unmounted. Disable them to economize space in the system logs. In addition, disabling the mballoc stats by default saves a number of unneeded atomic operations for every block allocation or deallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 15:51:30 -04:00
Curt Wohlgemuth	d3d1faf6a7	ext4: Handle nested ext4_journal_start/stop calls without a journal This patch fixes a problem with handling nested calls to ext4_journal_start/ext4_journal_stop, when there is no journal present. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 11:01:03 -04:00
Mingming Cao	8d5d02e6b1	ext4: async direct IO for holes and fallocate support For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2009-09-28 15:48:29 -04:00
Mingming Cao	4c0425ff68	ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-28 15:48:41 -04:00
Theodore Ts'o	55138e0bc2	ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks Work around problems in the writeback code to force out writebacks in larger chunks than just 4mb, which is just too small. This also works around limitations in the ext4 block allocator, which can't allocate more than 2048 blocks at a time. So we need to defeat the round-robin characteristics of the writeback code and try to write out as many blocks in one inode before allowing the writeback code to move on to another inode. We add a a new per-filesystem tunable, max_writeback_mb_bump, which caps this to a default of 128mb per inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-29 13:31:31 -04:00
Alexey Dobriyan	0d54b217a2	const: make struct super_block::s_qcop const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Alexey Dobriyan	61e225dc34	const: make struct super_block::dq_op const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:24 -07:00
Eric Sandeen	fb0a387dcd	ext4: limit block allocations for indirect-block files to < 2^32 Today, the ext4 allocator will happily allocate blocks past 2^32 for indirect-block files, which results in the block numbers getting truncated, and corruption ensues. This patch limits such allocations to < 2^32, and adds BUG_ONs if we do get blocks larger than that. This should address RH Bug 519471, ext4 bitmap allocator must limit blocks to < 2^32 * ext4_find_goal() is modified to choose a goal < UINT_MAX, so that our starting point is in an acceptable range. * ext4_xattr_block_set() is modified such that the goal block is < UINT_MAX, as above. * ext4_mb_regular_allocator() is modified so that the group search does not continue into groups which are too high * ext4_mb_use_preallocated() has a check that we don't use preallocated space which is too far out * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs No attempt has been made to limit inode locations to < 2^32, so we may wind up with blocks far from their inodes. Doing this much already will lead to some odd ENOSPC issues when the "lower 32" gets full, and further restricting inodes could make that even weirder. For high inodes, choosing a goal of the original, % UINT_MAX, may be a bit odd, but then we're in an odd situation anyway, and I don't know of a better heuristic. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-16 14:45:10 -04:00
Theodore Ts'o	3661d28615	ext4: Fix include/trace/events/ext4.h to work with Systemtap Using relative pathnames in #include statements interacts badly with SystemTap, since the fs/ext4/*.h header files are not packaged up as part of a distribution kernel's header files. Since systemtap doesn't use TP_fast_assign(), we can use a blind structure definition and then make sure the needed header files are defined before the ext4 source files #include the trace/events/ext4.h header file. https://bugzilla.redhat.com/show_bug.cgi?id=512478 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-14 22:59:50 -04:00
Theodore Ts'o	7ad9bb651f	ext4: Fix initalization of s_flex_groups The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-11 16:51:28 -04:00
Theodore Ts'o	71290b368a	ext4: Don't update superblock write time when filesystem is read-only This avoids updating the superblock write time when we are mounting the root file system read/only but we need to replay the journal; at that point, for people who are east of GMT and who make their clock tick in localtime for Windows bug-for-bug compatibility, and this will cause e2fsck to complain and force a full file system check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-10 17:31:04 -04:00
Theodore Ts'o	d0646f7b63	ext4: Remove journal_checksum mount option and enable it by default There's no real cost for the journal checksum feature, and we should make sure it is enabled all the time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-09-05 12:50:43 -04:00
Eric Sandeen	a13fb1a453	ext4: Add feature set check helper for mount & remount paths A user reported that although his root ext4 filesystem was mounting fine, other filesystems would not mount, with the: "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF" error on his 32-bit box built without CONFIG_LBDAF. This is because the test at mount time for this situation was not being re-checked on remount, and the normal boot process makes an ro->rw transition, so this was being missed. Refactor to make a common helper function to test the filesystem features against the type of mount request (RO vs. RW) so that we stay consistent. Addresses Red-Hat-Bugzilla: #517650 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-18 00:20:23 -04:00
Eric Sandeen	bf43d84b18	ext4: reject too-large filesystems on 32-bit kernels ext4 will happily mount a > 16T filesystem on a 32-bit box, but this is not safe; writes to the block device will wrap past 16T and the page cache can't index past 16T (232 index * 4k pages). Adding another test to the existing "too many sectors" test should do the trick. Add a comment, a relevant return value, and fix the reference to the CONFIG_LBD(AF) option as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-08-17 23:48:51 -04:00
Theodore Ts'o	78f1ddbb49	ext4: Avoid null pointer dereference when decoding EROFS w/o a journal We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 23:09:47 -04:00
Al Viro	d4bfe2f76d	switch ext4 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Linus Torvalds	31583d6acf	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR	2009-06-19 17:43:04 -07:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Andreas Dilger	11013911da	ext4: teach the inode allocator to use a goal inode number Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:45:35 -04:00
Theodore Ts'o	4ab2f15b7f	ext4: move the abort flag from s_mount_opts to s_mount_flags We're running out of space in the mount options word, and EXT4_MOUNT_ABORT isn't really a mount option, but a run-time flag. So move it to become EXT4_MF_FS_ABORTED in s_mount_flags. Also remove bogus ext2_fs.h / ext4.h simultaneous #include protection, which can never happen. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:36 -04:00
Theodore Ts'o	7f4520cc62	ext4: change s_mount_opt to be an unsigned int We can only fit 32 options in s_mount_opt because an unsigned long is 32-bits on a x86 machine. So use an unsigned int to save space on 64-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:41 -04:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	ebc1ac1645	->write_super lock_super pushdown Push down lock_super into ->write_super instances and remove it from the caller. Following filesystem don't need ->s_lock in ->write_super and are skipped: * bfs, nilfs2 - no other uses of s_lock and have internal locks in ->write_super * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock * reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in ->write_super * xfs - no other uses of s_lock and uses internal lock (buffer lock on superblock buffer) to serialize ->write_super. Also xfs_fs_write_super is superflous and will go away in the next merge window Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:09 -04:00
Al Viro	bbd6851a32	Push lock_super() into the ->remount_fs() of filesystems that care about it Note that since we can't run into contention between remount_fs and write_super (due to exclusion on s_umount), we have to care only about filesystems that touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs do need it; fat doesn't since its ->remount_fs() only accesses assign-once data (basically, it's "we have no atime on directories and only have atime on files for vfat; force nodiratime and possibly noatime into *flags"). [folded a build fix from hch] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Al Viro	a9e220f832	No need to do lock_super() for exclusion in generic_shutdown_super() We can't run into contention on it. All other callers of lock_super() either hold s_umount (and we have it exclusive) or hold an active reference to superblock in question, which prevents the call of generic_shutdown_super() while the reference is held. So we can replace lock_super(s) with get_fs_excl() in generic_shutdown_super() (and corresponding change for unlock_super(), of course). Since ext4 expects s_lock held for its put_super, take lock_super() into it. The rest of filesystems do not care at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	8c85e12512	remove ->write_super call in generic_shutdown_super We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Eric Sandeen	b31e15527a	ext4: Change all super.c messages to print the device This patch changes ext4 super.c to include the device name with all warning/error messages, by using a new utility function ext4_msg. It's a rather large patch, but very mechanic. I left debug printks alone. This is a straightforward port of a patch which Andi Kleen did for ext3. Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-04 17:36:36 -04:00
Andreas Dilger	0b8e58a140	ext4: super.c whitespace cleanup Cleanup of whitespace and formatting. Initially driven by confusing indents for the ext4_{block,inode}_bitmap() et. al. helper routines, but figured I'd cleanup some other 80-column wrapping and other indenting problems at the same time. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-03 17:59:28 -04:00
Theodore Ts'o	88b6edd17c	ext4: Clean up calls to ext4_get_group_desc() If the caller isn't planning on modifying the block group descriptors, there's no need to pass in a pointer to a struct buffer_head. Nuking this saves a tiny amount of CPU time and stack space usage. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-25 11:50:39 -04:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Manish Katiyar	f68301656b	ext4: Fix memory leak in ext4_fill_super() in case of a failed mount Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 23:52:44 -04:00
Theodore Ts'o	6fd058f779	ext4: Add a comprehensive block validity check to ext4_get_blocks() To catch filesystem bugs or corruption which could lead to the filesystem getting severly damaged, this patch adds a facility for tracking all of the filesystem metadata blocks by contiguous regions in a red-black tree. This allows quick searching of the tree to locate extents which might overlap with filesystem metadata blocks. This facility is also used by the multi-block allocator to assure that it is not allocating blocks out of the system zone, as well as by the routines used when reading indirect blocks and extents information from disk to make sure their contents are valid. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 15:38:01 -04:00
Aneesh Kumar K.V	955ce5f5be	ext4: Convert ext4_lock_group to use sb_bgl_lock We have sb_bgl_lock() and ext4_group_info.bb_state bit spinlock to protech group information. The later is only used within mballoc code. Consolidate them to use sb_bgl_lock(). This makes the mballoc.c code much simpler and also avoid confusion with two locks protecting same info. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 20:35:09 -04:00
Curt Wohlgemuth	f40339031b	ext4: Make the length of the mb_history file tunable In memory-constrained systems with many partitions, the ~68K for each partition for the mb_history buffer can be excessive. This patch adds a new mount option, mb_history_length, as well as a way of setting the default via a module parameter (or via a sysfs parameter in /sys/module/ext4/parameter/default_mb_history_length). If the mb_history_length is set to zero, the mb_history facility is disabled entirely. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 20:27:20 -04:00
Theodore Ts'o	bb23c20a85	ext4: Move fs/ext4/group.h into ext4.h Move the function prototypes in group.h into ext4.h so they are all defined in one place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 19:44:44 -04:00
Theodore Ts'o	596397b77c	ext4: Move fs/ext4/namei.h into ext4.h The fs/ext4/namei.h header file had only a single function declaration, and should have never been a standalone file. Move it into ext4.h, where should have been from the beginning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 13:49:15 -04:00
Theodore Ts'o	9bffad1ed2	ext4: convert instrumentation from markers to tracepoints Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 11:48:11 -04:00
Theodore Ts'o	32ed5058ce	ext4: Replace lock/unlock_super() with an explicit lock for resizing Use a separate lock to protect s_groups_count and the other block group descriptors which get changed via an on-line resize operation, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:53:39 -04:00
Theodore Ts'o	3b9d4ed266	ext4: Replace lock/unlock_super() with an explicit lock for the orphan list Use a separate lock to protect the orphan list, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:54:04 -04:00
Theodore Ts'o	a63c9eb2ce	ext4: ext4_mark_recovery_complete() doesn't need to use lock_super The function ext4_mark_recovery_complete() is called from two call paths: either (a) while mounting the filesystem, in which case there's no danger of any other CPU calling write_super() until the mount is completed, and (b) while remounting the filesystem read-write, in which case the fs core has already locked the superblock. This also allows us to take out a very vile unlock_super()/lock_super() pair in ext4_remount(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 01:59:42 -04:00
Theodore Ts'o	114e9fc907	ext4: Remove outdated comment about lock_super() ext4_fill_super() is no longer called by read_super(), and it is no longer called with the superblock locked. The unlock_super()/lock_super() is no longer present, so this comment is entirely superfluous. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 15:48:07 -04:00
Theodore Ts'o	8df9675f8b	ext4: Avoid races caused by on-line resizing and SMP memory reordering Ext4's on-line resizing adds a new block group and then, only at the last step adjusts s_groups_count. However, it's possible on SMP systems that another CPU could see the updated the s_group_count and not see the newly initialized data structures for the just-added block group. For this reason, it's important to insert a SMP read barrier after reading s_groups_count and before reading any (for example) the new block group descriptors allowed by the increased value of s_groups_count. Unfortunately, we rather blatently violate this locking protocol documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes happen relatively rarely, and (2) it seems rare that the filesystem code will immediately try to use just-added block group before any memory ordering issues resolve themselves. So apparently problems here are relatively hard to hit, since ext3 has been vulnerable to the same issue for years with no one apparently complaining. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 08:50:38 -04:00
Theodore Ts'o	9ca92389c5	ext4: Use separate super_operations structure for no_journal filesystems By using a separate super_operations structure for filesystems that have and don't have journals, we can simply ext4_write_super() --- which is only needed when no journal is present --- and ext4_freeze(), ext4_unfreeze(), and ext4_sync_fs(), which are only needed when the journal is present. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 12:52:25 -04:00
Theodore Ts'o	7234ab2a55	ext4: Fix and simplify s_dirt handling The s_dirt flag wasn't completely handled correctly, but it didn't really matter when journalling was enabled. It turns out that when ext4 runs without a journal, we don't clear s_dirt in places where we should have, with the result that the high-level write_super() function was writing the superblock when it wasn't necessary. So we fix this by making ext4_commit_super() clear the s_dirt flag, and removing many of the other places where s_dirt is manipulated. When journalling is enabled, the s_dirt flag might be left set more often, but s_dirt really doesn't matter when journalling is enabled. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-30 21:24:04 -04:00
Theodore Ts'o	e2d670523c	ext4: Simplify ext4_commit_super()'s function signature The ext4_commit_super() function took both a struct super_block * and a struct ext4_super_block *, but the struct ext4_super_block can be derived from the struct super_block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 00:33:44 -04:00
Theodore Ts'o	f7c439504c	ext4: Use is_power_of_2() for clarity Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 23:31:59 -04:00
Theodore Ts'o	c5ca7c7636	ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array For very large filesystems, the s_flex_groups array can get quite big. For example, a filesystem that can be resized up to 16TB will have 8192 flex groups (assuming the default flex_bg size of 16), so the array is 96k, which is very marginal for kmalloc(). On the other hand, a 160GB filesystem without the resize_inode feature will only require 960 bytes. So we try to allocate the array first using kmalloc(), and if that fails, we'll try to use vmalloc() instead. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-27 22:48:48 -04:00
From: Thiemo Nagel	0f2ddca66d	ext4: check block device size on mount Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-07 14:07:47 -04:00
Theodore Ts'o	06705bff91	ext4: Regularize mount options Add support for using the mount options "barrier" and "nobarrier", and "auto_da_alloc" and "noauto_da_alloc", which is more consistent than "barrier=<0\|1>" or "auto_da_alloc=<0\|1>". Most other ext3/ext4 mount options use the foo/nofoo naming convention. We allow the old forms of these mount options for backwards compatibility. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-28 10:59:57 -04:00
Theodore Ts'o	afd4672dc7	ext4: Add auto_da_alloc mount option Add a mount option which allows the user to disable automatic allocation of blocks whose allocation by delayed allocation when the file was originally truncated or when the file is renamed over an existing file. This feature is intended to save users from the effects of naive application writers, but it reduces the effectiveness of the delayed allocation code. This mount option disables this safety feature, which may be desirable for prodcutions systems where the risk of unclean shutdowns or unexpected system crashes is low. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-16 23:12:23 -04:00
Theodore Ts'o	7d39db14a4	ext4: Use struct flex_groups to calculate get_orlov_stats() Instead of looping over all of the block groups in a flex group summing their summary statistics, start tracking used_dirs in struct flex_groups, and use struct flex_groups instead. This should save a bit of CPU for mkdir-heavy workloads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:31:53 -05:00
Theodore Ts'o	9f24e4208f	ext4: Use atomic_t's in struct flex_groups Reduce pressure on the sb_bgl_lock family of locks by using atomic_t's to track the number of free blocks and inodes in each flex_group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:09:10 -05:00
Theodore Ts'o	b713a5ec55	ext4: remove /proc tuning knobs Remove tuning knobs in /proc/fs/ext4/<dev/* since they have been replaced by knobs in sysfs at /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-31 09:11:14 -04:00
Theodore Ts'o	3197ebdb13	ext4: Add sysfs support Add basic sysfs support so that information about the mounted filesystem and various tuning parameters can be accessed via /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-31 09:10:09 -04:00
Theodore Ts'o	afc32f7ee9	ext4: Track lifetime disk writes Add a new superblock value which tracks the lifetime amount of writes to the filesystem. This is useful in estimating the amount of wear on solid state drives (SSD's) caused by writes to the filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 19:39:58 -05:00
Pekka Enberg	705895b611	ext4: allocate ->s_blockgroup_lock separately As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit which makes it a very bad fit for SLAB allocators. The culprit of the wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when NR_CPUS >= 32. To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2 page in the worst case, separately. This shinks down struct ext4_sb_info enough to fit a 2 KB slab cache so now we allocate 16 KB + 2 KB instead of 32 KB saving 14 KB of memory. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-15 18:07:52 -05:00
Jan Kara	a269eb1829	ext4: Use lowercase names of quota functions Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mingming Cao <cmm@us.ibm.com> CC: linux-ext4@vger.kernel.org	2009-03-26 02:18:36 +01:00
Mingming Cao	60e58e0f30	ext4: quota reservation for delayed allocation Uses quota reservation/claim/release to handle quota properly for delayed allocation in the three steps: 1) quotas are reserved when data being copied to cache when block allocation is defered 2) when new blocks are allocated. reserved quotas are converted to the real allocated quota, 2) over-booked quotas for metadata blocks are released back. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:34 +01:00
Jan Kara	edf7245362	ext4: Remove unnecessary quota functions ext4_dquot_initialize() and ext4_dquot_drop() is no longer needed because of modified quota locking. Signed-off-by: Jan Kara <jack@suse.cz>	2009-03-26 02:18:34 +01:00
Theodore Ts'o	8b1a8ff8b3	ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze() Commit `c4be0c1d` added error checking to ext4_freeze() when calling ext4_commit_super(). Unfortunately the patch failed to remove the original call to ext4_commit_super(), with the net result that when freezing the filesystem, the superblock gets written twice, the first time without error checking. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-28 00:08:53 -05:00
Jan Kara	9eddacf9e9	Revert "ext4: wait on all pending commits in ext4_sync_fs()" This undoes commit `14ce0cb411`. Since jbd2_journal_start_commit() is now fixed to return 1 when we started a transaction commit, there's some transaction waiting to be committed or there's a transaction already committing, we don't need to call ext4_force_commit() in ext4_sync_fs(). Furthermore ext4_force_commit() can unnecessarily create sync transaction which is expensive so it's worthwhile to remove it when we can. http://bugzilla.kernel.org/show_bug.cgi?id=12224 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: linux-ext4@vger.kernel.org	2009-02-10 06:46:05 -05:00
Takashi Sato	c4be0c1dc4	filesystem freeze: add error handling of write_super_lockfs/unlockfs Currently, ext3 in mainline Linux doesn't have the freeze feature which suspends write requests. So, we cannot take a backup which keeps the filesystem's consistency with the storage device's features (snapshot and replication) while it is mounted. In many case, a commercial filesystem (e.g. VxFS) has the freeze feature and it would be used to get the consistent backup. If Linux's standard filesystem ext3 has the freeze feature, we can do it without a commercial filesystem. So I have implemented the ioctls of the freeze feature. I think we can take the consistent backup with the following steps. 1. Freeze the filesystem with the freeze ioctl. 2. Separate the replication volume or create the snapshot with the storage device's feature. 3. Unfreeze the filesystem with the unfreeze ioctl. 4. Take the backup from the separated replication volume or the snapshot. This patch: VFS: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they can return an error. Rename write_super_lockfs and unlockfs of the super block operation freeze_fs and unfreeze_fs to avoid a confusion. ext3, ext4, xfs, gfs2, jfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that write_super_lockfs returns an error if needed, and unlockfs always returns 0. reiserfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they always return 0 (success) to keep a current behavior. Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-09 16:54:42 -08:00
Linus Torvalds	2150edc6c5	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...	2009-01-08 17:14:59 -08:00
Theodore Ts'o	83982b6f47	ext4: Remove "extents" mount option This mount option is largely superfluous, and in fact the way it was implemented was buggy; if a filesystem which did not have the extents feature flag was mounted -o extents, the filesystem would attempt to create and use extents-based file even though the extents feature flag was not eabled. The simplest thing to do is to nuke the mount option entirely. It's not all that useful to force the non-creation of new extent-based files if the filesystem can support it. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 14:53:16 -05:00
Theodore Ts'o	abda141892	ext4: Make printk's consistently prefixed with "EXT4-fs: " Previously, some were "ext4: ", and some were "EXT4: "; change them to be consistent with most ext4 printk's, which is to use "EXT4-fs: ". Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 00:20:32 -05:00
Theodore Ts'o	4ec1102813	ext4: Add sanity checks for the superblock before mounting the filesystem This avoids insane superblock configurations that could lead to kernel oops due to null pointer derefences. http://bugzilla.kernel.org/show_bug.cgi?id=12371 Thanks to David Maciejak at Fortinet's FortiGuard Global Security Research Team who discovered this bug independently (but at approximately the same time) as Thiemo Nagel, who submitted the patch. Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-06 14:53:26 -05:00
Theodore Ts'o	b3881f74b3	ext4: Add mount option to set kjournald's I/O priority Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com>	2009-01-05 22:46:26 -05:00
Jan Kara	a5b5ee3201	ext4: Add default allocation routines for quota structures Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Jan Kara	17bd13b31c	ext4: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Theodore Ts'o	c319106723	ext4: Remove code to create the journal inode This code has been obsolete in quite some time, since the supported method for adding a journal inode is to use tune2fs (or to creating new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 11:14:25 -05:00
Toshiyuki Okajima	c39a7f84d7	ext4: provide function to release metadata pages under memory pressure Pages in the page cache belonging to ext4 data files are released via the ext4_releasepage() function specified in the ext4 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they can not be released by try_to_free_buffers() if they have a journal head attached to them. To address this, we supply a release_metadata function which calls jbd2_journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org	2009-01-05 22:38:48 -05:00
Aneesh Kumar K.V	560671a0d3	ext4: Use high 16 bits of the block group descriptor's free counts fields Rename the lower bits with suffix _lo and add helper to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:20:24 -05:00
Aneesh Kumar K.V	5d1b1b3f49	ext4: fix BUG when calling ext4_error with locked block group The mballoc code likes to call ext4_error while it is holding locked block groups. This can causes a scheduling in atomic context BUG. We can't just unlock the block group and relock it after/if ext4_error returns since that might result in race conditions in the case where the filesystem is set to continue after finding errors. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:19:52 -05:00
Jens Axboe	b3a6ffe16b	Get rid of CONFIG_LSF We have two seperate config entries for large devices/files. One is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF that handles large files. This doesn't make a lot of sense, you typically want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording to indicate that it covers both. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-12-29 08:29:51 +01:00
Aneesh Kumar K.V	3a06d778df	ext4: sparse fixes * Change EXT4_HAS__FEATURE to return a boolean Add a function prototype for ext4_fiemap() in ext4.h * Make ext4_ext_fiemap_cb() and ext4_xattr_fiemap() be static functions * Add lock annotations to mb_free_blocks() Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-22 15:04:59 -05:00
Theodore Ts'o	a9df9a4910	ext4: Make ext4_group_t be an unsigned int Nearly all places in the ext3/4 code which uses "unsigned long" is probably a bug, since on 32-bit systems a ulong a 32-bits, which means we are wasting stack space on 64-bit systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:18:16 -05:00
Theodore Ts'o	30773840c1	ext4: add fsync batch tuning knobs Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-03 20:27:38 -05:00
Theodore Ts'o	fde4d95ad8	ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() This removes annoying blank syslog entries emitted by ext4_error() or ext4_warning(), since these functions add their own newline. Signed-off-by: Nick Warne <nick@ukfsn.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:17:35 -05:00
Frank Mayhar	0390131ba8	ext4: Allow ext4 to run without a journal A few weeks ago I posted a patch for discussion that allowed ext4 to run without a journal. Since that time I've integrated the excellent comments from Andreas and fixed several serious bugs. We're currently running with this patch and generating some performance numbers against both ext2 (with backported reservations code) and ext4 with and without a journal. It just so happens that running without a journal is slightly faster for most everything. We did iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2 which creates 4 threads, each of which create and do reads and writes on a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens to bypass the page cache. Results: ext2 ext4, default ext4, no journal initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s reads 15.2 MB/s 16.9 MB/s 17.2 MB/s re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s So it seems that, so far, this was a useful exercise. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-07 00:06:22 -05:00
Roel Kluin	23475e264c	ext4: Use simple_strtol() instead of simple_strtoul() in ext4_ui_proc_open Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-26 02:23:19 -05:00
Aneesh Kumar K.V	565a9617b2	ext4: avoid ext4_error when mounting a fs with a single bg Remove some completely unneeded code which which caused an ext4_error to be generated when mounting a file system with only a single block group. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-05 21:51:07 -05:00
Theodore Ts'o	14ce0cb411	ext4: wait on all pending commits in ext4_sync_fs() In ext4_sync_fs, we only wait for a commit to finish if we started it, but there may be one already in progress which will not be synced. In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. To ensure all commits are synced, we flush all journal commits now when sync_fs'ing ext4. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org>	2008-11-03 18:10:55 -05:00
Aneesh Kumar K.V	d94e99a64c	ext4: Convert to host order before using the values. Use le16_to_cpu to read the s_reserved_gdt_blocks values from super block. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-04 09:11:26 -05:00
Theodore Ts'o	f99b25897a	ext4: Add support for non-native signed/unsigned htree hash algorithms The original ext3 hash algorithms assumed that variables of type char were signed, as God and K&R intended. Unfortunately, this assumption is not true on some architectures. Userspace support for marking filesystems with non-native signed/unsigned chars was added two years ago, but the kernel-side support was never added (until now). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-28 13:21:44 -04:00
Hidehiro Kawai	ef2cabf7c6	ext4: fix a bug accessing freed memory in ext4_abort Vegard Nossum reported a bug which accesses freed memory (found via kmemcheck). When journal has been aborted, ext4_put_super() calls ext4_abort() after freeing the journal_t object, and then ext4_abort() accesses it. This patch fix it. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-27 22:53:05 -04:00
Linus Torvalds	2248485640	Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits) [PATCH] kill the rest of struct file propagation in block ioctls [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET [PATCH] get rid of blkdev_locked_ioctl() [PATCH] get rid of blkdev_driver_ioctl() [PATCH] sanitize blkdev_get() and friends [PATCH] remember mode of reiserfs journal [PATCH] propagate mode through swsusp_close() [PATCH] propagate mode through open_bdev_excl/close_bdev_excl [PATCH] pass fmode_t to blkdev_put() [PATCH] kill the unused bsize on the send side of /dev/loop [PATCH] trim file propagation in block/compat_ioctl.c [PATCH] end of methods switch: remove the old ones [PATCH] switch sr [PATCH] switch sd [PATCH] switch ide-scsi [PATCH] switch tape_block [PATCH] switch dcssblk [PATCH] switch dasd [PATCH] switch mtd_blkdevs [PATCH] switch mmc ...	2008-10-23 10:23:07 -07:00
Al Viro	8264613def	[PATCH] switch quota_on-related stuff to kern_path() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-23 05:12:44 -04:00
Al Viro	9a1c354276	[PATCH] pass fmode_t to blkdev_put() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:58 -04:00
Theodore Ts'o	f287a1a561	ext4: Remove automatic enabling of the HUGE_FILE feature flag If the HUGE_FILE feature flag is not set, don't allow the creation of large files, instead of automatically enabling the feature flag. Recent versions of mke2fs will set the HUGE_FILE flag automatically anyway for ext4 filesystems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-16 22:50:48 -04:00
Theodore Ts'o	01436ef2e4	ext4: Remove unused mount options: nomballoc, mballoc, nocheck These mount options don't actually do anything any more, so remove them. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-17 07:22:35 -04:00
Eric Sesterhenn	5128273a32	ext4: Add missing newlines to printk messages There are some newlines missing in ext4_check_descriptors, which cause the printk level to be printed out when the next printk call is made: [ 778.847265] EXT4-fs: ext4_check_descriptors: Block bitmap for group 0 not in group (block 1509949442)!<3>EXT4-fs: group descriptors corrupted! [ 802.646630] EXT4-fs: ext4_check_descriptors: Inode bitmap for group 0 not in group (block 9043971)!<3>EXT4-fs: group descriptors corrupted! Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-17 09:16:19 -04:00
Aneesh Kumar K.V	c2774d84fd	ext4: Do mballoc init before doing filesystem recovery During filesystem recovery we may be doing a truncate which expects some of the mballoc data structures to be initialized. So do ext4_mb_init before recovery. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 20:07:20 -04:00
Steven Whitehouse	a447c09324	vfs: Use const for kernel parser table This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-13 10:10:37 -07:00
Alexander Beregalov	3244fcb1ae	ext4: fix build failure without procfs fs/ext4/super.c: In function 'ext4_fill_super': fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use in this function) fs/ext4/super.c:2226: error: (Each undeclared identifier is reported only once fs/ext4/super.c:2226: error: for each function it appears in.) Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-12 17:27:49 -04:00
Hidehiro Kawai	5bf5683a33	ext4: add an option to control error handling on file data If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount an ext4 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Here is the corresponding patch of the ext3 version: http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374 Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 22:12:43 -04:00
Hidehiro Kawai	7ffe1ea894	ext4: add checks for errors from jbd2 If the journal has aborted due to a checkpointing failure, we have to keep the contents of the journal space. Otherwise, the filesystem will lose uncheckpointed metadata completely and become inconsistent. To avoid this, we need to keep needs_recovery flag if checkpoint has failed. With this patch, ext4_put_super() detects a checkpointing failure from the return value of journal_destroy(), then it invokes ext4_abort() to make the filesystem read only and keep needs_recovery flag. Errors from jbd2_journal_flush() are also handled by this patch in some places. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-10-10 20:29:21 -04:00
Theodore Ts'o	03010a3350	ext4: Rename ext4dev to ext4 The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 20:02:48 -04:00
Andi Kleen	39d80c33a0	ext4: Avoid double dirtying of super block in ext4_put_super() While reading code I noticed that ext4_put_super() dirties the superblock bh twice. It is always done in ext4_commit_super() too. Remove the redundant dirty operation. Should be a nop semantically. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-10-06 21:37:44 -04:00
Theodore Ts'o	ede86cc473	ext4: Add debugging markers that can be used by systemtap This debugging markers are designed to debug problems such as the random filesystem latency problems reported by Arjan. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-05 20:50:06 -04:00
Theodore Ts'o	c2ea3fde61	ext4: Remove old legacy block allocator Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 09:40:52 -04:00
Theodore Ts'o	240799cdf2	ext4: Use readahead when reading an inode from the inode table With modern hard drives, reading 64k takes roughly the same time as reading a 4k block. So request readahead for adjacent inode table blocks to reduce the time it takes when iterating over directories (especially when doing this in htree sort order) in a cold cache case. With this patch, the time it takes to run "git status" on a kernel tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches" is reduced by 21%. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-09 23:53:47 -04:00
Theodore Ts'o	5e8814f2f7	ext4: Combine proc file handling into a single set of functions Previously mballoc created a separate set of functions for each proc file. This combines the tunables into a single set of functions which gets used for all of the per-superblock proc files, saving approximately 2k of compiled object code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-23 18:07:35 -04:00
Theodore Ts'o	9f6200bbfc	ext4: move /proc setup and teardown out of mballoc.c ...and into the core setup/teardown code in fs/ext4/super.c so that other parts of ext4 can define tuning parameters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-23 09:18:24 -04:00
Theodore Ts'o	914258bf2c	ext4/jbd2: Avoid WARN() messages when failing to write to the superblock This fixes some very common warnings reported by kerneloops.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-06 21:35:40 -04:00
Li Zefan	7ee1ec4ca3	ext4: add missing unlock in ext4_check_descriptors() on error path If there group descriptors are corrupted we need unlock the block group lock before returning from the function; else we will oops when freeing a spinlock which is still being held. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 10:47:19 -04:00
Theodore Ts'o	05496769e5	jbd2: clean up how the journal device name is printed Calculate the journal device name once and stash it away in the journal_s structure. This avoids needing to call bdevname() everywhere and reduces stack usage by not needing to allocate an on-stack buffer. In addition, we eliminate the '/' that can appear in device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that can cause problems when creating proc directory names, and include the inode number to support ocfs2 which creates multiple journals with different inode numbers. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-16 14:36:17 -04:00
Frederic Bohe	c62a11fd95	Update flex_bg free blocks and free inodes counters when resizing. This fixes a bug which prevented the newly created inodes after a resize from being used on filesystems with flex_bg. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 10:20:24 -04:00
Aneesh Kumar K.V	6bc6e63fcd	ext4: Add percpu dirty block accounting. This patch adds dirty block accounting using percpu_counters. Delayed allocation block reservation is now done by updating dirty block counter. In a later patch we switch to non delalloc mode if the filesystem free blocks is greater than 150% of total filesystem dirty blocks Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao<cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 09:39:00 -04:00
Theodore Ts'o	af5bc92dde	ext4: Fix whitespace checkpatch warnings/errors Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 22:25:24 -04:00
Theodore Ts'o	e5f8eab885	ext4: Fix long long checkpatch warnings Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 22:25:04 -04:00
Theodore Ts'o	4776004f54	ext4: Add printk priority levels to clean up checkpatch warnings Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 23:00:52 -04:00
Aneesh Kumar K.V	91246c0090	ext4: Initialize writeback_index to 0 when allocating a new inode The write_cache_pages() function uses the mapping->writeback_index as the starting index to write out when range_cyclic is set. Properly initialize writeback_index so that we start the writeout at index 0. This was found when debugging the small file fragmentation on ext4. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-08-19 21:14:52 -04:00
Linus Torvalds	8f616cd524	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: remove write-only variables from ext4_ordered_write_end ext4: unexport jbd2_journal_update_superblock ext4: Cleanup whitespace and other miscellaneous style issues ext4: improve ext4_fill_flex_info() a bit ext4: Cleanup the block reservation code path ext4: don't assume extents can't cross block groups when truncating ext4: Fix lack of credits BUG() when deleting a badly fragmented inode ext4: Fix ext4_ext_journal_restart() ext4: fix ext4_da_write_begin error path jbd2: don't abort if flushing file data failed ext4: don't read inode block if the buffer has a write error ext4: Don't allow lg prealloc list to be grow large. ext4: Convert the usage of NR_CPUS to nr_cpu_ids. ext4: Improve error handling in mballoc ext4: lock block groups when initializing ext4: sync up block and inode bitmap reading functions ext4: Allow read/only mounts with corrupted block group checksums ext4: Fix data corruption when writing to prealloc area	2008-08-03 10:50:44 -07:00
Al Viro	77e69dac3c	[PATCH] fix races and leaks in vfs_quota_on() users * new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the pathname resolution. * callers of vfs_quota_on() that do their own pathname resolution and checks based on it are switched to vfs_quota_on_path(); that way we avoid the races. * reiserfs leaked dentry/vfsmount references on several failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:25 -04:00
Theodore Ts'o	2b2d6d0197	ext4: Cleanup whitespace and other miscellaneous style issues Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-26 16:15:44 -04:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Li Zefan	ec05e868ac	ext4: improve ext4_fill_flex_info() a bit - use kzalloc() instead of kmalloc() + memset() - improve a printk info Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-07-24 12:49:59 -04:00
Eric Sandeen	b5f10eed81	ext4: lock block groups when initializing I noticed when filling a 1T filesystem with 4 threads using the fs_mark benchmark: fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0 that I occasionally got checksum mismatch errors: EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935 etc. I'd reliably get 4-5 of them during the run. It appears that the problem is likely a race to init the bg's when the uninit_bg feature is enabled. With the patch below, which adds sb_bgl_locking around initialization, I was able to complete several runs with no errors or warnings. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-08-02 21:21:08 -04:00
Theodore Ts'o	8a266467b8	ext4: Allow read/only mounts with corrupted block group checksums If the block group checksums are corrupted, still allow the mount to succeed, so e2fsck can have a chance to try to fix things up. Add code in the remount r/w path to make sure the block group checksums are valid before allowing the filesystem to be remounted read/write. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-26 14:34:21 -04:00
Eric Sandeen	e4079a11f5	ext4: do not set extents feature from the kernel We've talked for a while about getting rid of any feature- setting from the kernel; this gets rid of the code which would set the INCOMPAT_EXTENTS flag on the first file write when mounted as ext4[dev]. With this patch, if the extents feature is not already set on disk, then mounting as ext4 will fall back to noextents with a warning, and if -o extents is explicitly requested, the mount will fail, also with warning. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Aneesh Kumar K.V	c07651b556	ext4: Don't allow nonextenst mount option for large filesystem The block mapped inode format can address only blocks within 232. This causes a number of issues, the biggest of which is that the block allocator needs to be taught that certain inodes can not utilize block numbers > 232. So until this is fixed, it is simplest to fail mounting of file systems with more than 2**32 blocks if the -o noextents option is given. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Aneesh Kumar K.V	dd919b9822	ext4: Enable delalloc by default. Enable delalloc by default to ensure it gets sufficient testing and because it makes the filesystem much more efficient. Add a nodealalloc option to disable delayed allocation, and update ext4_show_options to show delayed allocation off if it is disabled. If the data=journal mount option is used, disable delayed allocation since the delalloc code doesn't support data=journal yet. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-07-11 19:27:31 -04:00
Mingming Cao	d2a1763791	ext4: delayed allocation ENOSPC handling This patch does block reservation for delayed allocation, to avoid ENOSPC later at page flush time. Blocks(data and metadata) are reserved at da_write_begin() time, the freeblocks counter is updated by then, and the number of reserved blocks is store in per inode counter. At the writepage time, the unused reserved meta blocks are returned back. At unlink/truncate time, reserved blocks are properly released. Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to fix the oldallocator block reservation accounting with delalloc, added lock to guard the counters and also fix the reservation for meta blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-07-14 17:52:37 -04:00
Alex Tomas	64769240bd	ext4: Add delayed allocation support in data=writeback mode Updated with fixes from Mingming Cao <cmm@us.ibm.com> to unlock and release the page from page cache if the delalloc write_begin failed, and properly handle preallocated blocks. Also added a fix to clear buffer_delay in block_write_full_page() after allocating a delayed buffer. Updated with fixes from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to update i_disksize properly and to add bmap support for delayed allocation. Updated with a fix from Valerie Clement <valerie.clement@bull.net> to avoid filesystem corruption when the filesystem is mounted with the delalloc option and blocksize < pagesize. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-07-11 19:27:31 -04:00
Jan Kara	678aaf4814	ext4: Use new framework for data=ordered mode in JBD2 This patch makes ext4 use inode-based implementation of data=ordered mode in JBD2. It allows us to unify some data=ordered and data=writeback paths (especially writepage since we don't have to start a transaction anymore) and remove some buffer walking. Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to fix file system hang due to corrupt jinode values. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Jan Kara	47b4a50beb	ext4: Set journal pointer to NULL when journal is released Set sbi->s_journal to NULL after we call journal_destroy(). This will be later needed because after journal_destroy() is called, ext4_clear_inode() can still be called for some inodes (e.g. root inode) and we'll need to detect there that journal doesn't exists anymore. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Aneesh Kumar K.V	7477827f66	ext4: Fix sparse warning Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Li Zefan	f9a8ac99fd	ext4: remove redundant code in ext4_fill_super() The previous sb_min_blocksize() has already set the block size. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Jose R. Santos	772cb7c83b	ext4: New inode allocation for FLEX_BG meta-data groups. This patch mostly controls the way inode are allocated in order to make ialloc aware of flex_bg block group grouping. It achieves this by bypassing the Orlov allocator when block group meta-data are packed toghether through mke2fs. Since the impact on the block allocator is minimal, this patch should have little or no effect on other block allocation algorithms. By controlling the inode allocation, it can basically control where the initial search for new block begins and thus indirectly manipulate the block allocator. This allocator favors data and meta-data locality so the disk will gradually be filled from block group zero upward. This helps improve performance by reducing seek time. Since the group of inode tables within one flex_bg are treated as one giant inode table, uninitialized block groups would not need to partially initialize as many inode table as with Orlov which would help fsck time as the filesystem usage goes up. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Theodore Ts'o	7ad72ca60b	ext4: Remove unused variable from ext4_show_options Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Jan Kara	4d04e4fbf8	ext4: add missing unlock to an error path in ext4_quota_write() When write in ext4_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-04 10:40:05 -07:00
Eric Sandeen	571640cad3	ext4: enable barriers by default I can't think of any valid reason for ext4 to not use barriers when they are available; I believe this is necessary for filesystem integrity in the face of a volatile write cache on storage. An administrator who trusts that the cache is sufficiently battery- backed (and power supplies are sufficiently redundant, etc...) can always turn it back off again. SuSE has carried such a patch for ext3 for quite some time now. Also document the mount option while we're at it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-26 12:29:46 -04:00
Theodore Ts'o	cd0b6a39a1	ext4: Display the journal_async_commit mount option in /proc/mounts Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-26 10:28:28 -04:00
Theodore Ts'o	624080eded	jbd2: If a journal checksum error is detected, propagate the error to ext4 If a journal checksum error is detected, the ext4 filesystem will call ext4_error(), and the mount will either continue, become a read-only mount, or cause a kernel panic based on the superblock flags indicating the user's preference of what to do in case of filesystem corruption being detected. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-06-06 17:50:40 -04:00
Jan Kara	2c8be6b222	ext4: fix typos in messages and comments (journalled -> journaled) Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-13 21:27:55 -04:00
Jan Kara	0623543b33	ext4: fix synchronization of quota files in journal=data mode In journal=data mode, it is not enough to do write_inode_now as done in vfs_quota_on() to write all data to their final location (which is needed for quota_read to work correctly). Calling journal_flush() does its job. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-13 19:11:51 -04:00
Jan Kara	cd59e7b978	ext4: Fix mount messages when quota disabled When quota is disabled, we should not print 'journaled quota not supported' when user tried to mount non-journaled quota. Also fix typo in the message. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-13 19:11:51 -04:00
Jan Kara	dfc5d03f12	ext4: correct mount option parsing to detect when quota options can be changed We should not allow user to change quota mount options when quota is just suspended. It would make mount options and internal quota state inconsistent. Also we should not allow user to change quota format when quota is turned on. On the other hand we can just silently ignore when some option is set to the value it already has (mount does this on remount). Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-05-13 19:11:51 -04:00
Josef Bacik	c19204b0ae	ext4: don't use ext4_error in ext4_check_descriptors Because ext4_check_descriptors is called at mount time you can't use ext4_error as it calls ext4_commit_sb, which since the sb isn't all the way initialized causes bad things to happen (ie a panic). This patch changes the ext4_error's to printk's to keep this problem from happening. Thanks much, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 22:00:28 -04:00
Christoph Hellwig	3dcf54515a	ext4: move headers out of include/linux Move ext4 headers out of include/linux. This is just the trivial move, there's some more thing that could be done later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 18:13:32 -04:00
Jan Kara	2887df139c	ext4: Fix hang on umount with quotas when journal is aborted Call dquot_drop() from ext4_dquot_drop() even if we fail to start a transaction. Otherwise we never get to dropping references to quota structures from the inode and umount will hang indefinitely. Thanks to Payphone LIOU for spotting the problem. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> CC: Payphone LIOU <lioupayphone@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 22:02:07 -04:00
Harvey Harrison	46e665e9d2	ext4: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-17 10:38:59 -04:00
Marcin Slusarz	216c34b2b8	ext4: convert byte order of constant instead of variable Convert byte order of constant instead of variable which can be done at compile time (vs run time). Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-17 10:38:59 -04:00
Marcin Slusarz	e8546d0615	ext4: le*_add_cpu conversion replace all: little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder); with: leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: sct@redhat.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: adilger@clusterfs.com Cc: Mingming Cao <cmm@us.ibm.com>	2008-04-17 10:38:59 -04:00
Josef Bacik	f3f12faa74	ext4: fix mount option parsing The "resize" option won't be noticed as it comes after the NULL option, so if you try to mount (or in this case remount) with that option it won't be recognized. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 22:05:28 -04:00
Josef Bacik	97bd42b9c8	ext4: check return of ext4_orphan_get properly This patch fix a panic while running fsfuzzer. We are improperly checking the return of ext4_orphan_get. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 22:04:56 -04:00
Jan Kara	6f28e08794	quota: ext4: make ext4 handle quotaon on remount Update ext4 to handle quotaon on remount RW. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:33 -07:00
Jan Blunck	1d957f9bf8	Introduce path_put() * Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	4ac9137858	Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 `6894212` 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Theodore Tso	469108ff3d	ext4: Add new "development flag" to the ext4 filesystem This flag is simply a generic "this is a crash/burn test filesystem" marker. If it is set, then filesystem code which is "in development" will be allowed to mount the filesystem. Filesystem code which is not considered ready for prime-time will check for this flag, and if it is not set, it will refuse to touch the filesystem. As we start rolling ext4 out to distro's like Fedora, et. al, this makes it less likely that a user might accidentally start using ext4 on a production filesystem; a bad thing, since that will essentially make it be unfsckable until e2fsprogs catches up. Signed-off-by: Theodore Tso <tytso@MIT.EDU> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-02-10 01:11:44 -05:00
David Howells	1d1fe1ee02	iget: stop EXT4 from using iget() and read_inode() Stop the EXT4 filesystem from using iget() and read_inode(). Replace ext4_read_inode() with ext4_iget(), and call that instead of iget(). ext4_iget() then uses iget_locked() directly and returns a proper error code instead of an inode in the event of an error. ext4_fill_super() returns any error incurred when getting the root inode instead of EINVAL. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:27 -08:00
Akinobu Mita	197cd65acc	ext[234]: use ext[234]_get_group_desc() Use ext[234]_get_group_desc() to get group descriptor from group number. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Aneesh Kumar K.V	ce40733ce9	ext4: Check for return value from sb_set_blocksize sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Miklos Szeredi	cb45bbe44b	ext4: Add stripe= option to /proc/mounts Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00

... 2 3 4 5 6 ...

421 Commits