Commit Graph

165 Commits

Author SHA1 Message Date
Linus Torvalds 4b89eed93e Write back inode data pages even when the inode itself is locked
In __writeback_single_inode(), when we find a locked inode and we're not
doing a data-integrity sync, we used to just skip writing entirely,
since we didn't want to wait for the inode to unlock.

However, there's really no reason to skip writing the data pages, which
are likely to be the the bulk of the dirty state anyway (and the main
reason why writeback was started for the non-data-integrity case, of
course!)

Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andrew Morton <akpm@osdl.org>,
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 12:53:20 -08:00
David Howells 7b0de42d7c [PATCH] BLOCK: Remove dependence on existence of blockdev_superblock [try #6]
Move blockdev_superblock extern declaration from fs/fs-writeback.c to a
headerfile and remove the dependence on it by wrapping it in a macro.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:26 +02:00
David Howells 07f3f05c1e [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6]
Create a new header file, fs/internal.h, for common definitions local to the
sources in the fs/ directory.

Move extern definitions that should be in header files from fs/*.c to
fs/internal.h or other main header files where they span directories.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30 20:52:18 +02:00
Christoph Lameter fd39fc8561 [PATCH] zoned vm counters: conversion of nr_unstable to per zone counter
Conversion of nr_unstable to a per zone counter

We need to do some special modifications to the nfs code since there are
multiple cases of disposition and we need to have a page ref for proper
accounting.

This converts the last critical page state of the VM and therefore we need to
remove several functions that were depending on GET_PAGE_STATE_LAST in order
to make the kernel compile again.  We are only left with event type counters
in page state.

[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:36 -07:00
Christoph Lameter b1e7a8fd85 [PATCH] zoned vm counters: conversion of nr_dirty to per zone counter
This makes nr_dirty a per zone counter.  Looping over all processors is
avoided during writeback state determination.

The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones.  Someone more familiar with
NFS should probably review what I have done.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:35 -07:00
Jens Axboe b31dc66a54 [PATCH] Kill PF_SYNCWRITE flag
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23 17:10:39 +02:00
OGAWA Hirofumi 111ebb6e6f [PATCH] writeback: fix range handling
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request.  Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".

To make all this sane, the patch changes range of writeback_control.

So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.

And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.

This patch does,

    - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
      -1 is usually ok for range_end (type is long long). But, if someone did,

		range_end += val;		range_end is "val - 1"
		u64val = range_end >> bits;	u64val is "~(0ULL)"

      or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
      things, and uses LLONG_MAX for range_end.

    - All callers of ->writepages() sets range_start/end or range_cyclic.

    - Fix updates of ->writeback_index. It seems already bit strange.
      If it starts at 0 and ended by check of nr_to_write, this last
      index may reduce chance to scan end of file.  So, this updates
      ->writeback_index only if range_cyclic is true or whole-file is
      scanned.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:49 -07:00
OGAWA Hirofumi 4ffc844425 [PATCH] Move cond_resched() after iput() in sync_sb_inodes()
In here, I think the following order is more cache-friendly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Randy Dunlap b8887e6e8c [PATCH] kernel-docs: fix kernel-doc format problems
Convert to proper kernel-doc format.

Some have extra blank lines (not allowed immed.  after the function name)
or need blank lines (after all parameters).  Function summary must be only
one line.

Colon (":") in a function description does weird things (causes kernel-doc
to think that it's a new section head sadly).

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:55 -08:00
Andrew Morton 49364ce253 [PATCH] write_inode_now(): write inode if not BDI_CAP_NO_WRITEBACK
If the backing_dev_info doesn't have BDI_CAP_NO_WRITEBACK we're not supposed
to write back an inode's pages.  But in this situation write_inode_now()
refuses to write the inode itself as well.  Fix.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:35 -08:00
Andrea Arcangeli 659603ef69 [PATCH] fix __writeback_single_inode WARN_ON
When the inode count is zero in inode writeback, the

	WARN_ON(!(inode->i_state & I_WILL_FREE));

is broken, and needs to test for either I_WILL_FREE|I_FREEING.

When the inode is in I_FREEING state, it's already out of the visibility
of the vm so it can't be freed so it doesn't require the __iget and the
generic_delete_inode path can call the sync internally to the lowlevel
fs callback during the last iput. So the inode being in I_FREEING is
also a valid condition for calling the sync with i_count == 0.

The specific stack trace is this:

  0xc00000007b8fb6e0  0xc00000000010118c  .__writeback_single_inode +0x5c
  0xc00000007b8fb6e0  0xc0000000001014dc (lr) .sync_inode +0x3c
  0xc00000007b8fb790  0xc0000000001014dc  .sync_inode +0x3c
  0xc00000007b8fb820  0xc0000000001a5020  .ext2_sync_inode +0x64
  0xc00000007b8fb8f0  0xc0000000001a65b4  .ext2_truncate +0x3f8
  0xc00000007b8fba40  0xc0000000001a6940  .ext2_delete_inode +0xdc
  0xc00000007b8fbac0  0xc0000000000f7a5c  .generic_delete_inode +0x124
  0xc00000007b8fbb50  0xc0000000000f5fe0  .iput +0xb8
  0xc00000007b8fbbe0  0xc0000000000e9fd4  .sys_unlink +0x2a8
  0xc00000007b8fbd10  0xc00000000001048c  .ret_from_syscall_1 +0x0

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-31 14:22:04 -08:00
Andrea Arcangeli 7f04c26d71 [PATCH] fix nr_unused accounting, and avoid recursing in iput with I_WILL_FREE set
list_move(&inode->i_list, &inode_in_use);
 		} else {
 			list_move(&inode->i_list, &inode_unused);
+			inodes_stat.nr_unused++;
 		}
 	}
 	wake_up_inode(inode);

Are you sure the above diff is correct? It was added somewhere between
2.6.5 and 2.6.8. I think it's wrong.

The only way I can imagine the i_count to be zero in the above path, is
that I_WILL_FREE is set.  And if I_WILL_FREE is set, then we must not
increase nr_unused.  So I believe the above change is buggy and it will
definitely overstate the number of unused inodes and it should be backed
out.

Note that __writeback_single_inode before calling __sync_single_inode, can
drop the spinlock and we can have both the dirty and locked bitflags clear
here:

		spin_unlock(&inode_lock);
		__wait_on_inode(inode);
		iput(inode);
XXXXXXX
		spin_lock(&inode_lock);
	}
	use inode again here

a construct like the above makes zero sense from a reference counting
standpoint.

Either we don't ever use the inode again after the iput, or the
inode_lock should be taken _before_ executing the iput (i.e. a __iput
would be required). Taking the inode_lock after iput means the iget was
useless if we keep using the inode after the iput.

So the only chance the 2.6 was safe to call __writeback_single_inode
with the i_count == 0, is that I_WILL_FREE is set (I_WILL_FREE will
prevent the VM to free the inode in XXXXX).

Potentially calling the above iput with I_WILL_FREE was also wrong
because it would recurse in iput_final (the second mainline bug).

The below (untested) patch fixes the nr_unused accounting, avoids recursing
in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we
don't corrupt memory and that all holders that don't set I_WILL_FREE, keeps
a reference on the inode!

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:26 -08:00
Kirill Korotaev 618f06362a [PATCH] O(1) sb list traversing on syncs
This patch removes O(n^2) super block loops in sync_inodes(),
sync_filesystems() etc.  in favour of using __put_super_and_need_restart()
which I introduced earlier.  We faced a noticably long freezes on sb
syncing when there are thousands of super blocks in the system.

Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:27 -07:00
Martin Waitz 67be2dd1ba [PATCH] DocBook: fix some descriptions
Some KernelDoc descriptions are updated to match the current code.
No code changes.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:26 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00