linux-sg2042/mm
Hisashi Hifumi 8ab22b9abb vfs: pagecache usage optimization for pagesize!=blocksize
When we read some part of a file through pagecache, if there is a
pagecache of corresponding index but this page is not uptodate, read IO
is issued and this page will be uptodate.

I think this is good for pagesize == blocksize environment but there is
room for improvement on pagesize != blocksize environment.  Because in
this case a page can have multiple buffers and even if a page is not
uptodate, some buffers can be uptodate.

So I suggest that when all buffers which correspond to a part of a file
that we want to read are uptodate, use this pagecache and copy data from
this pagecache to user buffer even if a page is not uptodate.  This can
reduce read IO and improve system throughput.

I wrote a benchmark program and got result number with this program.

This benchmark do:

  1: mount and open a test file.

  2: create a 512MB file.

  3: close a file and umount.

  4: mount and again open a test file.

  5: pwrite randomly 300000 times on a test file.  offset is aligned
     by IO size(1024bytes).

  6: measure time of preading randomly 100000 times on a test file.

The result was:
	2.6.26
        330 sec

	2.6.26-patched
        226 sec

Arch:i386
Filesystem:ext3
Blocksize:1024 bytes
Memory: 1GB

On ext3/4, a file is written through buffer/block.  So random read/write
mixed workloads or random read after random write workloads are optimized
with this patch under pagesize != blocksize environment.  This test result
showed this.

The benchmark program is as follows:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

#define LEN 1024
#define LOOP 1024*512 /* 512MB */

main(void)
{
	unsigned long i, offset, filesize;
	int fd;
	char buf[LEN];
	time_t t1, t2;

	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	memset(buf, 0, LEN);
	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}
	for (i = 0; i < LOOP; i++)
		write(fd, buf, LEN);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
		perror("cannot mount\n");
		exit(1);
	}
	fd = open("/root/test1/testfile", O_RDWR);
	if (fd < 0) {
		perror("cannot open file\n");
		exit(1);
	}

	filesize = LEN * LOOP;
	for (i = 0; i < 300000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pwrite(fd, buf, LEN, offset);
	}
	printf("start test\n");
	time(&t1);
	for (i = 0; i < 100000; i++){
		offset = (random() % filesize) & (~(LEN - 1));
		pread(fd, buf, LEN, offset);
	}
	time(&t2);
	printf("%ld sec\n", t2-t1);
	close(fd);
	if (umount("/root/test1/") < 0) {
		perror("cannot umount\n");
		exit(1);
	}
}

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@ucw.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-28 16:30:21 -07:00
..
Kconfig mmu-notifiers: core 2008-07-28 16:30:21 -07:00
Makefile mmu-notifiers: core 2008-07-28 16:30:21 -07:00
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c bootmem: replace node_boot_start in struct bootmem_data 2008-07-24 10:47:20 -07:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c vfs: pagecache usage optimization for pagesize!=blocksize 2008-07-28 16:30:21 -07:00
filemap_xip.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
fremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c mm/hugetlb.c must #include <asm/io.h> 2008-07-28 16:30:21 -07:00
internal.h mm: export prep_compound_page to mm 2008-07-24 10:47:17 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
memcontrol.c memcg: limit change shrink usage 2008-07-25 10:53:37 -07:00
memory.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
memory_hotplug.c memory-hotplug: add sysfs removable attribute for hotplug memory remove 2008-07-24 10:47:21 -07:00
mempolicy.c hugetlb: modular state for hugetlb page size 2008-07-24 10:47:17 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mm_init.c mm: create /sys/kernel/mm 2008-07-24 10:47:17 -07:00
mmap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mremap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c tracehook: tracehook_expect_breakpoints 2008-07-26 12:00:09 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page-writeback.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
page_alloc.c stop_machine: Wean existing callers off stop_machine_run() 2008-07-28 12:16:31 +10:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c pdflush: use time_after() instead of open-coding it 2008-07-25 10:53:28 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: readahead scan lockless 2008-07-26 12:00:06 -07:00
rmap.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
shmem.c tmpfs: fix kernel BUG in shmem_delete_inode 2008-07-28 16:30:20 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
slab.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
slob.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
slub.c SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c make mm/sparse.c: make a function static 2008-07-26 12:00:12 -07:00
swap.c mm: remove initialization of static per-cpu variables 2008-07-24 10:47:21 -07:00
swap_state.c mm: print swapcache page count in show_swap_cache_info() 2008-07-26 12:00:10 -07:00
swapfile.c mm/swapfile.c: make code static 2008-07-26 12:00:12 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
util.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-07-26 20:17:56 -07:00
vmalloc.c Use WARN() in mm/vmalloc.c 2008-07-26 12:00:07 -07:00
vmscan.c mm: spinlock tree_lock 2008-07-26 12:00:06 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00