linux-sg2042

History

Wu Fengguang 6c14ae1e92 writeback: dirty position control

bdi_position_ratio() provides a scale factor to bdi->dirty_ratelimit, so that the resulted task rate limit can drive the dirty pages back to the global/bdi setpoints.

Old scheme is,
[diagram: the dirty pages axis is split at thresh into a "free run area" below thresh and a "throttle area" above it]

New scheme is,
[diagram: task rate limit (y axis) vs. dirty pages (x axis); a smoothly falling control curve runs from the "[free run]" region through bdi->dirty_ratelimit at setpoint into the "[smooth throttled]" region, and keeps falling as dirty pages rise toward limit]

The slope of the bdi control line should be

1) large enough to pull the dirty pages to setpoint reasonably fast
2) small enough to avoid big fluctuations in the resulted pos_ratio and hence task ratelimit

Since the fluctuation range of the bdi dirty pages is typically observed to be within 1-second worth of data, the bdi control line's slope is selected to be a linear function of bdi write bandwidth, so that it can adapt to slow/fast storage devices well.

Assume the bdi control line

    pos_ratio = 1.0 + k (dirty - bdi_setpoint)

where k is the negative slope.

If targeting for 12.5% fluctuation range in pos_ratio when dirty pages are fluctuating in range

    [bdi_setpoint - write_bw/2, bdi_setpoint + write_bw/2],

we get slope

    k = - 1 / (8 * write_bw)

Let pos_ratio(x_intercept) = 0, we get the parameter used in code:

    x_intercept = bdi_setpoint + 8 * write_bw

The global/bdi slopes are nicely complementing each other when the system has only one major bdi (indicated by bdi_thresh ~= thresh):

1) slope of global control line    => scaling to the control scope size
2) slope of main bdi control line  => scaling to the writeout bandwidth

so that

- in memory tight systems, (1) becomes strong enough to squeeze dirty pages inside the control scope

- in large memory systems where the "gravity" of (1) for pulling the dirty pages to setpoint is too weak, (2) can back (1) up and drive dirty pages to bdi_setpoint ~= setpoint reasonably fast.

Unfortunately in JBOD setups, the fluctuation range of bdi threshold is related to memory size due to the interferences between disks. In this case, the bdi slope will be weighted sum of write_bw and bdi_thresh.

Given equations

    span = x_intercept - bdi_setpoint
    k = df/dx = - 1 / span

and the extremum values

    span = bdi_thresh
    dx = bdi_thresh

we get

    df = - dx / span = - 1.0

That means, when bdi_dirty deviates bdi_thresh up, pos_ratio and hence task ratelimit will fluctuate by -100%.

peter: use 3rd order polynomial for the global control line

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>		2011-10-03 21:08:56 +08:00
Kconfig	mm Kconfig typo: cleancacne -> cleancache	2011-06-10 14:47:52 +02:00
Kconfig.debug	mm: debug-pagealloc: fix kconfig dependency warning	2011-03-22 17:44:02 -07:00
Makefile	mm: cleancache core ops functions and config	2011-05-26 10:01:36 -06:00
backing-dev.c	writeback: account per-bdi accumulated dirtied pages	2011-10-03 21:08:56 +08:00
bootmem.c	crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn	2011-03-23 19:47:19 -07:00
bounce.c	bounce: call flush_dcache_page() after bounce_copy_vec()	2010-09-09 18:57:25 -07:00
cleancache.c	mm: cleancache core ops functions and config	2011-05-26 10:01:36 -06:00
compaction.c	mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2	2011-06-15 20:04:02 -07:00
debug-pagealloc.c	generic debug pagealloc	2009-04-01 08:59:13 -07:00
dmapool.c	devres: fix possible use after free	2011-07-25 20:57:14 -07:00
fadvise.c	readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM	2010-03-06 11:26:25 -08:00
failslab.c	fault-injection: add ability to export fault_attr in arbitrary directory	2011-08-03 14:25:20 -10:00
filemap.c	mm: account skipped entries to avoid looping in find_get_pages	2011-09-14 18:17:56 -07:00
filemap_xip.c	mm: Convert i_mmap_lock to a mutex	2011-05-25 08:39:18 -07:00
fremap.c	mm: don't access vm_flags as 'int'	2011-05-26 09:20:31 -07:00
highmem.c	mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.	2011-08-17 13:00:20 -07:00
huge_memory.c	mm/huge_memory.c: minor lock simplification in __khugepaged_exit	2011-07-25 20:57:09 -07:00
hugetlb.c	mm: hugetlb: fix coding style issues	2011-07-25 20:57:09 -07:00
hwpoison-inject.c	Fix common misspellings	2011-03-31 11:26:23 -03:00
init-mm.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
internal.h	mm: nommu: sort mm->mmap list properly	2011-05-25 08:39:05 -07:00
kmemcheck.c	kmemcheck: Fix build errors due to missing slab.h	2010-03-30 22:02:32 +09:00
kmemleak-test.c	kmemleak: remove memset by using kzalloc	2011-01-27 18:31:51 +00:00
kmemleak.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
ksm.c	ksm: fix NULL pointer dereference in scan_get_next_rmap_item()	2011-06-15 20:04:02 -07:00
maccess.c	maccess,probe_kernel: Make write/read src const void *	2011-05-25 19:56:23 -04:00
madvise.c	fs: kill i_alloc_sem	2011-07-20 20:47:46 -04:00
memblock.c	mm/memblock.c: avoid abuse of RED_INACTIVE	2011-07-25 20:57:09 -07:00
memcontrol.c	memcg: Revert "memcg: add memory.vmscan_stat"	2011-09-14 18:09:38 -07:00
memory-failure.c	HWPoison: add memory_failure_queue()	2011-08-03 11:15:58 -04:00
memory.c	mm/futex: fix futex writes on archs with SW tracking of dirty & young	2011-07-25 20:57:11 -07:00
memory_hotplug.c	mm: extend memory hotplug API to allow memory hotplug in virtual machines	2011-07-25 20:57:08 -07:00
mempolicy.c	mm/mempolicy.c: make copy_from_user() provably correct	2011-09-14 18:09:36 -07:00
mempool.c	mm: remove broken 'kzalloc' mempool	2009-09-22 07:17:35 -07:00
migrate.c	migrate: don't account swapcache as shmem	2011-06-16 15:01:24 -07:00
mincore.c	mm: clarify the radix_tree exceptional cases	2011-08-03 14:25:24 -10:00
mlock.c	mm: don't access vm_flags as 'int'	2011-05-26 09:20:31 -07:00
mm_init.c	…
mmap.c	mmap: fix and tidy up overcommit page arithmetic	2011-07-25 20:57:09 -07:00
mmu_context.c	exit: fix oops in sync_mm_rss	2010-03-24 16:31:21 -07:00
mmu_notifier.c	thp: mmu_notifier_test_young	2011-01-13 17:32:46 -08:00
mmzone.c	mm: page allocator: adjust the per-cpu counter threshold when memory is low	2011-01-13 17:32:31 -08:00
mprotect.c	thp: mprotect: transparent huge page support	2011-01-13 17:32:44 -08:00
mremap.c	mm: Convert i_mmap_lock to a mutex	2011-05-25 08:39:18 -07:00
msync.c	sanitize vfs_fsync calling conventions	2010-05-21 18:31:21 -04:00
nobootmem.c	memblock/nobootmem: remove unneeded code from alloc_bootmem_node_high()	2011-05-25 08:39:31 -07:00
nommu.c	mmap: fix and tidy up overcommit page arithmetic	2011-07-25 20:57:09 -07:00
oom_kill.c	oom: task->mm == NULL doesn't mean the memory was freed	2011-08-01 15:24:12 -10:00
page-writeback.c	writeback: dirty position control	2011-10-03 21:08:56 +08:00
page_alloc.c	fault-injection: add ability to export fault_attr in arbitrary directory	2011-08-03 14:25:20 -10:00
page_cgroup.c	mm/page_cgroup.c: simplify code by using SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macros	2011-07-25 20:57:09 -07:00
page_io.c	block: kill off REQ_UNPLUG	2011-03-10 08:52:27 +01:00
page_isolation.c	mm: page_isolation: codeclean fix comment and rm unneeded val init	2010-10-26 16:52:11 -07:00
pagewalk.c	pagewalk: fix code comment for THP	2011-07-25 20:57:09 -07:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	mm: remove gfp mask from pcpu_get_vm_areas	2011-01-13 17:32:34 -08:00
percpu.c	Merge branch 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu	2011-05-24 11:53:42 -07:00
pgtable-generic.c	mm/pgtable-generic.c: fix CONFIG_SWAP=n build	2011-01-26 10:49:58 +10:00
prio_tree.c	sanitize <linux/prefetch.h> usage	2011-05-20 12:50:29 -07:00
quicklist.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
readahead.c	readahead: readahead page allocations are OK to fail	2011-05-25 08:39:25 -07:00
rmap.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback	2011-07-26 10:39:54 -07:00
shmem.c	mm: clarify the radix_tree exceptional cases	2011-08-03 14:25:24 -10:00
slab.c	slab, lockdep: Annotate the locks before using them	2011-08-04 10:18:00 +02:00
slob.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
slub.c	slub: add slab with one free object to partial list tail	2011-08-27 11:58:59 +03:00
sparse-vmemmap.c	tree-wide: fix comment/printk typos	2010-11-01 15:38:34 -04:00
sparse.c	mm: make some struct page's const	2011-07-25 20:57:07 -07:00
swap.c	mm: batch activate_page() to reduce lock contention	2011-05-25 08:39:37 -07:00
swap_state.c	block: remove per-queue plugging	2011-03-10 08:52:07 +01:00
swapfile.c	mm: let swap use exceptional entries	2011-08-03 14:25:22 -10:00
thrash.c	mm: swap-token: add a comment for priority aging	2011-07-25 20:57:08 -07:00
truncate.c	mm: a few small updates for radix-swap	2011-08-03 14:25:24 -10:00
util.c	mm: nommu: sort mm->mmap list properly	2011-05-25 08:39:05 -07:00
vmalloc.c	mm: sync vmalloc address space page tables in alloc_vm_area()	2011-09-14 18:09:38 -07:00
vmscan.c	memcg: Revert "memcg: add memory.vmscan_stat"	2011-09-14 18:09:38 -07:00
vmstat.c	numa: fix NUMA compile error when sysfs and procfs are disabled	2011-09-14 18:09:37 -07:00