linux-sg2042/include/trace/events
Mel Gorman c6286c9839 mm: add tracepoints for LRU activation and insertions
Andrew Perepechko reported a problem whereby pages are being prematurely
evicted as the mark_page_accessed() hint is ignored for pages that are
currently on a pagevec --
http://www.spinics.net/lists/linux-ext4/msg37340.html .

Alexey Lyahkov and Robin Dong have also reported problems recently that
could be due to hot pages reaching the end of the inactive list too
quickly and being reclaimed.

Rather than addressing this on a per-filesystem basis, this series aims
to fix the mark_page_accessed() interface by deferring the decision on
which LRU a page is added to until pagevec drain time, and by allowing
mark_page_accessed() to call SetPageActive on a pagevec page.
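The deferred decision can be illustrated with a small user-space model in
C. This is a sketch under stated assumptions, not kernel code:
model_page, model_pagevec, model_lru and the model_* functions are
hypothetical stand-ins for struct page, the per-CPU pagevec and the LRU
lists, the pagevec capacity is an assumed value, and the referenced-bit
step of the real mark_page_accessed() is omitted. What it shows is that
the list is picked at drain time, so an access hint that arrives while
the page is still in the local pagevec is no longer lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the idea; these are NOT the kernel's
 * struct page, struct pagevec or LRU lists. */
#define MODEL_PAGEVEC_SIZE 14 /* assumed capacity for the sketch */

struct model_page {
	bool active; /* stands in for PageActive */
	bool on_lru; /* true once the page has been drained to an LRU */
};

struct model_pagevec {
	size_t nr;
	struct model_page *pages[MODEL_PAGEVEC_SIZE];
};

struct model_lru {
	size_t nr_active;
	size_t nr_inactive;
};

/* The LRU is chosen here, at drain time, from the page's current flag. */
static void model_drain(struct model_pagevec *pvec, struct model_lru *lru)
{
	for (size_t i = 0; i < pvec->nr; i++) {
		struct model_page *page = pvec->pages[i];

		if (page->active)
			lru->nr_active++;
		else
			lru->nr_inactive++;
		page->on_lru = true;
	}
	pvec->nr = 0;
}

/* Queue a page for LRU insertion, draining once the pagevec is full. */
static void model_lru_cache_add(struct model_pagevec *pvec,
				struct model_lru *lru, struct model_page *page)
{
	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == MODEL_PAGEVEC_SIZE)
		model_drain(pvec, lru);
}

/* Access hint (simplified). A page already on an LRU is moved to the
 * active list; a page still in the local pagevec only gets its flag set,
 * so model_drain() places it on the active list. A page queued anywhere
 * else is not found and the hint is dropped. */
static void model_mark_accessed(struct model_pagevec *pvec,
				struct model_lru *lru, struct model_page *page)
{
	if (page->active)
		return;
	if (page->on_lru) {
		lru->nr_inactive--;
		lru->nr_active++;
		page->active = true;
		return;
	}
	for (size_t i = 0; i < pvec->nr; i++) {
		if (pvec->pages[i] == page) {
			page->active = true;
			return;
		}
	}
}
```

A page queued on another CPU's pagevec would simply not be found by the
loop in model_mark_accessed(), which mirrors the series only examining
the local pagevec.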

Patch 1 adds two tracepoints for LRU page activation and insertion. Using
	these tracepoints, it's possible to build a model of pages in the
	LRU that can be processed offline.

Patch 2 defers the decision on which LRU to add a page to until the
	pagevec is drained.

Patch 3 searches the local pagevec for pages to mark PageActive when
	mark_page_accessed() is called. The changelog explains why only the
	local pagevec is examined.

Patches 4 and 5 tidy up the API.

postmark, a dd-based test, and fs-mark in both single and threaded mode
were run, but none of them showed any performance degradation or gain as
a result of the patch.

Using patch 1, I built a *very* basic model of the LRU to examine
offline what the average age, in milliseconds, of different page types
on the LRU was.  Of course, capturing the trace distorts the test, as it
is written to local disk, but that does not matter for the purposes of
this test.  The average ages of pages in milliseconds were:

                               vanilla   deferdrain
Average age mapped anon:          1454         1250
Average age mapped file:        127841       155552
Average age unmapped anon:          85          235
Average age unmapped file:       73633        38884
Average age unmapped buffers:    74054       116155

The LRU activity was mostly file pages, which you'd expect for a
dd-based workload.  Note that the average age of buffer pages is
increased by the series; this is expected to be because buffer pages are
now added to the active list when drained from the pagevecs.  Note also
that the average age of unmapped file data is decreased, as those pages
are still added to the inactive list and are reclaimed before the
buffers.
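The averages in the table come from an offline model of the LRU built
from the trace. The per-type bookkeeping such a model might do can be
sketched as below, assuming each page's category, LRU insertion time and
removal time have already been extracted from the trace. The enum,
struct and function names are hypothetical, and how removal times are
obtained is an assumption; this is not the model that produced the
figures above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page categories mirroring the rows of the table above. */
enum model_page_type {
	MAPPED_ANON,
	MAPPED_FILE,
	UNMAPPED_ANON,
	UNMAPPED_FILE,
	UNMAPPED_BUFFERS,
	NR_MODEL_TYPES
};

struct age_stats {
	uint64_t total_ms[NR_MODEL_TYPES];
	uint64_t count[NR_MODEL_TYPES];
};

/* Record one page's residency: the time from LRU insertion until it left
 * the LRU (e.g. was reclaimed), both assumed to come from a trace. */
static void account_residency(struct age_stats *s, enum model_page_type type,
			      uint64_t inserted_ms, uint64_t removed_ms)
{
	assert(removed_ms >= inserted_ms);
	s->total_ms[type] += removed_ms - inserted_ms;
	s->count[type]++;
}

/* Integer mean age for one category; 0 if no page of that type was seen. */
static uint64_t average_age_ms(const struct age_stats *s,
			       enum model_page_type type)
{
	return s->count[type] ? s->total_ms[type] / s->count[type] : 0;
}
```

Keeping a running total and count per category, rather than storing
every sample, is enough to produce one average per row.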

There is no guarantee this is a universal win for all workloads, and it
would be nice if the filesystem people gave some thought to whether
this decision is generally a win or a loss.

This patch:

Using these tracepoints it is possible to model LRU activity and the
average residency of pages of different types.  This can be used to
debug problems related to premature reclaim of pages of particular
types.
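One way such a model could consume the events is to replay them into
per-page state. The sketch below assumes a simplified, hypothetical
one-line-per-event text format named after the mm_lru_insertion and
mm_lru_activate events this patch adds in pagemap.h; real ftrace output
carries more fields in a different layout, so the parser, lru_model and
MODEL_MAX_PFN are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PFN 1024 /* assumption: tiny pfn range for the sketch */

struct lru_model {
	bool on_lru[MODEL_MAX_PFN];
	bool active[MODEL_MAX_PFN];
	uint64_t nr_insertions;
	uint64_t nr_activations;
};

/* Replay one event in a HYPOTHETICAL simplified format:
 *   "mm_lru_insertion pfn=<n> active=<0|1>"  or  "mm_lru_activate pfn=<n>"
 * This is not the real ftrace line layout. Returns false for lines the
 * sketch does not understand. */
static bool lru_model_apply(struct lru_model *m, const char *line)
{
	char event[32];
	unsigned long pfn;
	int consumed = 0;

	if (sscanf(line, "%31s pfn=%lu%n", event, &pfn, &consumed) != 2 ||
	    pfn >= MODEL_MAX_PFN)
		return false;

	if (strcmp(event, "mm_lru_insertion") == 0) {
		int active;

		/* Insertion records which list the page went to. */
		if (sscanf(line + consumed, " active=%d", &active) != 1)
			return false;
		m->on_lru[pfn] = true;
		m->active[pfn] = active != 0;
		m->nr_insertions++;
		return true;
	}
	if (strcmp(event, "mm_lru_activate") == 0) {
		/* Activation moves the page to the active list. */
		m->active[pfn] = true;
		m->nr_activations++;
		return true;
	}
	return false;
}
```

Timestamps could be added to the same format and fed into per-type age
accounting to estimate how long pages of each kind stay on the LRU.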

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
Cc: Andrew Perepechko <anserper@ya.ru>
Cc: Robin Dong <sanbai@taobao.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:31 -07:00
9p.h net/9p: Convert net/9p protocol dumps to tracepoints 2011-10-24 11:13:12 -05:00
asoc.h ASoC: dapm: Fix x86_64 build warning. 2012-04-23 13:15:35 +01:00
bcache.h bcache: A block layer cache 2013-03-23 16:11:31 -07:00
block.h Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
btrfs.h Btrfs: parse parent 0 into correct value in tracepoint 2012-12-16 20:46:18 -05:00
compaction.h UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers 2012-10-02 18:01:25 +01:00
ext3.h jbd: change journal_invalidatepage() to accept length 2013-05-21 23:26:36 -04:00
ext4.h ext4: translate flag bits to strings in tracepoints 2013-07-01 08:12:40 -04:00
f2fs.h f2fs: add a tracepoint on f2fs_new_inode 2013-04-29 10:52:01 +09:00
filemap.h mm: trace filemap add and del 2013-04-29 15:54:28 -07:00
gfpflags.h mm: add a __GFP_KMEMCG flag 2012-12-18 15:02:12 -08:00
gpio.h gpio: add trace events for setting direction and value 2011-05-20 00:40:19 -06:00
host1x.h gpu: host1x: Add channel support 2013-04-22 12:32:43 +02:00
irq.h rcu: Use softirq to address performance regression 2011-06-14 15:25:39 -07:00
jbd.h jbd: Write journal superblock with WRITE_FUA after checkpointing 2012-05-15 23:34:37 +02:00
jbd2.h jbd2: trace when lock_buffer in do_get_write_access takes a long time 2013-04-21 16:47:54 -04:00
kmem.h UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers 2012-10-02 18:01:25 +01:00
kvm.h KVM: Extract generic irqchip logic into irqchip.c 2013-04-26 20:27:17 +02:00
lock.h tracing: Factorize lock events in a lock class 2010-05-09 13:45:35 +02:00
mce.h tracing: Fix event alignment: mce:mce_record 2011-03-10 10:34:28 -05:00
migrate.h mm: migrate: Add a tracepoint for migrate_pages 2012-12-11 14:28:35 +00:00
module.h include: replace linux/module.h with "struct module" wherever possible 2011-10-31 19:32:32 -04:00
napi.h napi: Convert trace_napi_poll to TRACE_EVENT 2010-09-07 17:51:01 +02:00
net.h net: tracepoint of net_dev_xmit sees freed skb and causes panic 2011-06-02 14:06:31 -07:00
nmi.h x86: Add NMI duration tracepoints 2013-06-23 11:52:58 +02:00
oom.h mm, oom: change type of oom_score_adj to short 2012-12-11 17:22:27 -08:00
pagemap.h mm: add tracepoints for LRU activation and insertions 2013-07-03 16:07:31 -07:00
power.h PM / tracing: remove deprecated power trace API 2013-01-26 00:39:12 +01:00
printk.h printk/tracing: rework console tracing 2013-04-29 18:28:13 -07:00
random.h random: add tracepoints for easier debugging and verification 2012-07-14 20:17:48 -04:00
ras.h aerdrv: Trace Event for PCI Express Advanced Error Reporting 2013-01-03 14:31:44 -08:00
rcu.h rcu: Repurpose no-CBs event tracing to future-GP events 2013-03-26 08:04:54 -07:00
regmap.h regmap: async: Add tracepoints for async I/O 2013-03-04 10:28:29 +08:00
regulator.h regulator: Add basic trace facilities 2011-01-12 14:33:00 +00:00
rpm.h device.h: audit and cleanup users in main include dir 2012-03-16 10:38:24 -04:00
sched.h kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
scsi.h [SCSI] Include protection operation in SCSI command trace 2011-03-14 18:36:02 -05:00
signal.h tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info 2012-01-13 18:48:50 +01:00
skb.h tracing: Fix event alignment: skb:kfree_skb 2011-03-10 10:34:31 -05:00
sock.h core: add tracepoints for queueing skb to rcvbuf 2011-06-21 16:06:10 -07:00
sunrpc.h SUNRPC: Adding status trace points 2012-02-06 10:37:53 -05:00
syscalls.h tracing: Allow raw syscall trace events for non privileged users 2010-11-18 14:37:43 +01:00
task.h mm, oom: change type of oom_score_adj to short 2012-12-11 17:22:27 -08:00
timer.h Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-05-05 13:23:27 -07:00
udp.h udp: add tracepoints for queueing skb to rcvbuf 2011-06-21 16:06:10 -07:00
vmscan.h UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers 2012-10-02 18:01:25 +01:00
workqueue.h workqueue: rename cpu_workqueue to pool_workqueue 2013-02-13 19:29:12 -08:00
writeback.h writeback: replace custom worker pool implementation with unbound workqueue 2013-04-01 19:08:06 -07:00
xen.h xen/mmu: Use Xen specific TLB flush instead of the generic one. 2012-10-31 12:38:31 -04:00