Linear search over all P4 MSRs would be fine if we did not
use it in the event scheduling routine, which is pretty time
critical. Let's use a hash instead; it should speed scheduling
up significantly.
v2: Steven proposed to use a more gentle approach than issuing
a BUG on error, so we use WARN_ONCE now.
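The idea, as a minimal sketch (not the exact kernel code; the table
layout, size and helper name are assumptions): turn the ESCR MSR
address into a small table index so the lookup is O(1), and WARN_ONCE
on an unknown address instead of BUGging:

    static unsigned int p4_escr_table[256];  /* addr offset -> escr idx + 1 */

    static int p4_get_escr_idx(unsigned int addr)
    {
        unsigned int offset = addr - MSR_P4_BSU_ESCR0;  /* first ESCR MSR */

        if (offset >= ARRAY_SIZE(p4_escr_table) || !p4_escr_table[offset]) {
            WARN_ONCE(1, "P4 PMU: unknown ESCR address: 0x%x\n", addr);
            return -1;
        }

        return p4_escr_table[offset] - 1;
    }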
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100512174242.GA5190@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
RAW events are special and we should be ready for users passing
in insane event index values.
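Something along these lines (a sketch; p4_config_unpack_event() and
p4_event_bind_map stand in for the real unpack helper and event table):

    static int p4_validate_raw_event(struct perf_event *event)
    {
        unsigned int v = p4_config_unpack_event(event->attr.config);

        /* a user-supplied RAW index may point past the known events */
        if (v >= ARRAY_SIZE(p4_event_bind_map))
            return -EINVAL;

        return 0;
    }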
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.315897547@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The caller has already done such a check.
And it was wrong anyway: it had to be '>=' rather than '>'.
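For reference, the correct bound looks like this (illustrative only):

    /* an index equal to the array size is already out of range */
    if (idx >= ARRAY_SIZE(map))  /* '>' would let idx == size slip through */
        return -EINVAL;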
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.130386882@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven reported:
|
| I'm getting:
|
| Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727
| Call Trace:
| [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0
| [<ffffffff81019874>] p4_hw_config+0x2b/0x15c
| [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f
| [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be
| [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c
| [<ffffffff810c68b2>] T.850+0x273/0x42e
| [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1
| [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69
| [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b
|
| When running perf record in latest tip/perf/core
|
Because P4 counters are shared between HT threads, we
synthetically divide the whole set of counters into two
non-intersecting subsets. And while we're "borrowing" counters
from these subsets we should not be preempted (strictly
speaking, in p4_hw_config we just pre-set a reference to the
subset, which saves some cycles in the scheduling routine
if it happens on the same cpu). So use a get_cpu()/put_cpu() pair.
Also, p4_pmu_schedule_events() should use smp_processor_id() rather
than the raw_ version. This lets us catch preemption issues
(if there ever are any).
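The pattern, roughly (a sketch; p4_setup_escr_subset() is a
hypothetical helper standing in for the real per-cpu pre-set):

    static int p4_hw_config(struct perf_event *event)
    {
        int cpu = get_cpu();  /* disables preemption */
        int rc;

        rc = p4_setup_escr_subset(event, cpu);

        put_cpu();
        return rc;
    }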
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112716.963478928@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If an event is not RAW we should not exit p4_hw_config()
early, but call x86_setup_perfctr() as well.
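I.e., roughly (a sketch of the intended control flow, not the exact
code):

    static int p4_hw_config(struct perf_event *event)
    {
        int rc = 0;

        if (event->attr.type == PERF_TYPE_RAW)
            rc = p4_validate_raw_event(event);  /* RAW-only checks */
        if (rc)
            return rc;

        /* no early return: non-RAW events need the perfctr setup too */
        return x86_setup_perfctr(event);
    }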
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perfctr setup calls are in the corresponding .hw_config()
functions now. This makes it possible to introduce config functions
for other pmu events that are not perfctr specific.
Also, all of a sudden the code looks much nicer.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All variables that have __initconst should also be const.
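For example (the variable is purely illustrative):

    /* __initconst places data in .init.rodata, so the C object must
     * itself be const */
    static const unsigned int example_event_map[] __initconst = {
        0x003c, 0x00c0,
    };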
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane noticed that the ANY flag was in generic arch code, and Cyrill
reported that it broke the P4 code.
Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and
providing intel_pmu- and amd_pmu-specific versions of this callback.
The intel_pmu one deals with the ANY flag, the amd_pmu one adds the few
extra event bits AMD64 has.
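A sketch of how the Intel-specific callback gates the ANY bit (the
checks shown are an approximation of the description above, not
verbatim code):

    static int intel_pmu_hw_config(struct perf_event *event)
    {
        int ret = x86_pmu_hw_config(event);  /* generic x86 checks first */

        if (ret)
            return ret;

        /* the ANY-thread bit is Intel-specific: gate it here so the
         * generic and P4 code never have to know about it */
        if (event->attr.type == PERF_TYPE_RAW &&
            (event->attr.config & ARCH_PERFMON_EVENTSEL_ANY) &&
            perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
            return -EACCES;

        return 0;
    }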
Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Robert Richter <robert.richter@amd.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269968113.5258.442.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The big rename:
cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events
accidentally renamed some members of structs that were named after
registers in the spec. To avoid confusion this patch reverts some
changes. The related specs are the MSR descriptions in AMD's BKDGs and the
ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32
Architectures Software Developer's Manuals.
This patch does:
$ sed -i -e 's:num_events:num_counters:g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event_amd.c \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_p6.c \
arch/x86/kernel/cpu/perf_event_p4.c \
arch/x86/oprofile/op_model_ppro.c
$ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \
arch/x86/kernel/cpu/perf_event_amd.c \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_p6.c \
arch/x86/kernel/cpu/perf_event_p4.c
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement the workaround for Intel Errata AAK100 and AAP53.
Also, remove the Core-i7 name for Nehalem events since there are
also Westmere-based i7 chips.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269608924.12097.147.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The addition of raw event support led to a complete code
refactoring. I hope it became more readable than it was.
The list of changes:
1) The 64bit config field is enough to hold all the information we need
to track event details. To achieve this we use our *own* enum for
event selection in the ESCR register and map this key to the proper
value at the moment the event is enabled (see the sketch below).
For the same reason we use the 12 LSB bits of the CCCR register -- to
track exactly which cache trace event was requested. And we clear these
bits at the real 'write' moment.
2) There is no per-cpu area reserved for the P4 PMU anymore. We
don't need it. Everything is held by config.
3) Now we may use any available counter, i.e. we try to grab any
possible counter.
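The packed-config scheme from point 1, roughly (a sketch; the exact
macro names may differ):

    /* ESCR and CCCR each effectively use half of a 64-bit word, so
     * both fit into hw_perf_event::config and are unpacked right
     * before the real MSR writes */
    #define p4_config_pack_escr(v)      (((u64)(v)) << 32)
    #define p4_config_pack_cccr(v)      (((u64)(v)) & 0xffffffffULL)
    #define p4_config_unpack_escr(v)    (((u64)(v)) >> 32)
    #define p4_config_unpack_cccr(v)    (((u64)(v)) & 0xffffffffULL)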
v2:
- Lin Ming reported the lack of an ESCR selector in the CCCR for cache events
v3:
- Don't lose cache event codes in the config unpacking procedure; we may
need them one day, so no obscure hacks behind our back -- better to clear
the reserved bits explicitly when needed (thanks to Ming for pointing this out)
- Lin Ming fixed misplaced opcodes in cache events
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com>
[ v4: did a few whitespace fixlets ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- A few ESCRs escaped fixing in the previous attempt.
- p4_escr_map is read-only, make it const.
Nothing serious.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100318211256.GH5062@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the HT bit setting code from p4_pmu_event_map() to
p4_hw_config(), so that cache events can get the HT bit set correctly.
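Roughly (a sketch; the helpers follow the driver's naming style but
are assumptions here):

    /* in p4_hw_config(): tag the packed config with the current HT
     * thread so cache events inherit the HT bit too */
    if (p4_ht_active() && p4_ht_thread(cpu))
        event->hw.config = p4_set_ht_bit(event->hw.config);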
Tested on my P4 desktop; the 6 cache events below work:
L1-dcache-load-misses
LLC-load-misses
dTLB-load-misses
dTLB-store-misses
iTLB-loads
iTLB-load-misses
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, we use the opcode (Event and Event-Selector) + emask to
look up a template in p4_templates.
But cache events (L1-dcache-load-misses, LLC-load-misses, etc)
use the same event (P4_REPLAY_EVENT) to do the counting, i.e. they
have the same opcode and emask. So we cannot use the current lookup
mechanism to find the template for cache events.
This patch introduces a "key", which is the index into
p4_templates. The low 12 bits of the CCCR are reserved, so we can
hide the "key" in the low 12 bits of hwc->config.
We extract the key from hwc->config and then quickly find the
template.
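In code the idea is something like this (a sketch; the mask and
helper name are assumptions):

    /* the low 12 CCCR bits are hardware-reserved, so the template
     * index can hide there inside hwc->config */
    #define P4_CCCR_KEY_MASK    0xfffULL

    static inline unsigned int p4_config_get_key(u64 config)
    {
        return (unsigned int)(config & P4_CCCR_KEY_MASK);
    }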
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since apic_write() maps to a plain noop in the !CONFIG_X86_LOCAL_APIC
case, we're safe to remove this conditional compilation and clean up
the code a bit.
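That is, the apic header already degrades gracefully with a stub
along these lines (a sketch of the premise, not a quote of the
header):

    /* !CONFIG_X86_LOCAL_APIC: the write is a no-op, so callers need
     * no #ifdef of their own */
    static inline void apic_write(u32 reg, u32 val) { }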
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: fweisbec@gmail.com
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: peterz@infradead.org
LKML-Reference: <20100317104356.232371479@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This should turn on instruction counting on P4s, which was missing in
the first version of the new PMU driver.
It's inaccurate for now: we still need a dependent event to tag µops
before we can count them precisely. The result is that the instruction
count may be inflated.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo reported:
|
| There's a build failure on -tip with the P4 driver, on UP 32-bit, if
| PERF_EVENTS is enabled but UP_APIC is disabled:
|
| arch/x86/built-in.o: In function `p4_pmu_handle_irq':
| perf_event.c:(.text+0xa756): undefined reference to `apic'
| perf_event.c:(.text+0xa76e): undefined reference to `apic'
|
So we have to unmask the LVTPC only if we're configured to have a local APIC.
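I.e., the unmask gets compiled out when no local APIC is configured
(a sketch of the fix):

    #ifdef CONFIG_X86_LOCAL_APIC
        /* re-arm PMI delivery; compiled out on !UP_APIC builds, which
         * avoids the undefined reference to 'apic' */
        apic_write(APIC_LVTPC, APIC_DM_NMI);
    #endif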
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100313081116.GA5179@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Netburst PMU is way different from the "architectural
performance monitoring" specification that current CPUs use.
P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle
performance monitoring events.
A few implementation details:
1) We need a separate x86_pmu::hw_config helper in struct
x86_pmu since the register bit-fields are quite different from the
P6, Core and later cpu series.
2) For the same reason an x86_pmu::schedule_events helper is
introduced.
3) hw_perf_event::config consists of packed ESCR+CCCR values.
This is allowed since in reality both registers only use half
of their size. Of course, before making a real write into a
particular MSR we need to unpack the value and extend it to
the proper size.
4) The tuple of packed ESCR+CCCR in hw_perf_event::config
doesn't describe the memory address of the ESCR MSR register,
so we need to keep a mapping between the tuples in use and the
available ESCRs (various P4 events may use the same
ESCRs, but not simultaneously). For this sake every active
event has a per-cpu map of hw_perf_event::idx <-> ESCR
addresses (see the sketch after this list).
5) Since hw_perf_event::idx is an offset into the counter/control
registers we need to lift X86_PMC_MAX_GENERIC up, otherwise the kernel
strips it down to 8 registers and an armed event may never be turned
off (i.e. the bit in active_mask is set but the loop never reaches
this index to check it); thanks to Peter Zijlstra.
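A sketch of the per-cpu map from point 4 (the struct name and layout
are illustrative, not the exact driver code):

    struct p4_pmu_res {
        /* slot i: ESCR MSR address backing hw_perf_event::idx == i */
        unsigned int escr_map[ARCH_P4_MAX_CCCR];
    };

    static DEFINE_PER_CPU(struct p4_pmu_res, p4_pmu_res);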
Restrictions:
- No cascaded counters support (do we ever need them?)
- No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
doesn't work for now)
- There are events with the same counters which can't work simultaneously
(we need to use the intersected ones due to broken counter 1)
- No PERF_COUNT_HW_CACHE_ events yet
Todo:
- Implement dependent events
- Need proper hashing for event opcodes (linear search is fine for the
debugging stage but not under real loads)
- Some events are counted per clock cycle -- we need to set a threshold
for them and count every clock cycle just to get summary statistics
(i.e. to behave the same way as other PMUs do)
- Need to switch to using event_constraints
- To support RAW events we need to encode a global list of P4 events
into p4_templates
- Cache events need to be added
Event support status matrix:

  Event              status
  -----------------------------
  cycles             works
  cache-references   works
  cache-misses       works
  branch-misses      works
  bus-cycles         partially (does not work on 64bit cpu with HT enabled)
  instructions       doesn't work (needs dependent event [µop tagging])
  branches           doesn't work
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100311165439.GB5129@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>