2011-08-31 07:41:05 +08:00
|
|
|
#include <linux/bitops.h>
|
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/slab.h>
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
#include <asm/perf_event.h>
|
2012-02-10 06:20:58 +08:00
|
|
|
#include <asm/insn.h>
|
2011-08-31 07:41:05 +08:00
|
|
|
|
|
|
|
#include "perf_event.h"
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
/* The size of a BTS record in bytes: */
|
|
|
|
#define BTS_RECORD_SIZE 24
|
|
|
|
|
|
|
|
#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
|
|
|
|
#define PEBS_BUFFER_SIZE PAGE_SIZE
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
#define PEBS_FIXUP_SIZE PAGE_SIZE
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* pebs_record_32 for p4 and core not supported
|
|
|
|
|
|
|
|
struct pebs_record_32 {
|
|
|
|
u32 flags, ip;
|
|
|
|
u32 ax, bc, cx, dx;
|
|
|
|
u32 si, di, bp, sp;
|
|
|
|
};
|
|
|
|
|
|
|
|
*/
|
|
|
|
|
2013-01-24 23:10:32 +08:00
|
|
|
union intel_x86_pebs_dse {
|
|
|
|
u64 val;
|
|
|
|
struct {
|
|
|
|
unsigned int ld_dse:4;
|
|
|
|
unsigned int ld_stlb_miss:1;
|
|
|
|
unsigned int ld_locked:1;
|
|
|
|
unsigned int ld_reserved:26;
|
|
|
|
};
|
|
|
|
struct {
|
|
|
|
unsigned int st_l1d_hit:1;
|
|
|
|
unsigned int st_reserved1:3;
|
|
|
|
unsigned int st_stlb_miss:1;
|
|
|
|
unsigned int st_locked:1;
|
|
|
|
unsigned int st_reserved2:26;
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Map PEBS Load Latency Data Source encodings to generic
|
|
|
|
* memory data source information
|
|
|
|
*/
|
|
|
|
#define P(a, b) PERF_MEM_S(a, b)
|
|
|
|
#define OP_LH (P(OP, LOAD) | P(LVL, HIT))
|
|
|
|
#define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
|
|
|
|
|
|
|
|
static const u64 pebs_data_source[] = {
|
|
|
|
P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
|
|
|
|
OP_LH | P(LVL, L1) | P(SNOOP, NONE), /* 0x01: L1 local */
|
|
|
|
OP_LH | P(LVL, LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
|
|
|
|
OP_LH | P(LVL, L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
|
|
|
|
OP_LH | P(LVL, L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
|
|
|
|
OP_LH | P(LVL, L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
|
|
|
|
OP_LH | P(LVL, L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
|
|
|
|
OP_LH | P(LVL, L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
|
|
|
|
OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
|
|
|
|
OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
|
|
|
|
OP_LH | P(LVL, LOC_RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
|
|
|
|
OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
|
|
|
|
OP_LH | P(LVL, LOC_RAM) | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
|
|
|
|
OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
|
|
|
|
OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
|
|
|
|
OP_LH | P(LVL, UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
|
|
|
|
};
|
|
|
|
|
2013-01-24 23:10:34 +08:00
|
|
|
static u64 precise_store_data(u64 status)
|
|
|
|
{
|
|
|
|
union intel_x86_pebs_dse dse;
|
|
|
|
u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
|
|
|
|
|
|
|
|
dse.val = status;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* bit 4: TLB access
|
|
|
|
* 1 = stored missed 2nd level TLB
|
|
|
|
*
|
|
|
|
* so it either hit the walker or the OS
|
|
|
|
* otherwise hit 2nd level TLB
|
|
|
|
*/
|
|
|
|
if (dse.st_stlb_miss)
|
|
|
|
val |= P(TLB, MISS);
|
|
|
|
else
|
|
|
|
val |= P(TLB, HIT);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* bit 0: hit L1 data cache
|
|
|
|
* if not set, then all we know is that
|
|
|
|
* it missed L1D
|
|
|
|
*/
|
|
|
|
if (dse.st_l1d_hit)
|
|
|
|
val |= P(LVL, HIT);
|
|
|
|
else
|
|
|
|
val |= P(LVL, MISS);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* bit 5: Locked prefix
|
|
|
|
*/
|
|
|
|
if (dse.st_locked)
|
|
|
|
val |= P(LOCK, LOCKED);
|
|
|
|
|
|
|
|
return val;
|
|
|
|
}
|
|
|
|
|
2014-08-12 03:27:13 +08:00
|
|
|
static u64 precise_datala_hsw(struct perf_event *event, u64 status)
|
2013-06-18 08:36:52 +08:00
|
|
|
{
|
|
|
|
union perf_mem_data_src dse;
|
|
|
|
|
2014-08-12 03:27:12 +08:00
|
|
|
dse.val = PERF_MEM_NA;
|
|
|
|
|
|
|
|
if (event->hw.flags & PERF_X86_EVENT_PEBS_ST_HSW)
|
|
|
|
dse.mem_op = PERF_MEM_OP_STORE;
|
|
|
|
else if (event->hw.flags & PERF_X86_EVENT_PEBS_LD_HSW)
|
|
|
|
dse.mem_op = PERF_MEM_OP_LOAD;
|
2014-05-15 23:56:44 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* L1 info only valid for following events:
|
|
|
|
*
|
|
|
|
* MEM_UOPS_RETIRED.STLB_MISS_STORES
|
|
|
|
* MEM_UOPS_RETIRED.LOCK_STORES
|
|
|
|
* MEM_UOPS_RETIRED.SPLIT_STORES
|
|
|
|
* MEM_UOPS_RETIRED.ALL_STORES
|
|
|
|
*/
|
2014-08-12 03:27:13 +08:00
|
|
|
if (event->hw.flags & PERF_X86_EVENT_PEBS_ST_HSW) {
|
|
|
|
if (status & 1)
|
|
|
|
dse.mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
|
|
|
|
else
|
|
|
|
dse.mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_MISS;
|
|
|
|
}
|
2013-06-18 08:36:52 +08:00
|
|
|
return dse.val;
|
|
|
|
}
|
|
|
|
|
2013-01-24 23:10:32 +08:00
|
|
|
static u64 load_latency_data(u64 status)
|
|
|
|
{
|
|
|
|
union intel_x86_pebs_dse dse;
|
|
|
|
u64 val;
|
|
|
|
int model = boot_cpu_data.x86_model;
|
|
|
|
int fam = boot_cpu_data.x86;
|
|
|
|
|
|
|
|
dse.val = status;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* use the mapping table for bit 0-3
|
|
|
|
*/
|
|
|
|
val = pebs_data_source[dse.ld_dse];
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Nehalem models do not support TLB, Lock infos
|
|
|
|
*/
|
|
|
|
if (fam == 0x6 && (model == 26 || model == 30
|
|
|
|
|| model == 31 || model == 46)) {
|
|
|
|
val |= P(TLB, NA) | P(LOCK, NA);
|
|
|
|
return val;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* bit 4: TLB access
|
|
|
|
* 0 = did not miss 2nd level TLB
|
|
|
|
* 1 = missed 2nd level TLB
|
|
|
|
*/
|
|
|
|
if (dse.ld_stlb_miss)
|
|
|
|
val |= P(TLB, MISS) | P(TLB, L2);
|
|
|
|
else
|
|
|
|
val |= P(TLB, HIT) | P(TLB, L1) | P(TLB, L2);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* bit 5: locked prefix
|
|
|
|
*/
|
|
|
|
if (dse.ld_locked)
|
|
|
|
val |= P(LOCK, LOCKED);
|
|
|
|
|
|
|
|
return val;
|
|
|
|
}
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
struct pebs_record_core {
|
|
|
|
u64 flags, ip;
|
|
|
|
u64 ax, bx, cx, dx;
|
|
|
|
u64 si, di, bp, sp;
|
|
|
|
u64 r8, r9, r10, r11;
|
|
|
|
u64 r12, r13, r14, r15;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct pebs_record_nhm {
|
|
|
|
u64 flags, ip;
|
|
|
|
u64 ax, bx, cx, dx;
|
|
|
|
u64 si, di, bp, sp;
|
|
|
|
u64 r8, r9, r10, r11;
|
|
|
|
u64 r12, r13, r14, r15;
|
|
|
|
u64 status, dla, dse, lat;
|
|
|
|
};
|
|
|
|
|
2013-06-18 08:36:47 +08:00
|
|
|
/*
|
|
|
|
* Same as pebs_record_nhm, with two additional fields.
|
|
|
|
*/
|
|
|
|
struct pebs_record_hsw {
|
2013-09-06 11:37:39 +08:00
|
|
|
u64 flags, ip;
|
|
|
|
u64 ax, bx, cx, dx;
|
|
|
|
u64 si, di, bp, sp;
|
|
|
|
u64 r8, r9, r10, r11;
|
|
|
|
u64 r12, r13, r14, r15;
|
|
|
|
u64 status, dla, dse, lat;
|
2013-09-12 19:00:47 +08:00
|
|
|
u64 real_ip, tsx_tuning;
|
2013-09-06 11:37:39 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
union hsw_tsx_tuning {
|
|
|
|
struct {
|
|
|
|
u32 cycles_last_block : 32,
|
|
|
|
hle_abort : 1,
|
|
|
|
rtm_abort : 1,
|
|
|
|
instruction_abort : 1,
|
|
|
|
non_instruction_abort : 1,
|
|
|
|
retry : 1,
|
|
|
|
data_conflict : 1,
|
|
|
|
capacity_writes : 1,
|
|
|
|
capacity_reads : 1;
|
|
|
|
};
|
|
|
|
u64 value;
|
2013-06-18 08:36:47 +08:00
|
|
|
};
|
|
|
|
|
2013-09-20 22:40:40 +08:00
|
|
|
#define PEBS_HSW_TSX_FLAGS 0xff00000000ULL
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void init_debug_store_on_cpu(int cpu)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
|
|
|
|
|
|
|
if (!ds)
|
|
|
|
return;
|
|
|
|
|
|
|
|
wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA,
|
|
|
|
(u32)((u64)(unsigned long)ds),
|
|
|
|
(u32)((u64)(unsigned long)ds >> 32));
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void fini_debug_store_on_cpu(int cpu)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
if (!per_cpu(cpu_hw_events, cpu).ds)
|
|
|
|
return;
|
|
|
|
|
|
|
|
wrmsr_on_cpu(cpu, MSR_IA32_DS_AREA, 0, 0);
|
|
|
|
}
|
|
|
|
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
static DEFINE_PER_CPU(void *, insn_buffer);
|
|
|
|
|
2010-10-19 20:15:04 +08:00
|
|
|
static int alloc_pebs_buffer(int cpu)
|
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
2010-10-19 20:55:33 +08:00
|
|
|
int node = cpu_to_node(cpu);
|
2010-10-19 20:15:04 +08:00
|
|
|
int max, thresh = 1; /* always use a single PEBS record */
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
void *buffer, *ibuffer;
|
2010-10-19 20:15:04 +08:00
|
|
|
|
|
|
|
if (!x86_pmu.pebs)
|
|
|
|
return 0;
|
|
|
|
|
2013-08-30 04:59:17 +08:00
|
|
|
buffer = kzalloc_node(PEBS_BUFFER_SIZE, GFP_KERNEL, node);
|
2010-10-19 20:15:04 +08:00
|
|
|
if (unlikely(!buffer))
|
|
|
|
return -ENOMEM;
|
|
|
|
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
/*
|
|
|
|
* HSW+ already provides us the eventing ip; no need to allocate this
|
|
|
|
* buffer then.
|
|
|
|
*/
|
|
|
|
if (x86_pmu.intel_cap.pebs_format < 2) {
|
|
|
|
ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node);
|
|
|
|
if (!ibuffer) {
|
|
|
|
kfree(buffer);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
per_cpu(insn_buffer, cpu) = ibuffer;
|
|
|
|
}
|
|
|
|
|
2010-10-19 20:15:04 +08:00
|
|
|
max = PEBS_BUFFER_SIZE / x86_pmu.pebs_record_size;
|
|
|
|
|
|
|
|
ds->pebs_buffer_base = (u64)(unsigned long)buffer;
|
|
|
|
ds->pebs_index = ds->pebs_buffer_base;
|
|
|
|
ds->pebs_absolute_maximum = ds->pebs_buffer_base +
|
|
|
|
max * x86_pmu.pebs_record_size;
|
|
|
|
|
|
|
|
ds->pebs_interrupt_threshold = ds->pebs_buffer_base +
|
|
|
|
thresh * x86_pmu.pebs_record_size;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-10-19 20:08:29 +08:00
|
|
|
static void release_pebs_buffer(int cpu)
|
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
|
|
|
|
|
|
|
if (!ds || !x86_pmu.pebs)
|
|
|
|
return;
|
|
|
|
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
kfree(per_cpu(insn_buffer, cpu));
|
|
|
|
per_cpu(insn_buffer, cpu) = NULL;
|
|
|
|
|
2010-10-19 20:08:29 +08:00
|
|
|
kfree((void *)(unsigned long)ds->pebs_buffer_base);
|
|
|
|
ds->pebs_buffer_base = 0;
|
|
|
|
}
|
|
|
|
|
2010-10-19 20:15:04 +08:00
|
|
|
static int alloc_bts_buffer(int cpu)
|
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
2010-10-19 20:55:33 +08:00
|
|
|
int node = cpu_to_node(cpu);
|
2010-10-19 20:15:04 +08:00
|
|
|
int max, thresh;
|
|
|
|
void *buffer;
|
|
|
|
|
|
|
|
if (!x86_pmu.bts)
|
|
|
|
return 0;
|
|
|
|
|
2014-07-01 07:04:08 +08:00
|
|
|
buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node);
|
|
|
|
if (unlikely(!buffer)) {
|
|
|
|
WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__);
|
2010-10-19 20:15:04 +08:00
|
|
|
return -ENOMEM;
|
2014-07-01 07:04:08 +08:00
|
|
|
}
|
2010-10-19 20:15:04 +08:00
|
|
|
|
|
|
|
max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE;
|
|
|
|
thresh = max / 16;
|
|
|
|
|
|
|
|
ds->bts_buffer_base = (u64)(unsigned long)buffer;
|
|
|
|
ds->bts_index = ds->bts_buffer_base;
|
|
|
|
ds->bts_absolute_maximum = ds->bts_buffer_base +
|
|
|
|
max * BTS_RECORD_SIZE;
|
|
|
|
ds->bts_interrupt_threshold = ds->bts_absolute_maximum -
|
|
|
|
thresh * BTS_RECORD_SIZE;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-10-19 20:08:29 +08:00
|
|
|
static void release_bts_buffer(int cpu)
|
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
|
|
|
|
|
|
|
if (!ds || !x86_pmu.bts)
|
|
|
|
return;
|
|
|
|
|
|
|
|
kfree((void *)(unsigned long)ds->bts_buffer_base);
|
|
|
|
ds->bts_buffer_base = 0;
|
|
|
|
}
|
|
|
|
|
2010-10-19 20:37:23 +08:00
|
|
|
static int alloc_ds_buffer(int cpu)
|
|
|
|
{
|
2010-10-19 20:55:33 +08:00
|
|
|
int node = cpu_to_node(cpu);
|
2010-10-19 20:37:23 +08:00
|
|
|
struct debug_store *ds;
|
|
|
|
|
2013-08-30 04:59:17 +08:00
|
|
|
ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node);
|
2010-10-19 20:37:23 +08:00
|
|
|
if (unlikely(!ds))
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
per_cpu(cpu_hw_events, cpu).ds = ds;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void release_ds_buffer(int cpu)
|
|
|
|
{
|
|
|
|
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
|
|
|
|
|
|
|
|
if (!ds)
|
|
|
|
return;
|
|
|
|
|
|
|
|
per_cpu(cpu_hw_events, cpu).ds = NULL;
|
|
|
|
kfree(ds);
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void release_ds_buffers(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
int cpu;
|
|
|
|
|
|
|
|
if (!x86_pmu.bts && !x86_pmu.pebs)
|
|
|
|
return;
|
|
|
|
|
|
|
|
get_online_cpus();
|
|
|
|
for_each_online_cpu(cpu)
|
|
|
|
fini_debug_store_on_cpu(cpu);
|
|
|
|
|
|
|
|
for_each_possible_cpu(cpu) {
|
2010-10-19 20:08:29 +08:00
|
|
|
release_pebs_buffer(cpu);
|
|
|
|
release_bts_buffer(cpu);
|
2010-10-19 20:37:23 +08:00
|
|
|
release_ds_buffer(cpu);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
put_online_cpus();
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void reserve_ds_buffers(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
2010-10-19 20:22:50 +08:00
|
|
|
int bts_err = 0, pebs_err = 0;
|
|
|
|
int cpu;
|
|
|
|
|
|
|
|
x86_pmu.bts_active = 0;
|
|
|
|
x86_pmu.pebs_active = 0;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
if (!x86_pmu.bts && !x86_pmu.pebs)
|
2010-10-19 20:50:02 +08:00
|
|
|
return;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (!x86_pmu.bts)
|
|
|
|
bts_err = 1;
|
|
|
|
|
|
|
|
if (!x86_pmu.pebs)
|
|
|
|
pebs_err = 1;
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
get_online_cpus();
|
|
|
|
|
|
|
|
for_each_possible_cpu(cpu) {
|
2010-10-19 20:22:50 +08:00
|
|
|
if (alloc_ds_buffer(cpu)) {
|
|
|
|
bts_err = 1;
|
|
|
|
pebs_err = 1;
|
|
|
|
}
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (!bts_err && alloc_bts_buffer(cpu))
|
|
|
|
bts_err = 1;
|
|
|
|
|
|
|
|
if (!pebs_err && alloc_pebs_buffer(cpu))
|
|
|
|
pebs_err = 1;
|
2010-10-19 20:15:04 +08:00
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (bts_err && pebs_err)
|
2010-10-19 20:15:04 +08:00
|
|
|
break;
|
2010-10-19 20:22:50 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (bts_err) {
|
|
|
|
for_each_possible_cpu(cpu)
|
|
|
|
release_bts_buffer(cpu);
|
|
|
|
}
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (pebs_err) {
|
|
|
|
for_each_possible_cpu(cpu)
|
|
|
|
release_pebs_buffer(cpu);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (bts_err && pebs_err) {
|
|
|
|
for_each_possible_cpu(cpu)
|
|
|
|
release_ds_buffer(cpu);
|
|
|
|
} else {
|
|
|
|
if (x86_pmu.bts && !bts_err)
|
|
|
|
x86_pmu.bts_active = 1;
|
|
|
|
|
|
|
|
if (x86_pmu.pebs && !pebs_err)
|
|
|
|
x86_pmu.pebs_active = 1;
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
for_each_online_cpu(cpu)
|
|
|
|
init_debug_store_on_cpu(cpu);
|
|
|
|
}
|
|
|
|
|
|
|
|
put_online_cpus();
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* BTS
|
|
|
|
*/
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint bts_constraint =
|
2012-06-21 02:46:33 +08:00
|
|
|
EVENT_CONSTRAINT(0, 1ULL << INTEL_PMC_IDX_FIXED_BTS, 0);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_enable_bts(u64 config)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
unsigned long debugctlmsr;
|
|
|
|
|
|
|
|
debugctlmsr = get_debugctlmsr();
|
|
|
|
|
2010-03-25 21:51:49 +08:00
|
|
|
debugctlmsr |= DEBUGCTLMSR_TR;
|
|
|
|
debugctlmsr |= DEBUGCTLMSR_BTS;
|
|
|
|
debugctlmsr |= DEBUGCTLMSR_BTINT;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
if (!(config & ARCH_PERFMON_EVENTSEL_OS))
|
2010-03-25 21:51:49 +08:00
|
|
|
debugctlmsr |= DEBUGCTLMSR_BTS_OFF_OS;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
if (!(config & ARCH_PERFMON_EVENTSEL_USR))
|
2010-03-25 21:51:49 +08:00
|
|
|
debugctlmsr |= DEBUGCTLMSR_BTS_OFF_USR;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
update_debugctlmsr(debugctlmsr);
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_disable_bts(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
unsigned long debugctlmsr;
|
|
|
|
|
|
|
|
if (!cpuc->ds)
|
|
|
|
return;
|
|
|
|
|
|
|
|
debugctlmsr = get_debugctlmsr();
|
|
|
|
|
|
|
|
debugctlmsr &=
|
2010-03-25 21:51:49 +08:00
|
|
|
~(DEBUGCTLMSR_TR | DEBUGCTLMSR_BTS | DEBUGCTLMSR_BTINT |
|
|
|
|
DEBUGCTLMSR_BTS_OFF_OS | DEBUGCTLMSR_BTS_OFF_USR);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
update_debugctlmsr(debugctlmsr);
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
int intel_pmu_drain_bts_buffer(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
struct debug_store *ds = cpuc->ds;
|
|
|
|
struct bts_record {
|
|
|
|
u64 from;
|
|
|
|
u64 to;
|
|
|
|
u64 flags;
|
|
|
|
};
|
2012-06-21 02:46:33 +08:00
|
|
|
struct perf_event *event = cpuc->events[INTEL_PMC_IDX_FIXED_BTS];
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
struct bts_record *at, *top;
|
|
|
|
struct perf_output_handle handle;
|
|
|
|
struct perf_event_header header;
|
|
|
|
struct perf_sample_data data;
|
|
|
|
struct pt_regs regs;
|
|
|
|
|
|
|
|
if (!event)
|
2010-09-10 19:28:01 +08:00
|
|
|
return 0;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (!x86_pmu.bts_active)
|
2010-09-10 19:28:01 +08:00
|
|
|
return 0;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
at = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
|
|
|
|
top = (struct bts_record *)(unsigned long)ds->bts_index;
|
|
|
|
|
|
|
|
if (top <= at)
|
2010-09-10 19:28:01 +08:00
|
|
|
return 0;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2013-03-19 23:10:38 +08:00
|
|
|
memset(®s, 0, sizeof(regs));
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
ds->bts_index = ds->bts_buffer_base;
|
|
|
|
|
2012-04-03 02:19:08 +08:00
|
|
|
perf_sample_data_init(&data, 0, event->hw.last_period);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Prepare a generic sample, i.e. fill in the invariant fields.
|
|
|
|
* We will overwrite the from and to address before we output
|
|
|
|
* the sample.
|
|
|
|
*/
|
|
|
|
perf_prepare_sample(&header, &data, event, ®s);
|
|
|
|
|
2011-06-27 22:47:16 +08:00
|
|
|
if (perf_output_begin(&handle, event, header.size * (top - at)))
|
2010-09-10 19:28:01 +08:00
|
|
|
return 1;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
for (; at < top; at++) {
|
|
|
|
data.ip = at->from;
|
|
|
|
data.addr = at->to;
|
|
|
|
|
|
|
|
perf_output_sample(&handle, &header, &data, event);
|
|
|
|
}
|
|
|
|
|
|
|
|
perf_output_end(&handle);
|
|
|
|
|
|
|
|
/* There's new data available. */
|
|
|
|
event->hw.interrupts++;
|
|
|
|
event->pending_kill = POLL_IN;
|
2010-09-10 19:28:01 +08:00
|
|
|
return 1;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* PEBS
|
|
|
|
*/
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint intel_core2_pebs_event_constraints[] = {
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0xfec1, 0x1), /* X87_OPS_RETIRED.ANY */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c5, 0x1), /* BR_INST_RETIRED.MISPRED */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x1fc7, 0x1), /* SIMD_INST_RETURED.ANY */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1), /* MEM_LOAD_RETIRED.* */
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint intel_atom_pebs_event_constraints[] = {
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c0, 0x1), /* INST_RETIRED.ANY */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x00c5, 0x1), /* MISPREDICTED_BRANCH_RETIRED */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0x1), /* MEM_LOAD_RETIRED.* */
|
2011-03-02 23:05:01 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2013-07-18 17:02:24 +08:00
|
|
|
struct event_constraint intel_slm_pebs_event_constraints[] = {
|
2014-08-12 03:27:10 +08:00
|
|
|
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
|
|
|
|
/* Allow all events as PEBS with no flags */
|
|
|
|
INTEL_ALL_EVENT_CONSTRAINT(0, 0x1),
|
2013-07-18 17:02:24 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint intel_nehalem_pebs_event_constraints[] = {
|
2013-01-24 23:10:32 +08:00
|
|
|
INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc0, 0xf), /* INST_RETIRED.ANY */
|
2011-03-09 23:21:29 +08:00
|
|
|
INTEL_EVENT_CONSTRAINT(0xc2, 0xf), /* UOPS_RETIRED.* */
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x02c5, 0xf), /* BR_MISP_RETIRED.NEAR_CALL */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc7, 0xf), /* SSEX_UOPS_RETIRED.* */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0xf), /* MEM_LOAD_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xf7, 0xf), /* FP_ASSIST.* */
|
2011-03-02 23:05:01 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint intel_westmere_pebs_event_constraints[] = {
|
2013-01-24 23:10:32 +08:00
|
|
|
INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc0, 0xf), /* INSTR_RETIRED.* */
|
2011-03-09 23:21:29 +08:00
|
|
|
INTEL_EVENT_CONSTRAINT(0xc2, 0xf), /* UOPS_RETIRED.* */
|
2014-09-24 22:34:48 +08:00
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc5, 0xf), /* BR_MISP_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xc7, 0xf), /* SSEX_UOPS_RETIRED.* */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x20c8, 0xf), /* ITLB_MISS_RETIRED */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xcb, 0xf), /* MEM_LOAD_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0xf7, 0xf), /* FP_ASSIST.* */
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint intel_snb_pebs_event_constraints[] = {
|
2014-09-24 22:34:47 +08:00
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
|
2013-01-24 23:10:32 +08:00
|
|
|
INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
|
2013-01-24 23:10:34 +08:00
|
|
|
INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
|
2014-08-12 03:27:10 +08:00
|
|
|
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
|
|
|
|
/* Allow all events as PEBS with no flags */
|
|
|
|
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
|
2011-03-02 21:27:04 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2012-09-11 07:07:01 +08:00
|
|
|
struct event_constraint intel_ivb_pebs_event_constraints[] = {
|
2014-09-24 22:34:47 +08:00
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
|
2013-01-24 23:10:32 +08:00
|
|
|
INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
|
2013-01-24 23:10:34 +08:00
|
|
|
INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
|
2014-08-12 03:27:10 +08:00
|
|
|
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
|
|
|
|
/* Allow all events as PEBS with no flags */
|
|
|
|
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
|
2012-09-11 07:07:01 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2013-06-18 08:36:49 +08:00
|
|
|
struct event_constraint intel_hsw_pebs_event_constraints[] = {
|
2014-09-24 22:34:47 +08:00
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
|
2014-08-12 03:27:10 +08:00
|
|
|
INTEL_PLD_CONSTRAINT(0x01cd, 0xf), /* MEM_TRANS_RETIRED.* */
|
|
|
|
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
|
|
|
|
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_UOPS_L3_HIT_RETIRED.* */
|
|
|
|
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_UOPS_L3_MISS_RETIRED.* */
|
|
|
|
/* Allow all events as PEBS with no flags */
|
|
|
|
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
|
2013-06-18 08:36:49 +08:00
|
|
|
EVENT_CONSTRAINT_END
|
|
|
|
};
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
struct event_constraint *intel_pebs_constraints(struct perf_event *event)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
struct event_constraint *c;
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
if (!event->attr.precise_ip)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
return NULL;
|
|
|
|
|
|
|
|
if (x86_pmu.pebs_constraints) {
|
|
|
|
for_each_event_constraint(c, x86_pmu.pebs_constraints) {
|
2013-01-24 23:10:27 +08:00
|
|
|
if ((event->hw.config & c->cmask) == c->code) {
|
|
|
|
event->hw.flags |= c->flags;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
return c;
|
2013-01-24 23:10:27 +08:00
|
|
|
}
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return &emptyconstraint;
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_pebs_enable(struct perf_event *event)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
2010-03-03 20:12:23 +08:00
|
|
|
struct hw_perf_event *hwc = &event->hw;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
|
|
|
|
|
2010-03-07 02:49:06 +08:00
|
|
|
cpuc->pebs_enabled |= 1ULL << hwc->idx;
|
2013-01-24 23:10:32 +08:00
|
|
|
|
|
|
|
if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
|
|
|
|
cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
|
2013-01-24 23:10:34 +08:00
|
|
|
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
|
|
|
|
cpuc->pebs_enabled |= 1ULL << 63;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_pebs_disable(struct perf_event *event)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
2010-03-03 20:12:23 +08:00
|
|
|
struct hw_perf_event *hwc = &event->hw;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-03-07 02:49:06 +08:00
|
|
|
cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
|
2013-06-21 22:20:41 +08:00
|
|
|
|
|
|
|
if (event->hw.constraint->flags & PERF_X86_EVENT_PEBS_LDLAT)
|
|
|
|
cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
|
|
|
|
else if (event->hw.constraint->flags & PERF_X86_EVENT_PEBS_ST)
|
|
|
|
cpuc->pebs_enabled &= ~(1ULL << 63);
|
|
|
|
|
2010-03-06 20:47:07 +08:00
|
|
|
if (cpuc->enabled)
|
2010-03-07 02:49:06 +08:00
|
|
|
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
hwc->config |= ARCH_PERFMON_EVENTSEL_INT;
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_pebs_enable_all(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
if (cpuc->pebs_enabled)
|
|
|
|
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
|
|
|
|
}
|
|
|
|
|
2011-08-31 07:41:05 +08:00
|
|
|
void intel_pmu_pebs_disable_all(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
if (cpuc->pebs_enabled)
|
|
|
|
wrmsrl(MSR_IA32_PEBS_ENABLE, 0);
|
|
|
|
}
|
|
|
|
|
2010-03-03 20:12:23 +08:00
|
|
|
static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
|
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
2010-03-03 20:12:23 +08:00
|
|
|
unsigned long from = cpuc->lbr_entries[0].from;
|
|
|
|
unsigned long old_to, to = cpuc->lbr_entries[0].to;
|
|
|
|
unsigned long ip = regs->ip;
|
2011-10-07 19:36:40 +08:00
|
|
|
int is_64bit = 0;
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
void *kaddr;
|
2010-03-03 20:12:23 +08:00
|
|
|
|
2010-03-04 00:07:40 +08:00
|
|
|
/*
|
|
|
|
* We don't need to fixup if the PEBS assist is fault like
|
|
|
|
*/
|
|
|
|
if (!x86_pmu.intel_cap.pebs_trap)
|
|
|
|
return 1;
|
|
|
|
|
2010-03-05 23:29:14 +08:00
|
|
|
/*
|
|
|
|
* No LBR entry, no basic block, no rewinding
|
|
|
|
*/
|
2010-03-03 20:12:23 +08:00
|
|
|
if (!cpuc->lbr_stack.nr || !from || !to)
|
|
|
|
return 0;
|
|
|
|
|
2010-03-05 23:29:14 +08:00
|
|
|
/*
|
|
|
|
* Basic blocks should never cross user/kernel boundaries
|
|
|
|
*/
|
|
|
|
if (kernel_ip(ip) != kernel_ip(to))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* unsigned math, either ip is before the start (impossible) or
|
|
|
|
* the basic block is larger than 1 page (sanity)
|
|
|
|
*/
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
if ((ip - to) > PEBS_FIXUP_SIZE)
|
2010-03-03 20:12:23 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We sampled a branch insn, rewind using the LBR stack
|
|
|
|
*/
|
|
|
|
if (ip == to) {
|
2012-07-10 15:42:15 +08:00
|
|
|
set_linear_ip(regs, from);
|
2010-03-03 20:12:23 +08:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
if (!kernel_ip(ip)) {
|
|
|
|
int size, bytes;
|
|
|
|
u8 *buf = this_cpu_read(insn_buffer);
|
|
|
|
|
|
|
|
size = ip - to; /* Must fit our buffer, see above */
|
|
|
|
bytes = copy_from_user_nmi(buf, (void __user *)to, size);
|
2013-10-31 04:16:22 +08:00
|
|
|
if (bytes != 0)
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
kaddr = buf;
|
|
|
|
} else {
|
|
|
|
kaddr = (void *)to;
|
|
|
|
}
|
|
|
|
|
2010-03-03 20:12:23 +08:00
|
|
|
do {
|
|
|
|
struct insn insn;
|
|
|
|
|
|
|
|
old_to = to;
|
|
|
|
|
2011-10-07 19:36:40 +08:00
|
|
|
#ifdef CONFIG_X86_64
|
|
|
|
is_64bit = kernel_ip(to) || !test_thread_flag(TIF_IA32);
|
|
|
|
#endif
|
|
|
|
insn_init(&insn, kaddr, is_64bit);
|
2010-03-03 20:12:23 +08:00
|
|
|
insn_get_length(&insn);
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
|
2010-03-03 20:12:23 +08:00
|
|
|
to += insn.length;
|
perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:
[ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
[ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs
Don Zickus analyzed the source of the overhead and reported:
> While there are a few places that are causing latencies, for now I focused on
> the longest one first. It seems to be 'copy_user_from_nmi'
>
> intel_pmu_handle_irq ->
> intel_pmu_drain_pebs_nhm ->
> __intel_pmu_drain_pebs_nhm ->
> __intel_pmu_pebs_event ->
> intel_pmu_pebs_fixup_ip ->
> copy_from_user_nmi
>
> In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
> all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
> (there are some cases where only 10 iterations are needed to go that high
> too, but in generall over 50 or so). At this point copy_user_from_nmi
> seems to account for over 90% of the nmi latency.
The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.
Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.
Don reported this test result:
> Your patch made a huge difference in improvement. The
> copy_from_user_nmi() no longer hits the million of cycles. I still
> have a batch of 100,000-300,000 cycles. My longest NMI paths used
> to be dominated by copy_from_user_nmi, now it is not (I have to dig
> up the new hot path).
Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 18:14:04 +08:00
|
|
|
kaddr += insn.length;
|
2010-03-03 20:12:23 +08:00
|
|
|
} while (to < ip);
|
|
|
|
|
|
|
|
if (to == ip) {
|
2012-07-10 15:42:15 +08:00
|
|
|
set_linear_ip(regs, old_to);
|
2010-03-03 20:12:23 +08:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2010-03-05 23:29:14 +08:00
|
|
|
/*
|
|
|
|
* Even though we decoded the basic block, the instruction stream
|
|
|
|
* never matched the given IP, either the TO or the IP got corrupted.
|
|
|
|
*/
|
2010-03-03 20:12:23 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-09-06 11:37:39 +08:00
|
|
|
static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
|
|
|
|
{
|
|
|
|
if (pebs->tsx_tuning) {
|
|
|
|
union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning };
|
|
|
|
return tsx.cycles_last_block;
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-09-20 22:40:40 +08:00
|
|
|
static inline u64 intel_hsw_transaction(struct pebs_record_hsw *pebs)
|
|
|
|
{
|
|
|
|
u64 txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
|
|
|
|
|
|
|
|
/* For RTM XABORTs also log the abort code from AX */
|
|
|
|
if ((txn & PERF_TXN_TRANSACTION) && (pebs->ax & 1))
|
|
|
|
txn |= ((pebs->ax >> 24) & 0xff) << PERF_TXN_ABORT_SHIFT;
|
|
|
|
return txn;
|
|
|
|
}
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
static void __intel_pmu_pebs_event(struct perf_event *event,
|
|
|
|
struct pt_regs *iregs, void *__pebs)
|
|
|
|
{
|
2014-08-12 03:27:13 +08:00
|
|
|
#define PERF_X86_EVENT_PEBS_HSW_PREC \
|
|
|
|
(PERF_X86_EVENT_PEBS_ST_HSW | \
|
|
|
|
PERF_X86_EVENT_PEBS_LD_HSW | \
|
|
|
|
PERF_X86_EVENT_PEBS_NA_HSW)
|
2010-04-09 05:03:20 +08:00
|
|
|
/*
|
2013-09-12 19:00:47 +08:00
|
|
|
* We cast to the biggest pebs_record but are careful not to
|
|
|
|
* unconditionally access the 'extra' entries.
|
2010-04-09 05:03:20 +08:00
|
|
|
*/
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
2013-09-06 11:37:39 +08:00
|
|
|
struct pebs_record_hsw *pebs = __pebs;
|
2010-04-09 05:03:20 +08:00
|
|
|
struct perf_sample_data data;
|
|
|
|
struct pt_regs regs;
|
2013-01-24 23:10:32 +08:00
|
|
|
u64 sample_type;
|
2014-08-12 03:27:13 +08:00
|
|
|
int fll, fst, dsrc;
|
|
|
|
int fl = event->hw.flags;
|
2010-04-09 05:03:20 +08:00
|
|
|
|
|
|
|
if (!intel_pmu_save_and_restart(event))
|
|
|
|
return;
|
|
|
|
|
2014-08-12 03:27:13 +08:00
|
|
|
sample_type = event->attr.sample_type;
|
|
|
|
dsrc = sample_type & PERF_SAMPLE_DATA_SRC;
|
|
|
|
|
|
|
|
fll = fl & PERF_X86_EVENT_PEBS_LDLAT;
|
|
|
|
fst = fl & (PERF_X86_EVENT_PEBS_ST | PERF_X86_EVENT_PEBS_HSW_PREC);
|
2013-01-24 23:10:32 +08:00
|
|
|
|
2012-04-03 02:19:08 +08:00
|
|
|
perf_sample_data_init(&data, 0, event->hw.last_period);
|
2010-04-09 05:03:20 +08:00
|
|
|
|
2013-01-24 23:10:32 +08:00
|
|
|
data.period = event->hw.last_period;
|
|
|
|
|
|
|
|
/*
|
2014-08-12 03:27:13 +08:00
|
|
|
* Use latency for weight (only avail with PEBS-LL)
|
2013-01-24 23:10:32 +08:00
|
|
|
*/
|
2014-08-12 03:27:13 +08:00
|
|
|
if (fll && (sample_type & PERF_SAMPLE_WEIGHT))
|
|
|
|
data.weight = pebs->lat;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* data.data_src encodes the data source
|
|
|
|
*/
|
|
|
|
if (dsrc) {
|
|
|
|
u64 val = PERF_MEM_NA;
|
|
|
|
if (fll)
|
|
|
|
val = load_latency_data(pebs->dse);
|
|
|
|
else if (fst && (fl & PERF_X86_EVENT_PEBS_HSW_PREC))
|
|
|
|
val = precise_datala_hsw(event, pebs->dse);
|
|
|
|
else if (fst)
|
|
|
|
val = precise_store_data(pebs->dse);
|
|
|
|
data.data_src.val = val;
|
2013-01-24 23:10:32 +08:00
|
|
|
}
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
/*
|
|
|
|
* We use the interrupt regs as a base because the PEBS record
|
|
|
|
* does not contain a full regs set, specifically it seems to
|
|
|
|
* lack segment descriptors, which get used by things like
|
|
|
|
* user_mode().
|
|
|
|
*
|
|
|
|
* In the simple case fix up only the IP and BP,SP regs, for
|
|
|
|
* PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
|
|
|
|
* A possible PERF_SAMPLE_REGS will have to transfer all regs.
|
|
|
|
*/
|
|
|
|
regs = *iregs;
|
2012-07-10 15:42:15 +08:00
|
|
|
regs.flags = pebs->flags;
|
|
|
|
set_linear_ip(®s, pebs->ip);
|
2010-04-09 05:03:20 +08:00
|
|
|
regs.bp = pebs->bp;
|
|
|
|
regs.sp = pebs->sp;
|
|
|
|
|
2013-06-18 08:36:47 +08:00
|
|
|
if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
|
2013-09-06 11:37:39 +08:00
|
|
|
regs.ip = pebs->real_ip;
|
2013-06-18 08:36:47 +08:00
|
|
|
regs.flags |= PERF_EFLAGS_EXACT;
|
|
|
|
} else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(®s))
|
2010-04-09 05:03:20 +08:00
|
|
|
regs.flags |= PERF_EFLAGS_EXACT;
|
|
|
|
else
|
|
|
|
regs.flags &= ~PERF_EFLAGS_EXACT;
|
|
|
|
|
2014-08-12 03:27:13 +08:00
|
|
|
if ((sample_type & PERF_SAMPLE_ADDR) &&
|
2013-09-12 19:00:47 +08:00
|
|
|
x86_pmu.intel_cap.pebs_format >= 1)
|
2013-06-18 08:36:52 +08:00
|
|
|
data.addr = pebs->dla;
|
|
|
|
|
2013-09-20 22:40:40 +08:00
|
|
|
if (x86_pmu.intel_cap.pebs_format >= 2) {
|
|
|
|
/* Only set the TSX weight when no memory weight. */
|
2014-08-12 03:27:13 +08:00
|
|
|
if ((sample_type & PERF_SAMPLE_WEIGHT) && !fll)
|
2013-09-20 22:40:40 +08:00
|
|
|
data.weight = intel_hsw_weight(pebs);
|
|
|
|
|
2014-08-12 03:27:13 +08:00
|
|
|
if (sample_type & PERF_SAMPLE_TRANSACTION)
|
2013-09-20 22:40:40 +08:00
|
|
|
data.txn = intel_hsw_transaction(pebs);
|
|
|
|
}
|
2013-09-06 11:37:39 +08:00
|
|
|
|
2012-02-10 06:20:57 +08:00
|
|
|
if (has_branch_stack(event))
|
|
|
|
data.br_stack = &cpuc->lbr_stack;
|
|
|
|
|
2011-06-27 20:41:57 +08:00
|
|
|
if (perf_event_overflow(event, &data, ®s))
|
perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.
The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.
This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).
It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).
The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:
1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state
2) We store the counter state and ignore all read/overflow events
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-16 20:37:10 +08:00
|
|
|
x86_pmu_stop(event, 0);
|
2010-04-09 05:03:20 +08:00
|
|
|
}
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
|
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
struct debug_store *ds = cpuc->ds;
|
|
|
|
struct perf_event *event = cpuc->events[0]; /* PMC0 only */
|
|
|
|
struct pebs_record_core *at, *top;
|
|
|
|
int n;
|
|
|
|
|
2010-10-19 20:22:50 +08:00
|
|
|
if (!x86_pmu.pebs_active)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
at = (struct pebs_record_core *)(unsigned long)ds->pebs_buffer_base;
|
|
|
|
top = (struct pebs_record_core *)(unsigned long)ds->pebs_index;
|
|
|
|
|
2010-03-09 18:41:02 +08:00
|
|
|
/*
|
|
|
|
* Whatever else happens, drain the thing
|
|
|
|
*/
|
|
|
|
ds->pebs_index = ds->pebs_buffer_base;
|
|
|
|
|
|
|
|
if (!test_bit(0, cpuc->active_mask))
|
2010-03-06 20:26:11 +08:00
|
|
|
return;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-03-09 18:41:02 +08:00
|
|
|
WARN_ON_ONCE(!event);
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
if (!event->attr.precise_ip)
|
2010-03-09 18:41:02 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
n = top - at;
|
|
|
|
if (n <= 0)
|
|
|
|
return;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2010-03-09 18:41:02 +08:00
|
|
|
/*
|
|
|
|
* Should not happen, we program the threshold at 1 and do not
|
|
|
|
* set a reset value.
|
|
|
|
*/
|
2012-06-06 08:56:48 +08:00
|
|
|
WARN_ONCE(n > 1, "bad leftover pebs %d\n", n);
|
2010-03-09 18:41:02 +08:00
|
|
|
at += n - 1;
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
__intel_pmu_pebs_event(event, iregs, at);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
|
2013-09-12 19:00:47 +08:00
|
|
|
static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-18 01:30:40 +08:00
|
|
|
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
struct debug_store *ds = cpuc->ds;
|
|
|
|
struct perf_event *event = NULL;
|
2013-09-12 19:00:47 +08:00
|
|
|
void *at, *top;
|
2010-03-07 01:57:38 +08:00
|
|
|
u64 status = 0;
|
2013-09-16 15:23:02 +08:00
|
|
|
int bit;
|
2013-09-12 19:00:47 +08:00
|
|
|
|
|
|
|
if (!x86_pmu.pebs_active)
|
|
|
|
return;
|
|
|
|
|
|
|
|
at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
|
|
|
|
top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
ds->pebs_index = ds->pebs_buffer_base;
|
|
|
|
|
2013-09-16 15:23:02 +08:00
|
|
|
if (unlikely(at > top))
|
2013-09-12 19:00:47 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Should not happen, we program the threshold at 1 and do not
|
|
|
|
* set a reset value.
|
|
|
|
*/
|
2013-09-16 15:23:02 +08:00
|
|
|
WARN_ONCE(top - at > x86_pmu.max_pebs_events * x86_pmu.pebs_record_size,
|
|
|
|
"Unexpected number of pebs records %ld\n",
|
2013-09-20 19:08:50 +08:00
|
|
|
(long)(top - at) / x86_pmu.pebs_record_size);
|
2013-09-12 19:00:47 +08:00
|
|
|
|
2013-06-18 08:36:47 +08:00
|
|
|
for (; at < top; at += x86_pmu.pebs_record_size) {
|
|
|
|
struct pebs_record_nhm *p = at;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
2013-06-18 08:36:47 +08:00
|
|
|
for_each_set_bit(bit, (unsigned long *)&p->status,
|
|
|
|
x86_pmu.max_pebs_events) {
|
2010-03-07 01:57:38 +08:00
|
|
|
event = cpuc->events[bit];
|
|
|
|
if (!test_bit(bit, cpuc->active_mask))
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
continue;
|
|
|
|
|
2010-03-07 01:57:38 +08:00
|
|
|
WARN_ON_ONCE(!event);
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
if (!event->attr.precise_ip)
|
2010-03-07 01:57:38 +08:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (__test_and_set_bit(bit, (unsigned long *)&status))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
break;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
|
2012-06-06 08:56:48 +08:00
|
|
|
if (!event || bit >= x86_pmu.max_pebs_events)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
continue;
|
|
|
|
|
2010-04-09 05:03:20 +08:00
|
|
|
__intel_pmu_pebs_event(event, iregs, at);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* BTS, PEBS probe and setup
|
|
|
|
*/
|
|
|
|
|
2014-08-27 00:49:45 +08:00
|
|
|
void __init intel_ds_init(void)
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* No support for 32bit formats
|
|
|
|
*/
|
|
|
|
if (!boot_cpu_has(X86_FEATURE_DTES64))
|
|
|
|
return;
|
|
|
|
|
|
|
|
x86_pmu.bts = boot_cpu_has(X86_FEATURE_BTS);
|
|
|
|
x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
|
|
|
|
if (x86_pmu.pebs) {
|
2010-03-04 00:07:40 +08:00
|
|
|
char pebs_type = x86_pmu.intel_cap.pebs_trap ? '+' : '-';
|
|
|
|
int format = x86_pmu.intel_cap.pebs_format;
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
|
|
|
|
switch (format) {
|
|
|
|
case 0:
|
2010-03-04 00:07:40 +08:00
|
|
|
printk(KERN_CONT "PEBS fmt0%c, ", pebs_type);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
x86_pmu.pebs_record_size = sizeof(struct pebs_record_core);
|
|
|
|
x86_pmu.drain_pebs = intel_pmu_drain_pebs_core;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case 1:
|
2010-03-04 00:07:40 +08:00
|
|
|
printk(KERN_CONT "PEBS fmt1%c, ", pebs_type);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
x86_pmu.pebs_record_size = sizeof(struct pebs_record_nhm);
|
|
|
|
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
|
|
|
|
break;
|
|
|
|
|
2013-06-18 08:36:47 +08:00
|
|
|
case 2:
|
|
|
|
pr_cont("PEBS fmt2%c, ", pebs_type);
|
|
|
|
x86_pmu.pebs_record_size = sizeof(struct pebs_record_hsw);
|
2013-09-12 19:00:47 +08:00
|
|
|
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
|
2013-06-18 08:36:47 +08:00
|
|
|
break;
|
|
|
|
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
default:
|
2010-03-04 00:07:40 +08:00
|
|
|
printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
|
perf, x86: Add PEBS infrastructure
This patch implements support for Intel Precise Event Based Sampling,
which is an alternative counter mode in which the counter triggers a
hardware assist to collect information on events. The hardware assist
takes a trap like snapshot of a subset of the machine registers.
This data is written to the Intel Debug-Store, which can be programmed
with a data threshold at which to raise a PMI.
With the PEBS hardware assist being trap like, the reported IP is always
one instruction after the actual instruction that triggered the event.
This implements a simple PEBS model that always takes a single PEBS event
at a time. This is done so that the interaction with the rest of the
system is as expected (freq adjust, period randomization, lbr,
callchains, etc.).
It adds an ABI element: perf_event_attr::precise, which indicates that we
wish to use this (constrained, but precise) mode.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: paulus@samba.org
Cc: eranian@google.com
Cc: robert.richter@amd.com
Cc: fweisbec@gmail.com
LKML-Reference: <20100304140100.392111285@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-03 02:52:12 +08:00
|
|
|
x86_pmu.pebs = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2013-03-15 21:26:07 +08:00
|
|
|
|
|
|
|
void perf_restore_debug_store(void)
|
|
|
|
{
|
2013-03-18 06:44:43 +08:00
|
|
|
struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
|
|
|
|
|
2013-03-15 21:26:07 +08:00
|
|
|
if (!x86_pmu.bts && !x86_pmu.pebs)
|
|
|
|
return;
|
|
|
|
|
2013-03-18 06:44:43 +08:00
|
|
|
wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds);
|
2013-03-15 21:26:07 +08:00
|
|
|
}
|