This patch implements the raw syscall tracepoints on PowerPC and exports
them for ftrace syscalls to use.
To minimise reworking existing code, I slightly re-ordered the thread
info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
within the 16 bits of the andi. instruction's UI field. The instructions
in question are in /arch/powerpc/kernel/entry_{32,64}.S to and the
_TIF_SYSCALL_T_OR_A with the thread flags to see if system call tracing
is enabled.
In the case of 64bit PowerPC, arch_syscall_addr and
arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
work given the unusual system call table structure and symbol names that
start with a period.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit erros while doing so..
(gcc is not setup to cross compile) and am unsure if anymore needs to be done.
Please have a look if/when anybody has free time.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add the cputable entry, regs and setup & restore entries for
the PowerPC A2 core.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This bit indicates that we are operating in hypervisor mode on a CPU
compliant to architecture 2.06 or later (currently server only).
We set it on POWER7 and have a boot-time CPU setup function that
clears it if MSR:HV isn't set (booting under a hypervisor).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
No need to have three of them.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
KVM: Fix signature of kvm_iommu_map_pages stub
KVM: MCE: Send SRAR SIGBUS directly
KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
KVM: fix typo in copyright notice
KVM: Disable interrupts around get_kernel_ns()
KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
KVM: MMU: move access code parsing to FNAME(walk_addr) function
KVM: MMU: audit: check whether have unsync sps after root sync
KVM: MMU: audit: introduce audit_printk to cleanup audit code
KVM: MMU: audit: unregister audit tracepoints before module unloaded
KVM: MMU: audit: fix vcpu's spte walking
KVM: MMU: set access bit for direct mapping
KVM: MMU: cleanup for error mask set while walk guest page table
KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
KVM: x86: Fix constant type in kvm_get_time_scale
KVM: VMX: Add AX to list of registers clobbered by guest switch
KVM guest: Move a printk that's using the clock before it's ready
KVM: x86: TSC catchup mode
...
We have all the hypervisor pieces in place now, but the guest parts are still
missing.
This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.
This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.
This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The new e5500 core is similar to the e500mc core but adds 64-bit
support. We support running it in 32-bit mode as it is identical to the
e500mc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: (63 commits)
of/platform: Register of_platform_drivers with an "of:" prefix
of/address: Clean up function declarations
of/spi: call of_register_spi_devices() from spi core code
of: Provide default of_node_to_nid() implementation.
of/device: Make of_device_make_bus_id() usable by other code.
of/irq: Fix endian issues in parsing interrupt specifiers
of: Fix phandle endian issues
of/flattree: fix of_flat_dt_is_compatible() to match the full compatible string
of: remove of_default_bus_ids
of: make of_find_device_by_node generic
microblaze: remove references to of_device and to_of_device
sparc: remove references to of_device and to_of_device
powerpc: remove references to of_device and to_of_device
of/device: Replace of_device with platform_device in includes and core code
of/device: Protect against binding of_platform_drivers to non-OF devices
of: remove asm/of_device.h
of: remove asm/of_platform.h
of/platform: remove all of_bus_type and of_platform_bus_type references
of: Merge of_platform_bus_type with platform_bus_type
drivercore/of: Add OF style matching to platform bus
...
Fix up trivial conflicts in arch/microblaze/kernel/Makefile due to just
some obj-y removals by the devicetree branch, while the microblaze
updates added a new file.
We use a similar technique to ppc32: We set a thread local flag
to indicate that we are about to enter or have entered the stop
state, and have fixup code in the async interrupt entry code that
reacts to this flag to make us return to a different location
(sets NIP to LINK in our case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
--
v2. Fix lockdep bug
Re-mask interrupts when coming back from idle
Note that critical doorbells are an unimplemented stub just like
other critical or machine check handlers, since we haven't done
support for "levelled" exceptions yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch merges the common routines of_device_alloc() and
of_device_make_bus_id() from powerpc and microblaze.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
CC: Michal Simek <monstr@monstr.eu>
CC: Grant Likely <grant.likely@secretlab.ca>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: microblaze-uclinux@itee.uq.edu.au
CC: linuxppc-dev@ozlabs.org
CC: devicetree-discuss@lists.ozlabs.org
Implement perf-events based hw-breakpoint interfaces for PowerPC
64-bit server (Book III S) processors. This allows access to a
given location to be used as an event that can be counted or
profiled by the perf_events subsystem.
This is done using the DABR (data breakpoint register), which can
also be used for process debugging via ptrace. When perf_event
hw_breakpoint support is configured in, the perf_event subsystem
manages the DABR and arbitrates access to it, and ptrace then
creates a perf_event when it is requested to set a data breakpoint.
[Adopted suggestions from Paul Mackerras <paulus@samba.org> to
- emulate_step() all system-wide breakpoints and single-step only the
per-task breakpoints
- perform arch-specific cleanup before unregistration through
arch_unregister_hw_breakpoint()
]
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is started as swsusp_32.S modifications, but the amount of #ifdefs
made the whole file horribly unreadable, so let's put the support into
its own separate file.
The code should be relatively easy to modify to support 44x BookEs as
well, but since I don't have any 44x to test, let's confine the code to
FSL BookE. (The only FSL-specific part so far is 'flush_dcache_L1'.)
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This implements perf_event support for the Freescale embedded performance
monitor, based on the existing perf_event.c that supports server/classic
chips.
Some limitations:
- Performance monitor interrupts are regular EE interrupts, and thus you
can't profile places with interrupts disabled. We may want to implement
soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs.
- When trying to schedule multiple event groups at once, and using
restricted events, situations could arise where scheduling fails even
though it would be possible. Consider three groups, each with two events.
One group has restricted events, the others don't. The two non-restricted
groups are scheduled, then one is removed, which happens to occupy the two
counters that can't do restricted events. The remaining non-restricted
group will not be moved to the non-restricted-capable counters to make
room if the restricted group tries to be scheduled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
It's also useful for software events, as well as future support for
other types of hardware counters.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The CHRP code has some fishy timer based code to scan the RTAS event
log, which uses a 1KB stack buffer and doesn't even use the results.
The pSeries code as a nicer daemon that allows userspace to read the
event log and basically uses the same RTAS interface
This patch moves rtasd.c out of platform/pseries and makes it usable
by CHRP, after removing the old crufty event log mechanism in there.
The nvram logging part of the daemon is still only available on 64-bit
since the underlying nvram management routines aren't currently shared.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of the stuff in /proc/ppc64 such as the RTAS bits are actually
useful to some 32-bit platforms. Rename the file, and create a
symlink on 64-bit for backward compatibility
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The PCI device tree scanning code in pci_64.c is some useful functionality.
It allows PCI devices to be described in the device tree instead of being
probed for, which in turn allows pci devices to use all of the device tree
facilities to describe complex PCI bus architectures like GPIO and IRQ
routing (perhaps not a common situation for desktop or server systems,
but useful for embedded systems with on-board PCI devices).
This patch moves the device tree scanning into pci-common.c so it is
available for 32-bit powerpc machines too.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make it possible to enable GCOV code coverage measurement on powerpc.
Lightly tested on 64-bit, seems to work as expected.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This contains all the bits that didn't fit in previous patches :-) This
includes the actual exception handlers assembly, the changes to the
kernel entry, other misc bits and wiring it all up in Kconfig.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds support for tracing callchains for powerpc, both 32-bit
and 64-bit, and both in the kernel and userspace, from PMU interrupt
context.
The first three entries stored for each callchain are the NIP (next
instruction pointer), LR (link register), and the contents of the LR
save area in the second stack frame (the first is ignored because the
ABI convention on powerpc is that functions save their return address
in their caller's stack frame). Because leaf functions don't have to
save their return address (LR value) and don't have to establish a
stack frame, it's possible for either or both of LR and the second
stack frame's LR save area to have valid return addresses in them.
This is basically impossible to disambiguate without either reading
the code or looking at auxiliary information such as CFI tables.
Since we don't want to do either of those things at interrupt time,
we store both LR and the second stack frame's LR save area.
Once we get past the second stack frame, there is no ambiguity; all
return addresses we get are reliable.
For kernel traces, we check whether they are valid kernel instruction
addresses and store zero instead if they are not (rather than
omitting them, which would make it impossible for userspace to know
which was which). We also store zero instead of the second stack
frame's LR save area value if it is the same as LR.
For kernel traces, we check for interrupt frames, and for user traces,
we check for signal frames. In each case, since we're starting a new
trace, we store a PERF_CONTEXT_KERNEL/USER marker so that userspace
knows that the next three entries are NIP, LR and the second stack frame
for the interrupted context.
We read user memory with __get_user_inatomic. On 64-bit, if this
PMU interrupt occurred while interrupts are soft-disabled, and
there is no MMU hash table entry for the page, we will get an
-EFAULT return from __get_user_inatomic even if there is a valid
Linux PTE for the page, since hash_page isn't reentrant. Thus we
have code here to read the Linux PTE and access the page via the
kernel linear mapping. Since 64-bit doesn't use (or need) highmem
there is no need to do kmap_atomic. On 32-bit, we don't do soft
interrupt disabling, so this complication doesn't occur and there
is no need to fall back to reading the Linux PTE, since hash_page
(or the TLB miss handler) will get called automatically if necessary.
Note that we cannot get PMU interrupts in the interval during
context switch between switch_mm (which switches the user address
space) and switch_to (which actually changes current to the new
process). On 64-bit this is because interrupts are hard-disabled
in switch_mm and stay hard-disabled until they are soft-enabled
later, after switch_to has returned. So there is no possibility
of trying to do a user stack trace when the user address space is
not current's address space.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...
This adds support for the performance monitor hardware on the
MPC7450 family of processors (7450, 7451, 7455, 7447/7457, 7447A,
7448), used in the later Apple G4 powermacs/powerbooks and other
machines. These machines have 6 hardware counters with a unique
set of events which can be counted on each counter, with some
events being available on multiple counters.
Raw event codes for these processors are (PMC << 8) + PMCSEL.
If PMC is non-zero then the event is that selected by the given
PMCSEL value for that PMC (hardware counter). If PMC is zero
then the event selected is one of the low-numbered ones that are
common to several PMCs. In this case PMCSEL must be <= 22 and
the event is what that PMCSEL value would select on PMC1 (but
it may be placed any other PMC that has the same event for that
PMCSEL value).
For events that count cycles or occurrences that exceed a threshold,
the threshold requested can be specified in the 0x3f000 bits of the
raw event codes. If the event uses the threshold multiplier bit
and that bit should be set, that is indicated with the 0x40000 bit
of the raw event code.
This fills in some of the generic cache events. Unfortunately there
are quite a few blank spaces in the table, partly because these
processors tend to count cache hits rather than cache accesses.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55631.802122.696927@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This enables the perf_counter subsystem on 32-bit powerpc. Since we
don't have any support for hardware counters on 32-bit powerpc yet,
only software counters can be used.
Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as
64-bit, the main thing this does is add an implementation of
set_perf_counter_pending(). This needs to arrange for
perf_counter_do_pending() to be called when interrupts are enabled.
Rather than add code to local_irq_restore as 64-bit does, the 32-bit
set_perf_counter_pending() generates an interrupt by setting the
decrementer to 1 so that a decrementer interrupt will become pending
in 1 or 2 timebase ticks (if a decrementer interrupt isn't already
pending). When interrupts are enabled, timer_interrupt() will be
called, and some new code in there calls perf_counter_do_pending().
We use a per-cpu array of flags to indicate whether we need to call
perf_counter_do_pending() or not.
This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT,
which is selected by processor families for which we have hardware PMU
support (currently only PPC64), and PPC_PERF_CTRS, which enables the
powerpc-specific perf_counter back-end.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linuxppc-dev@ozlabs.org
Cc: benh@kernel.crashing.org
LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the option to build the code under arch/powerpc with -Werror.
The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.
The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.
The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.
It's not enabled for math-emu, which is a steaming pile of warnings.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 28794d34 ("powerpc/kconfig: Kill PPC_MULTIPLATFORM"), added
CONFIG_PPC_OF_BOOT_TRAMPOLINE to control the buliding of prom_init.o
However the Makefile still unconditionally builds prom_init_check,
the script that checks prom_init.o for symbol usage, and so in turn
prom_init.o is still always being built. (it's not linked though)
So surround all the prom_init_check logic with an ifeq block testing
if CONFIG_PPC_OF_BOOT_TRAMPOLINE is set.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the back-end for the PMU on POWER7 processors. POWER7
has 4 fully-programmable counters and two fixed-function counters
(which do respect the freeze conditions, can generate interrupts,
and are writable, unlike PMC5/6 on POWER5+/6).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <18992.36329.189378.17992@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch includes the basic infrastructure to use swiotlb
bounce buffering on 32-bit powerpc. It is not yet enabled on
any platforms. Probably the most interesting bit is the
addition of addr_needs_map to dma_ops - we need this as
a dma_op because the decision of whether or not an addr
can be mapped by a device is device-specific.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, load_up_altivec and give_up_altivec are duplicated
in 32-bit and 64-bit. This creates a common implementation that
is moved away from head_32.S, head_64.S and misc_64.S and into
vector.S, using the same macros we already use for our common
implementation of load_up_fpu.
I also moved the VSX code over to vector.S though in that case
I didn't make it build on 32-bit (yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
This removes it along with the following changes:
- 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
on 6xx which is what they want anyway.
- A new symbol, PPC_BOOK3S, is defined that represent compliance with
the "Server" variant of the architecture. This is set when either 6xx
or PPC64 is set and open the door for future BOOK3E 64-bit.
- 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
PPC64 && PPC_BOOK3S
- A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
used to control the use of prom_init.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER4 and POWER4+ processors
(GP and GQ). This is quite similar to the PPC970, with 8 PMCs, but has
fewer events than the PPC970.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER5+ processors (i.e. GS,
including GS DD3 aka POWER5++). This doesn't use the fixed-function
PMC5 and PMC6 since they don't respect the freeze conditions and don't
generate interrupts, as on POWER6.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the POWER5 processor. This knows
how to use the fixed-function PMC5 and PMC6 (instructions completed and
run cycles). Unlike POWER6, PMC5/6 obey the freeze conditions and can
generate interrupts, so their use doesn't impose any extra restrictions.
POWER5+ is different and is not supported by this patch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.
This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.
The trace produces a visual calling of functions:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 2.224 us | }
0) ! 271.024 us | }
0) ! 320.080 us | }
0) ! 324.656 us | }
0) ! 329.136 us | }
0) | .put_prev_task_fair() {
0) | .update_curr() {
0) 2.240 us | .update_min_vruntime();
0) 6.512 us | }
0) 2.528 us | .__enqueue_entity();
0) + 15.536 us | }
0) | .pick_next_task_fair() {
0) 2.032 us | .__pick_next_entity();
0) 2.064 us | .__clear_buddies();
0) | .set_next_entity() {
0) 2.672 us | .__dequeue_entity();
0) 6.864 us | }
Geoff Lavand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We currently have a few variants of fsl-booke processors (e500v1, e500v2,
e500mc, and e200). They all have minor differences that we had previously
been handling via ifdefs.
To move towards having this support the following changes have been made:
* PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
e500mc or e200. We use MMUCFG[NPIDS] to determine which case we are
since we only touch PID1/2 in extremely early init code.
* Not all IVORs exist on all the processors so introduce cpu_setup
functions for each variant to setup the proper IVORs that are either
unique or exist but have some variations between the processors
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This adds the back-end for the PMU on the POWER6 processor.
Fortunately, the event selection hardware is somewhat simpler on
POWER6 than on other POWER family processors, so the constraints
fit into only 32 bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the PPC970 family.
The PPC970 allows events from the ISU to be selected in two different
ways. Rather than use alternative event codes to express this, we
instead use a single encoding for ISU events and express the
resulting constraint (that you can't select events from all three
of FPU/IFU/VPU, ISU and IDU/STS at the same time, since they all come
in through only 2 multiplexers) using a NAND constraint field, and
work out which multiplexer is used for ISU events at compute_mmcr
time.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides the architecture-specific functions needed to access
PMU hardware on the 64-bit PowerPC processors. It has been designed
for the IBM POWER family (POWER 4/4+/5/5+/6 and PPC970) but will
hopefully also suit other 64-bit PowerPC machines (although probably
not Cell given how different it is in this area). This doesn't
include back-ends for any specific processors.
This implements a system which allows back-ends to express the
constraints that their hardware has on what events can be counted
simultaneously. The constraints are expressed as a 64-bit mask +
64-bit value for each event, and the encoding is capable of
expressing the constraints arising from having a set of multiplexers
feeding an event bus, with some events being available through
multiple multiplexer settings, such as we get on POWER4 and PPC970.
Furthermore, the back-end can supply alternative event codes for
each event, and the constraint checking code will try all possible
combinations of alternative event codes to try to find a combination
that will fit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache
Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot. Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.
This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online. A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.
This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.
I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
powerpc/44x: Support 16K/64K base page sizes on 44x
powerpc: Force memory size to be a multiple of PAGE_SIZE
powerpc/32: Wire up the trampoline code for kdump
powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
powerpc/32: Setup OF properties for kdump
powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
powerpc: Prepare xmon_save_regs for use with kdump
powerpc: Remove default kexec/crash_kernel ops assignments
powerpc: Make default kexec/crash_kernel ops implicit
powerpc: Setup OF properties for ppc32 kexec
powerpc/pseries: Fix cpu hotplug
powerpc: Fix KVM build on ppc440
powerpc/cell: add QPACE as a separate Cell platform
powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
powerpc/mpc5200: fix error paths in PSC UART probe function
powerpc/mpc5200: add rts/cts handling in PSC UART driver
powerpc/mpc5200: Make PSC UART driver update serial errors counters
powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
...
Fix trivial conflict in drivers/char/Makefile as per Paul's directions