Based on patch by David Gibson <dwg@au1.ibm.com>
xmon has a longstanding bug on systems which are SMP-capable but lack
the MSR[RI] bit. In these cases, xmon invoked by IPI on secondary
CPUs will not properly keep quiet, but will print stuff, thereby
garbling the primary xmon's output. This patch fixes it, by ignoring
the RI bit if the processor does not support it.
There's already a version of this for 4xx upstream, which we'll need
to extend to other RI-lacking CPUs at some point. For now this adds
Book3e processors to the mix.
Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.
Fix these broken references, sometimes by dropping the irrelevant text
they were part of.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We don't want to configure PCI Express Max Payload Size or
Max Read Request Size on systems that set that flag. The
firmware will have done it for us, and under hypervisors such
as pHyp we don't even see the parent switches and bridges and
thus can make no assumption on what values are safe to use.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.
This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.
This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.
This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.
Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.
Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.
Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest. We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.
kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.
This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt. Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on. The handler will then return to the virtual-mode code
(potentially in the module).
Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.
The net result is that we no longer need any code addresses to be
stored in vcpu->arch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This makes arch/powerpc/kvm/book3s_rmhandlers.S and
arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
separate compilation units rather than having them #included in
arch/powerpc/kernel/exceptions-64s.S. We no longer have any
conditional branches between the exception prologs in
exceptions-64s.S and the KVM handlers, so there is no need to
keep their contents close together in the vmlinux image.
In their current location, they are using up part of the limited
space between the first-level interrupt handlers and the firmware
NMI data area at offset 0x7000, and with some kernel configurations
this area will overflow (e.g. allyesconfig), leading to an
"attempt to .org backwards" error when compiling exceptions-64s.S.
Moving them out requires that we add some #includes that the
book3s_{,hv_}rmhandlers.S code was previously getting implicitly
via exceptions-64s.S.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
There are multiple features in PowerPC KVM that can now be enabled
depending on the user's wishes. Some of the combinations don't make
sense or don't work though.
So this patch adds a way to check if the executing environment would
actually be able to run the guest properly. It also adds sanity
checks if PVR is set (should always be true given the current code
flow), if PAPR is only used with book3s_64 where it works and that
HV KVM is only used in PAPR mode.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
mode.
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR defines hypercalls as SC1 instructions. Using these, the guest modifies
page tables and does other privileged operations that it wouldn't be allowed
to do in supervisor mode.
This patch adds support for PR KVM to trap these instructions and route them
through the same PAPR hypercall interface that we already use for HV style
KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent Linux versions use the CFAR and PURR SPRs, but don't really care about
their contents (yet). So for now, we can simply return 0 when the guest wants
to read them.
Signed-off-by: Alexander Graf <agraf@suse.de>
When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).
So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
v1 -> v2:
- whitespace fix
Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.
We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a few traps where we cache the instruction that cause the trap
for analysis later on. Since we now need to be able to distinguish
between SC 0 and SC 1 system calls and the only way to find out which
is which is by looking at the instruction, we also read out the instruction
causing the system call.
Signed-off-by: Alexander Graf <agraf@suse.de>
When running a PAPR guest, the guest is not allowed to set SDR1 - instead
the HTAB information is held in internal hypervisor structures. But all of
our current code relies on SDR1 and walking the HTAB like on real hardware.
So in order to not be too intrusive, we simply set SDR1 to the HTAB we hold
in host memory. That way we can keep the HTAB in user space, but use it from
kernel space to map the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have 3 privilege levels: problem state, supervisor state and hypervisor
state. Each of them can access different SPRs, so we need to check on every
SPR if it's accessible in the respective mode.
Signed-off-by: Alexander Graf <agraf@suse.de>
When running a PAPR guest, some things change. The privilege level drops
from hypervisor to supervisor, SDR1 gets treated differently and we interpret
hypercalls. For bisectability sake, add the flag now, but only enable it when
all the support code is there.
Signed-off-by: Alexander Graf <agraf@suse.de>
We need the compute_tlbie_rb in _pr and _hv implementations for papr
soon, so let's move it over to a common header file that both
implementations can leverage.
Signed-off-by: Alexander Graf <agraf@suse.de>
Some devices have a dma-window that starts at the address 0. This allows
DMA addresses to be mapped to this address and returned to drivers as a
valid DMA address. Some drivers may not behave well in this case, since
the address 0 is considered an error or not allocated.
The solution to avoid this kind of error from happening is reserve the
page addressed as 0 so it cannot be allocated for a DMA mapping.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 41151e77a4 ("powerpc: Hugetlb for BookE") added some
#ifdef CONFIG_MM_SLICES conditionals to hugetlb_get_unmapped_area()
and vma_mmu_pagesize(). Unfortunately this is not the correct config
symbol; it should be CONFIG_PPC_MM_SLICES. The result is that
attempting to use hugetlbfs on 64-bit Power server processors results
in an infinite stack recursion between get_unmapped_area() and
hugetlb_get_unmapped_area().
This fixes it by changing the #ifdef to use CONFIG_PPC_MM_SLICES
in those functions and also in book3e_hugetlb_preload().
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Activate all MPC512x related boards. Also enable GPIO-driver, SPI driver
and at25 to test SPI. Enable DEVTMPFS. Bump to 3.1-rc6.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Move the driver to the place where it is expected to be nowadays. Also
rename its CONFIG-name to match the rest and adapt the defconfigs.
Finally, move selection of REQUIRE_GPIOLIB or WANTS_OPTIONAL_GPIOLIB to
the platforms, because this option is per-platform and not per-driver.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Audio support for the MPC5200 exists, so enable it by default.
Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
timer0 and timer1 pins are used as simple GPIO on this board.
Add gpio-controller and #gpio-cells properties to timer nodes
so that we can control gpio lines using available MPC52xx
GPT driver.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Add new nodes to describe more hardware the board is
equipped with:
- two can nodes for SJA1000 on localbus
- pci node to support Coral-PA graphics controller
- serial node for SC28L92 DUART on localbus
- spi node for MSP430 device
Also correct i2c eeprom node name.
Signed-off-by: Heiko Schocher <hs@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
perf events, powerpc: Add POWER7 stalled-cycles-frontend/backend events
Extent the POWER7 PMU driver with definitions for generic front-end and back-end
stall events.
As explained in Ingo's original comment(8f62242246
), the exact definitions of the stall events are very much processor specific as
different things mean different in their respective instruction pipeline. These
two Power7 raw events are the closest approximation to the concept detailed in
Ingo's comment.
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */
It means cycles when the Global Completion Table has no slots from this thread
[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a, /* CMPLU_STALL */
It means no groups completed and GCT not empty for this thread
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The firmware doesn't wait after lifting the PCI reset. However it does
timestamp it in the device tree. We use that to ensure we wait long
enough (3s is our current arbitrary setting) from that timestamp to
actually probing the bus.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This implements support for MSIs on p5ioc2 PHBs. We only support
MSIs on the PCIe PHBs, not the PCI-X ones as the later hasn't been
properly verified in HW.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds support for PCI-X and PCIe on the p5ioc2 IO hub using
OPAL. This includes allocating & setting up TCE tables and config
space access routines.
This also supports fallbacks via RTAS when OPAL is absent, using
legacy TCE format pre-allocated via the device-tree (BML style)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL can handle various interrupt for us such as Machine Checks (it
performs all sorts of recovery tasks and passes back control to us with
informations about the error), Hardware Management Interrupts and Softpatch
interrupts.
This wires up the mechanisms and prints out specific informations returned
by HAL when a machine check occurs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We do the minimum which is to "pass" interrupts to HAL, which
makes the console smoother and will allow us to implement
interrupt based completion and console.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL handles HW access to the various ICS or equivalent chips
for us (with the exception of p5ioc2 based HEA which uses a
different backend) similarily to what RTAS does on pSeries.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implements OPAL RTC and NVRAM support and wire all that up to
the powernv platform.
We use RTAS for RTC as a fallback if available. Using RTAS for nvram
is not supported yet, pending some rework/cleanup and generalization
of the pSeries & CHRP code. We also use RTAS fallbacks for power off
and reboot
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This calls the respective HAL functions, and spin on hal_poll_event()
to ensure the HAL has a chance to communicate with the FSP to trigger
the reboot or shutdown operation
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a udbg and an hvc console backend for supporting a console
using the OPAL console interfaces.
On OPAL v1 we have hvc0 mapped to whatever console the system was
configured for (network or hvsi serial port) via the service
processor.
On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
which generally corresponds to:
hvc0 : network console (raw protocol)
hvc1 : serial port S1 (hvsi)
hvc2 : serial port S2 (hvsi)
Note: At this point, early debug console only works with OPAL v1
and shouldn't be enabled in a normal kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL v2 is instantiated in a way similar to RTAS using Open Firmware
client interface calls, and the resulting address and entry point are
put in the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add definition of OPAL interfaces along with the wrappers to call
into OPAL runtime and the early device-tree parsing hook to locate
the OPAL runtime firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We stash it in boot_command_line which isn't in BSS and so won't
be overwritten. We then use that as a default cmd_line before
we walk the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On machines supporting the OPAL firmware version 1, the system
is initially booted under pHyp. We then use a special hypercall
to verify if OPAL is available and if it is, we then trigger
a "takeover" which disables pHyp and loads the OPAL runtime
firmware, giving control to the kernel in hypervisor mode.
This patch add the necessary code to detect that the OPAL takeover
capability is present when running under PowerVM (aka pHyp) and
perform said takeover to get hypervisor control of the processor.
To perform the takeover, we must first use RTAS (within Open
Firmware runtime environment) to start all processors & threads,
in order to give control to OPAL on all of them. We then call
the takeover hypercall on everybody, OPAL will re-enter the kernel
main entry point passing it a flat device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a skeletton for the new Power "Non Virtualized"
platform which will be used by machines supporting running
without an hypervisor, for example in order to run KVM.
These machines will be using a new firmware called OPAL
for which the support will be provided by later patches.
The PowerNV platform is intended to be also usable under
the BML environment used internally for early CPU bringup
which is why the code also supports using RTAS instead of
OPAL in various places.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With OPAL, r8 and r9 will be used to pass the OPAL base and entry
for debugging purposes (those informations are also in the
device-tree). We don't want to clobber those registers that
early.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This new function is used to properly setup the PCI Express Max Payload Size
(and in some circumstances Max Read Request Size).
Some systems will not operate properly if these aren't set correctly and
the firmware doesn't always do it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds more generic support for doing CPU hotplug with a simple
idle loop and no actual reset of the processors. The generic
smp_generic_kick_cpu() does the hotplug bringup trick if the PACA
shows that the CPU has already been started at boot and we provide
an accessor for the CPU state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It was preventing the global early debug selection whenever KVM was enabled
instead of only preventing the 440 specific one.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The icswx code introduced an A-B B-A deadlock:
CPU0 CPU1
---- ----
lock(&anon_vma->mutex);
lock(&mm->mmap_sem);
lock(&anon_vma->mutex);
lock(&mm->mmap_sem);
Instead of using the mmap_sem to keep mm_users constant, take the
page table spinlock.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we echo an address the hypervisor doesn't like to
/sys/devices/system/memory/probe we oops the box:
# echo 0x10000000000 > /sys/devices/system/memory/probe
kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!
The backtrace is:
create_section_mapping
arch_add_memory
add_memory
memory_probe_store
sysdev_class_store
sysfs_write_file
vfs_write
SyS_write
In create_section_mapping we BUG if htab_bolt_mapping returned
an error. A better approach is to return an error which will
propagate back to userspace.
Rerunning the test with this patch applied:
# echo 0x10000000000 > /sys/devices/system/memory/probe
-bash: echo: write error: Invalid argument
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While converting code to use for_each_node_by_type I noticed a
number of coding style issues.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use for_each_node_by_type instead of open coding it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During memory hotplug testing, I got the following warning:
ERROR: Bad of_node_put() on /memory@0
of_node_release
kref_put
of_node_put
of_find_node_by_type
hot_add_node_scn_to_nid
hot_add_scn_to_nid
memory_add_physaddr_to_nid
...
of_find_node_by_type() loop does the of_node_put for us so we only
need the handle the case where we terminate the loop early.
As suggested by Stephen Rothwell we can do the of_node_put
unconditionally outside of the loop since of_node_put handles a
NULL argument fine.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have two identical definitions of RECLAIM_DISTANCE, looks like
the patch got applied twice. Remove one.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On big POWER7 boxes we see large amounts of CPU time in system
processes like workqueue and watchdog kernel threads.
We currently rebalance the entire machine each time a task goes
idle and this is very expensive on large machines. Disable newidle
balancing at the node level and rely on the scheduler tick to
rebalance across nodes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The largest POWER7 boxes have 32 nodes. SD_NODES_PER_DOMAIN groups
nodes into chunks of 16 and adds a global balancing domain
(SD_ALLNODES) above it.
If we bump SD_NODES_PER_DOMAIN to 32, then we avoid this extra
level of balancing on our largest boxes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When chasing a performance issue on ppc64, I noticed tasks
communicating via a pipe would often end up on different nodes.
It turns out SD_WAKE_AFFINE is not set in our node defition. Commit
9fcd18c9e6 (sched: re-tune balancing) enabled SD_WAKE_AFFINE
in the node definition for x86 and we need a similar change for
ppc64.
I used lmbench lat_ctx and perf bench pipe to verify this fix. Each
benchmark was run 10 times and the average taken.
lmbench lat_ctx:
before: 66565 ops/sec
after: 204700 ops/sec
3.1x faster
perf bench pipe:
before: 5.6570 usecs
after: 1.3470 usecs
4.2x faster
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a new udbg driver for the PS3 gelic Ehthernet device.
This driver shares only a few stucture and constant definitions with the
gelic Ethernet device driver, so is implemented as a stand-alone driver
with no dependencies on the gelic Ethernet device driver.
Signed-off-by: Hector Martin <hector@marcansoft.com>
Signed-off-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since commit 188917e183, /proc/ppc64 is a
symlink to /proc/powerpc/. That means that creating /proc/ppc64/eeh will
end up with a unaccessible file, that is not listed under /proc/powerpc/
and, then, not listed under /proc/ppc64/.
Creating /proc/powerpc/eeh fixes that problem and maintain the
compatibility intended with the ppc64 symlink.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org> [3.x]
This should fix the following warning:
LD arch/powerpc/sysdev/xics/built-in.o
WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
reference from the function .icp_native_init() to the function
.init.text:.icp_native_init_one_node()
The function .icp_native_init() references
the function __init .icp_native_init_one_node().
This is often because .icp_native_init lacks a __init
annotation or the annotation of .icp_native_init_one_node is wrong.
icp_native_init() is only referenced in `arch/powerpc/sysdev/xics/xics-common.c'
by xics_init() which is itself marked with __init.
= not built-tested =
Reported-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During hotplug CPU add we get the following error:
Unexpected Error (0) returned from configure-connector
ibm,configure-connector returns 0 for configuration complete, so
catch this and avoid the error.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
In SMP mode, the kernel would produce call trace when resumed
from hibernation. The reason is when the function destroy_context
is called to drop the resuming mm context, the mm->context.active
is 1 which is wrong and should be zero.
We pass the current->active_mm as previous mm context to function
switch_mmu_context to decrease the context.active by 1.
In UP mode, there is no effect.
Signed-off-by: Tang Yuantian <b29983@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The various port_init_hw methods of ppc4xx_pciex_hwops should have been
marked __init and when I added ppc4xx_pciex_port_reset_sdr(), which is
__init. This added many section mismatch warnings like:
WARNING: arch/powerpc/sysdev/built-in.o(.text+0x5c68): Section mismatch in reference from the function ppc440spe_pciex_init_port_hw() to the function .init.text:ppc4xx_pciex_port_reset_sdr()
The function ppc440spe_pciex_init_port_hw() references
the function __init ppc4xx_pciex_port_reset_sdr().
This is often because ppc440spe_pciex_init_port_hw lacks a __init
annotation or the annotation of ppc4xx_pciex_port_reset_sdr is wrong.
Trivial patch to silence those warnings.
Reported-By: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Yours Tony
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on a patch by Michael Ellerman <michael@ellerman.id.au>
Patch was simply forward ported upstream.
Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on a patch by Benjamin Herrenschmidt <benh@kernel.crashing.org>
Modernized and slightly modified to not record erros into the nvram
log since we do not have that device driver just yet.
Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some config selections were applied to the platform (reference board)
when they actuall apply to the chip.
Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At this point, window has not been stored anywhere, so it has to be freed
before leaving the function.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@exists@
local idexpression x;
statement S,S1;
expression E;
identifier fl;
expression *ptr != NULL;
@@
x = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...kfree(x)...+> }
when any
when != true x == NULL
x->fl
...>
(
if (x == NULL) S1
|
if (...) { ... when != x
when forall
(
return \(0\|<+...x...+>\|ptr\);
|
* return ...;
)
}
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
u64 is used rather than phys_addr_t to keep things simple, as
this is called from assembly code.
Update callers to pass a 64-bit address in r3/r4. Other unused
register assignments that were once parameters to machine_init
are dropped.
For FSL BookE, look up the physical address of the device tree from the
effective address passed in r3 by the loader. This is required for
situations where memory does not start at zero (due to AMP or IOMMU-less
virtualization), and thus the IMA doesn't start at zero, and thus the
device tree effective address does not equal the physical address.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Capture more than twice as much text from the printk buffer, and
compress it to fit it in the lnx,oops-log NVRAM partition. You
can view the compressed text using the new (as of July 20) --unzip
option of the nvram command in the powerpc-utils package.
[BenH: Added select of ZLIB_DEFLATE]
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, the build can (very rarely) fail to build because libfdt.h has
not been created or is in the process of being copied.
Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is one place in the MPIC driver that assumes that the cores are numbered
from 0 to n-1. However, this is not true if the CPUs are not numbered
sequentially. This can happen on a eight-core SOC where cores two and three
are removed in the device tree. So instead of blindly looping, we iterate
over the discovered CPUs and use the SMP ID as the index.
This means that we no longer ask the MPIC how many CPUs there are, so
we also delete mpic->num_cpus.
We also catch if the number of CPUs in the SOC exceeds the number that the
MPIC supports. This should never happen, of course, but it's good to be
sure.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enable hugepages on Freescale BookE processors. This allows the kernel to
use huge TLB entries to map pages, which can greatly reduce the number of
TLB misses and the amount of TLB thrashing experienced by applications with
large memory footprints. Care should be taken when using this on FSL
processors, as the number of large TLB entries supported by the core is low
(16-64) on current processors.
The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
Page sizes larger than the max zone size are called "gigantic" pages and
must be allocated on the command line (and cannot be deallocated).
This is currently only fully implemented for Freescale 32-bit BookE
processors, but there is some infrastructure in the code for
64-bit BooKE.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This iotype is only used by the legacy_serial code in powerpc, so the
code should live there, rather than be compiled in for every 8250
driver.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: linux-serial@vger.kernel.org
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new get_required_mask hook name is longer than many of but not all
of the prior ops. Tidy the struct initializers to align the equal signs
using the local whitespace.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: benh@kernel.crashing.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that the generic code has dma_map_ops set, instead of having a
messy ifdef & if block in the base dma_get_required_mask hook push
the computation into the dma ops.
If the ops fails to set the get_required_mask hook default to the
width of dma_addr_t.
This also corrects ibmbus ibmebus_dma_supported to require a 64
bit mask. I doubt anything is checking or setting the dma mask on
that bus.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: benh@kernel.crashing.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
uic->lock is protecting the interrupt controller hardware. This lock
can not be preempted on -rt.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Reported-by: Darcy L. Watkins <dwatkins@tranzeo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The hook dma_get_required_mask is supposed to return the mask required
by the platform to operate efficently. The generic version of
dma_get_required_mask in driver/base/platform.c returns a mask based
only on max_pfn. However, this is likely too big for iommu systems
and could be too small for platforms that require a dma offset or have
a secondary window at a high offset.
Override the default, provide a hook in ppc_md used by pseries lpar and
cell, and provide the default answer based on memblock_end_of_DRAM(),
with hooks for get_dma_offset, and provide an implementation for iommu
that looks at the defined table size. Coverting from the end address
to the required bit mask is based on the generic implementation.
The need for this was discovered when the qla2xxx driver switched to
64 bit dma then reverted to 32 bit when dma_get_required_mask said
32 bits was sufficient.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: benh@kernel.crashing.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/p1023rds: Fix the error of bank-width of nor flash
powerpc/85xx: enable caam crypto driver by default
powerpc/85xx: enable the audio drivers in the defconfigs
In the p1023rds, a physical bus of nor flash is 16 bits width.
The bank-width is width (in bytes) of the bus width. So, the
value of bank-width of nor flash is not one, and it should be
two.
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
corenet based SoCs have SEC4 h/w, so enable the SEC4 driver,
caam, and the algorithms it supports, and disable the
SEC2/3 driver, talitos.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Enable the audio drivers in the non-corenet 85xx defconfigs so that audio
is enabled on the Freescale P1022DS reference board.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
These were missed in commit f5b9409973 "All Arch: remove linkage
for sys_nfsservctl system call" due to them having no sys_ prefix
(presumably).
Cc: NeilBrown <neilb@suse.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This bug causes the IECSR register clear failure. In this case, the RETE
(retry error threshold exceeded) interrupt will be generated and cannot be
cleared. So the related ISR may be called persistently.
The RETE bit in IECSR is cleared by writing a 1 to it.
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ePAPR embedded hypervisor specification provides an API for "byte
channels", which are serial-like virtual devices for sending and receiving
streams of bytes. This driver provides Linux kernel support for byte
channels via three distinct interfaces:
1) An early-console (udbg) driver. This provides early console output
through a byte channel. The byte channel handle must be specified in a
Kconfig option.
2) A normal console driver. Output is sent to the byte channel designated
for stdout in the device tree. The console driver is for handling kernel
printk calls.
3) A tty driver, which is used to handle user-space input and output. The
byte channel used for the console is designated as the default tty.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In commit 9aa3283595 (ehea/ibm*: Move the
IBM drivers) the IBM_NEW_EMAC* were renames to IBM_EMAC*
The conversion was incomplete so that even if the driver was added to
the .config it wasn't built, but there were no errors). In this commit
we also update the various defconfigs that use EMAC to use the new
Kconfig symbol, and explicitly add the NET_VENDOR_IBM guard.
We do not explicitly select the Kconfig dependencies, as this would force
EMAC on. Doing it in the defconfig allows more flexibility.
Tested on a canyondlands board.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the p1010 processor to select the flexcan network driver.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>,
Acked-by: Wolfgang Grandegger <wg@grandegger.com>,
Cc: U Bhaskar-B22300 <B22300@freescale.com>
Cc: socketcan-core@lists.berlios.de,
Cc: netdev@vger.kernel.org,
Cc: PPC list <linuxppc-dev@lists.ozlabs.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the documentation of the device-tree binding for
the Flexcan devices on Freescale's PowerPC and ARM cores. Extra
properties are not used by the driver so we are removing them.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>,
Acked-by: Wolfgang Grandegger <wg@grandegger.com>,
Cc: U Bhaskar-B22300 <B22300@freescale.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: socketcan-core@lists.berlios.de,
Cc: netdev@vger.kernel.org,
Cc: PPC list <linuxppc-dev@lists.ozlabs.org>
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a register to the config space for the 460sx. Changes the vc0
detect to a pll detect. maps configuration space to test the link
status. changes the setup to enable gen2 devices to operate at gen2
speeds. fixes mapping that was not correct for the 460sx. added
bit definitions for the OMRxMSKL registers. Removed reserved bit
that was set incorrectly in the OMR2MSKL register.
tested on the 460sx eiger and custom board
Signed-off-by: Ayman El-Khashab <ayman@elkhashab.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
This patch adds kexec support for PPC440 based chipsets. This work is based
on the KEXEC patches for FSL BookE.
The FSL BookE patch and the code flow could be found at the link below:
http://patchwork.ozlabs.org/patch/49359/
Steps:
1) Invalidate all the TLB entries except the one this code is run from
2) Create a tmp mapping for our code in the other address space and jump to it
3) Invalidate the entry we used
4) Create a 1:1 mapping for 0-2GiB in blocks of 256M
5) Jump to the new 1:1 mapping and invalidate the tmp mapping
I have tested this patches on Ebony, Sequoia boards and Virtex on QEMU.
You need kexec-tools commit e8b7939b1e or newer for ppc440x support,
available at:
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
Brown paper bag day, previous commit wouldn't work very well with modules
enabled. Move the exports into the ifdef.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit fea80311a9
"iomap: make IOPORT/PCI mapping functions conditional"
Broke powerpc build without CONFIG_PCI as we would still define
pci_iomap(), which overlaps with the new empty inline in the headers.
Make our implementation conditional on CONFIG_PCI
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 112d1fe9f7
"powerpc/4xx: Add check_link to struct ppc4xx_pciex_hwops" inadvertently
broke 405 builds due to some functions being over protected by an
ifdef CONFIG_44x.
Move them back out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The VPA, SLB shadow and DTL degistration functions do not need an
address, so simplify things and remove it.
Also cleanup pseries_kexec_cpu_down a bit by storing the cpu IDs
in local variables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make the VPA, SLB shadow and DTL registration and deregistration
functions print consistent messages on error. I needed the firmware
error code while chasing a kexec bug but we weren't printing it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent versions of firmware will fail to unmap the virtual processor
area if we have a dispatch trace log registered. This causes kexec
to fail.
If a trace log is registered this patch unregisters it before the
SLB shadow and virtual processor areas, fixing the problem.
The address argument is ignored by firmware on unregister so we
may as well remove it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
KVM_GUEST adds a 1 MB array to the kernel (kvm_tmp) which grew
my kernel enough to cause it to fail to boot.
Dynamically allocating or reducing the size of this array is a
good idea, but in the meantime I think it makes sense to make
KVM_GUEST default to n in order to minimise surprises.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On a box with gcc 4.3.2, I see errors like:
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1254: Error: Unrecognized opcode: stxvd2x
arch/powerpc/kvm/book3s_hv_rmhandlers.S:1316: Error: Unrecognized opcode: lxvd2x
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ibm,io-events code is a bit verbose with its error messages.
Reverse the reporting so we only print when we successfully enable
I/O event interrupts.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are seeing boot failures on some very large boxes even with
commit b5416ca9f8 (powerpc: Move kdump default base address to
64MB on 64bit).
This patch halves the RMO so both kernels get about the same
amount of RMO memory. On large machines this region will be
at least 256MB, so each kernel will get 128MB.
We cap it at 256MB (small SLB size) since some early allocations need
to be in the bolted SLB region. We could relax this on machines with
1TB SLBs in a future patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Panic observed on an older kernel when collecting call chains for
the context-switch software event:
[<b0180e00>]rb_erase+0x1b4/0x3e8
[<b00430f4>]__dequeue_entity+0x50/0xe8
[<b0043304>]set_next_entity+0x178/0x1bc
[<b0043440>]pick_next_task_fair+0xb0/0x118
[<b02ada80>]schedule+0x500/0x614
[<b02afaa8>]rwsem_down_failed_common+0xf0/0x264
[<b02afca0>]rwsem_down_read_failed+0x34/0x54
[<b02aed4c>]down_read+0x3c/0x54
[<b0023b58>]do_page_fault+0x114/0x5e8
[<b001e350>]handle_page_fault+0xc/0x80
[<b0022dec>]perf_callchain+0x224/0x31c
[<b009ba70>]perf_prepare_sample+0x240/0x2fc
[<b009d760>]__perf_event_overflow+0x280/0x398
[<b009d914>]perf_swevent_overflow+0x9c/0x10c
[<b009db54>]perf_swevent_ctx_event+0x1d0/0x230
[<b009dc38>]do_perf_sw_event+0x84/0xe4
[<b009dde8>]perf_sw_event_context_switch+0x150/0x1b4
[<b009de90>]perf_event_task_sched_out+0x44/0x2d4
[<b02ad840>]schedule+0x2c0/0x614
[<b0047dc0>]__cond_resched+0x34/0x90
[<b02adcc8>]_cond_resched+0x4c/0x68
[<b00bccf8>]move_page_tables+0xb0/0x418
[<b00d7ee0>]setup_arg_pages+0x184/0x2a0
[<b0110914>]load_elf_binary+0x394/0x1208
[<b00d6e28>]search_binary_handler+0xe0/0x2c4
[<b00d834c>]do_execve+0x1bc/0x268
[<b0015394>]sys_execve+0x84/0xc8
[<b001df10>]ret_from_syscall+0x0/0x3c
A page fault occurred walking the callchain while creating a perf
sample for the context-switch event. To handle the page fault the
mmap_sem is needed, but it is currently held by setup_arg_pages.
(setup_arg_pages calls shift_arg_pages with the mmap_sem held.
shift_arg_pages then calls move_page_tables which has a cond_resched
at the top of its for loop - hitting that cond_resched is what caused
the context switch.)
This is an extension of Anton's proposed patch:
https://lkml.org/lkml/2011/7/24/151
adding case for 32-bit ppc.
Tested on the system that first generated the panic and then again
with latest kernel using a PPC VM. I am not able to test the 64-bit
path - I do not have H/W for it and 64-bit PPC VMs (qemu on Intel)
is horribly slow.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
One definition of PV_POWER7 seems enough to me.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On a box with 8TB of RAM the MMU hashtable is 64GB in size. That
means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed
int to store the index which will overflow at 2G.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I hit an oops at boot on the first instruction of timer_cpu_notify:
NIP [c000000000722f88] .timer_cpu_notify+0x0/0x388
The code should look like:
c000000000722f78: eb e9 00 30 ld r31,48(r9)
c000000000722f7c: 2f bf 00 00 cmpdi cr7,r31,0
c000000000722f80: 40 9e ff 44 bne+ cr7,c000000000722ec4
c000000000722f84: 4b ff ff 74 b c000000000722ef8
c000000000722f88 <.timer_cpu_notify>:
c000000000722f88: 7c 08 02 a6 mflr r0
c000000000722f8c: 2f a4 00 07 cmpdi cr7,r4,7
c000000000722f90: fb c1 ff f0 std r30,-16(r1)
c000000000722f94: fb 61 ff d8 std r27,-40(r1)
But the oops output shows:
eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 7c0803a6 ebe1fff8 4e800020
00000000 ebe90030 c0000000 00ad0a28 00000000 2fa40007 fbc1fff0 fb61ffd8
So we scribbled over our instructions with c000000000ad0a28, which
is an address inside the jump_table ELF section.
It turns out the jump_table section is only aligned to 8 bytes but
we are aligning our entries within the section to 16 bytes. This
means our entries are offset from the table:
c000000000acd4a8 <__start___jump_table>:
...
c000000000ad0a10: c0 00 00 00 lfs f0,0(0)
c000000000ad0a14: 00 70 cd 5c .long 0x70cd5c
c000000000ad0a18: c0 00 00 00 lfs f0,0(0)
c000000000ad0a1c: 00 70 cd 90 .long 0x70cd90
c000000000ad0a20: c0 00 00 00 lfs f0,0(0)
c000000000ad0a24: 00 ac a4 20 .long 0xaca420
And the jump table sort code gets very confused and writes into the
wrong spot. Remove the alignment, and also remove the padding since
we it saves some space and we shouldn't need it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a newline to the panic messages in make_room. Also fix a
comment that suggested our chunk size is 4Mb. It's 1MB.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have a box that fails in OF during boot with:
DEFAULT CATCH!, exception-handler=fff00400
at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002
ie "IBM,Logh". OF got corrupted with a device tree string.
Looking at make_room and alloc_up, we claim the first chunk (1 MB)
but we never claim any more. mem_end is always set to alloc_top
which is the top of our available address space, guaranteeing we will
never call alloc_up and claim more memory.
Also alloc_up wasn't setting alloc_bottom to the bottom of the
available address space.
This doesn't help the box to boot, but we at least fail with
an obvious error. We could relocate the device tree in a future
patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit af9eef3c7b caused cpu_setup to see
the_cpu_spec, rather than the source struct. However, on 32-bit, the
return value of identify_cpu was being used for feature fixups, and
identify_cpu was returning the source struct. So if cpu_setup patches
the feature bits, the update won't affect the fixups.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a cast in case the caller passes in a different type, as it would
if mtspr/mtmsr were functions.
Previously, if a 64-bit type was passed in on 32-bit, GCC would bind the
constraint to a pair of registers, and would substitute the first register
in the pair in the asm code. This corresponds to the upper half of the
64-bit register, which is generally not the desired behavior.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some trivial conflicts due to other various merges
adding to the end of common lists sooner than this one.
arch/ia64/Kconfig
arch/powerpc/Kconfig
arch/x86/Kconfig
lib/Kconfig
lib/Makefile
Signed-off-by: Len Brown <len.brown@intel.com>
cmpxchg() is widely used by lockless code, including NMI-safe lockless
code. But on some architectures, the cmpxchg() implementation is not
NMI-safe, on these architectures the lockless code may need a
spin_trylock_irqsave() based implementation.
This patch adds a Kconfig option: ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
NMI-safe lockless code can depend on it or provide different
implementation according to it.
On many architectures, cmpxchg is only NMI-safe for several specific
operand sizes. So, ARCH_HAVE_NMI_SAFE_CMPXCHG define in this patch
only guarantees cmpxchg is NMI-safe for sizeof(unsigned long).
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Richard Henderson <rth@twiddle.net>
CC: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
CC: Yoshinori Sato <ysato@users.sourceforge.jp>
CC: Tony Luck <tony.luck@intel.com>
CC: Hirokazu Takata <takata@linux-m32r.org>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Michal Simek <monstr@monstr.eu>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
CC: Kyle McMartin <kyle@mcmartin.ca>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Chen Liqin <liqin.chen@sunplusct.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Chris Zankel <chris@zankel.net>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
ARM: Consolidate the clkdev header files
ARM: set vga memory base at run-time
ARM: convert PCI defines to variables
ARM: pci: make pcibios_assign_all_busses use pci_has_flag
ARM: remove unnecessary mach/hardware.h includes
pci: move microblaze and powerpc pci flag functions into asm-generic
powerpc: rename ppc_pci_*_flags to pci_*_flags
Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h
After changing all consumers of atomics to include <linux/atomic.h>, we
ran into some compile time errors due to this dependency chain:
linux/atomic.h
-> asm/atomic.h
-> asm-generic/atomic-long.h
where atomic-long.h could use funcs defined later in linux/atomic.h
without a prototype. This patches moves the code that includes
asm-generic/atomic*.h to linux/atomic.h.
Archs that need <asm-generic/atomic64.h> need to select
CONFIG_GENERIC_ATOMIC64 from now on (some of them used to include it
unconditionally).
Compile tested on i386 and x86_64 with allnoconfig.
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is in preparation for more generic atomic primitives based on
__atomic_add_unless.
Signed-off-by: Arun Sharma <asharma@fb.com>
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The majority of architectures implement ext2 atomic bitops as
test_and_{set,clear}_bit() without spinlock.
This adds this type of generic implementation in ext2-atomic-setbit.h and
use it wherever possible.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ poleg@redhat.com: no need to declare show_regs() in ptrace.h, sched.h does this ]
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits)
drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c
powerpc/85xx: fix mpic configuration in CAMP mode
powerpc: Copy back TIF flags on return from softirq stack
powerpc/64: Make server perfmon only built on ppc64 server devices
powerpc/pseries: Fix hvc_vio.c build due to recent changes
powerpc: Exporting boot_cpuid_phys
powerpc: Add CFAR to oops output
hvc_console: Add kdb support
powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon
powerpc/irq: Quieten irq mapping printks
powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs
powerpc: Add mpt2sas driver to pseries and ppc64 defconfig
powerpc: Disable IRQs off tracer in ppc64 defconfig
powerpc: Sync pseries and ppc64 defconfigs
powerpc/pseries/hvconsole: Fix dropped console output
hvc_console: Improve tty/console put_chars handling
powerpc/kdump: Fix timeout in crash_kexec_wait_realmode
powerpc/mm: Fix output of total_ram.
powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards
powerpc: Correct annotations of pmu registration functions
...
Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and
drivers/cpufreq
* Merge akpm patch series: (122 commits)
drivers/connector/cn_proc.c: remove unused local
Documentation/SubmitChecklist: add RCU debug config options
reiserfs: use hweight_long()
reiserfs: use proper little-endian bitops
pnpacpi: register disabled resources
drivers/rtc/rtc-tegra.c: properly initialize spinlock
drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
drivers/rtc: add support for Qualcomm PMIC8xxx RTC
drivers/rtc/rtc-s3c.c: support clock gating
drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
init: skip calibration delay if previously done
misc/eeprom: add eeprom access driver for digsy_mtc board
misc/eeprom: add driver for microwire 93xx46 EEPROMs
checkpatch.pl: update $logFunctions
checkpatch: make utf-8 test --strict
checkpatch.pl: add ability to ignore various messages
checkpatch: add a "prefer __aligned" check
checkpatch: validate signature styles and To: and Cc: lines
checkpatch: add __rcu as a sparse modifier
checkpatch: suggest using min_t or max_t
...
Did this as a merge because of (trivial) conflicts in
- Documentation/feature-removal-schedule.txt
- arch/xtensa/include/asm/uaccess.h
that were just easier to fix up in the merge than in the patch series.
It is not necessary to share the same notifier.h.
This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.
[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...
Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
This patch removes all the module loader hook implementations in the
architecture specific code where the functionality is the same as that
now provided by the recently added default hooks.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio has been so far used only in the context of virtualization,
and the virtio Kconfig was sourced directly by the relevant arch
Kconfigs when VIRTUALIZATION was selected.
Now that we start using virtio for inter-processor communications,
we need to source the virtio Kconfig outside of the virtualization
scope too.
Moreover, some architectures might use virtio for both virtualization
and inter-processor communications, so directly sourcing virtio
might yield unexpected results due to conflicting selections.
The simple solution offered by this patch is to always source virtio's
Kconfig in drivers/Kconfig, and remove it from the appropriate arch
Kconfigs. Additionally, a virtio menu entry has been added so virtio
drivers don't show up in the general drivers menu.
This way anyone can use virtio, though it's arguably less accessible
(and neat!) for virtualization users now.
Note: some architectures (mips and sh) seem to have a VIRTUALIZATION
menu merely for sourcing virtio's Kconfig, so that menu is removed too.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
isofs: Remove global fs lock
jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
mm/truncate.c: fix build for CONFIG_BLOCK not enabled
fs:update the NOTE of the file_operations structure
Remove dead code in dget_parent()
AFS: Fix silly characters in a comment
switch d_add_ci() to d_splice_alias() in "found negative" case as well
simplify gfs2_lookup()
jfs_lookup(): don't bother with . or ..
get rid of useless dget_parent() in btrfs rename() and link()
get rid of useless dget_parent() in fs/btrfs/ioctl.c
fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
drivers: fix up various ->llseek() implementations
fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
Ext4: handle SEEK_HOLE/SEEK_DATA generically
Btrfs: implement our own ->llseek
fs: add SEEK_HOLE and SEEK_DATA flags
reiserfs: make reiserfs default to barrier=flush
...
Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mips: Fix i8253 clockevent fallout
i8253: Cleanup outb/inb magic
arm: Footbridge: Use common i8253 clockevent
mips: Use common i8253 clockevent
x86: Use common i8253 clockevent
i8253: Create common clockevent implementation
i8253: Export i8253_lock unconditionally
pcpskr: MIPS: Make config dependencies finer grained
pcspkr: Cleanup Kconfig dependencies
i8253: Move remaining content and delete asm/i8253.h
i8253: Consolidate definitions of PIT_LATCH
x86: i8253: Consolidate definitions of global_clock_event
i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
alpha: i8253: Cleanup remaining users of i8253pit.h
i8253: Remove I8253_LOCK config
i8253: Make pcsp sound driver use the shared i8253_lock
i8253: Make pcspkr input driver use the shared i8253_lock
i8253: Consolidate all kernel definitions of i8253_lock
i8253: Unify all kernel declarations of i8253_lock
i8253: Create linux/i8253.h and use it in all 8253 related files
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
perf: Remove the nmi parameter from the oprofile_perf backend
x86, perf: Make copy_from_user_nmi() a library function
perf: Remove perf_event_attr::type check
x86, perf: P4 PMU - Fix typos in comments and style cleanup
perf tools: Make test use the preset debugfs path
perf tools: Add automated tests for events parsing
perf tools: De-opt the parse_events function
perf script: Fix display of IP address for non-callchain path
perf tools: Fix endian conversion reading event attr from file header
perf tools: Add missing 'node' alias to the hw_cache[] array
perf probe: Support adding probes on offline kernel modules
perf probe: Add probed module in front of function
perf probe: Introduce debuginfo to encapsulate dwarf information
perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
perf probe: Remove redundant dwarf functions
perf probe: Move strtailcmp to string.c
perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
tracing/kprobe: Update symbol reference when loading module
tracing/kprobes: Support module init function probing
kprobes: Return -ENOENT if probe point doesn't exist
...
* 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
pci/of: Consolidate pci_bus_to_OF_node()
pci/of: Consolidate pci_device_to_OF_node()
x86/devicetree: Use generic PCI <-> OF matching
microblaze/pci: Move the remains of pci_32.c to pci-common.c
microblaze/pci: Remove powermac originated cruft
pci/of: Match PCI devices to OF nodes dynamically
* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6: (61 commits)
gpio/mxc/mxs: fix build error introduced by the irq_gc_ack() renaming
mcp23s08: add i2c support
mcp23s08: isolate spi specific parts
mcp23s08: get rid of setup/teardown callbacks
gpio/tegra: dt: add binding for gpio polarity
mcp23s08: remove unused work queue
gpio/da9052: remove a redundant assignment for gpio->da9052
gpio/mxc: add device tree probe support
ARM: mxc: use ARCH_NR_GPIOS to define gpio number
gpio/mxc: get rid of the uses of cpu_is_mx()
gpio/mxc: add missing initialization of basic_mmio_gpio shadow variables
gpio: Move mpc5200 gpio driver to drivers/gpio
GPIO: DA9052 GPIO module v3
gpio/tegra: Use engineering names in DT compatible property
of/gpio: Add new method for getting gpios under different property names
gpio/dt: Refine GPIO device tree binding
gpio/ml-ioh: fix off-by-one for displaying variable i in dev_err
gpio/pca953x: Deprecate meaningless device-tree bindings
gpio/pca953x: Remove dynamic platform data pointer
gpio/pca953x: Fix IRQ support.
...
Change the string to check for CAMP mode boot on MPC85xx (eg. P2020) to match
the one in the corresponding dts files (p2020rdb_camp_core{0,1}.dts).
Without this fix the mpic is configured as in the SMP boot mode, which causes
the first core to report a protected source interrupt error for devices
of the other core and lock up.
Also add MPIC_SINGLE_DEST_CPU on both P2020 based architectures in CAMP
mode as suggested by Scott Wood. Thanks.
Cc: Scott Wood <scottwood@freescale.com>
Cc: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We already did it for hard IRQs but it looks like we forgot
to do it for softirqs. Without this, we would lose flags
such as TIF_NEED_RESCHED set using current_thread_info()
by something running of a softirq.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The 64-bit Book-E parts (to date) dont utilize the 'server' class
perfmon. So building or depending on it makes no sense (and does break
FSL Book-E 64-bit support). Move the selection of PPC_HAVE_PMU_SUPPORT
to be based on PPC_BOOK3S_64.
Based on a patch from Scott Wood.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
An implementation of a code generator for BPF programs to speed up packet
filtering on PPC64, inspired by Eric Dumazet's x86-64 version.
Filter code is generated as an ABI-compliant function in module_alloc()'d mem
with stackframe & prologue/epilogue generated if required (simple filters don't
need anything more than an li/blr). The filter's local variables, M[], live in
registers. Supports all BPF opcodes, although "complicated" loads from negative
packet offsets (e.g. SKF_LL_OFF) are not yet supported.
There are a couple of further optimisations left for future work; many-pass
assembly with branch-reach reduction and a register allocator to push M[]
variables into volatile registers would improve the code quality further.
This currently supports big-endian 64-bit PowerPC only (but is fairly simple
to port to PPC32 or LE!).
Enabled in the same way as x86-64:
echo 1 > /proc/sys/net/core/bpf_jit_enable
Or, enabled with extra debug output:
echo 2 > /proc/sys/net/core/bpf_jit_enable
Signed-off-by: Matt Evans <matt@ozlabs.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All these are instances of
#define NAME value;
or
#define NAME(params_opt) value;
These of course fail to build when used in contexts like
if(foo $OP NAME)
while(bar $OP NAME)
and may silently generate the wrong code in contexts such as
foo = NAME + 1; /* foo = value; + 1; */
bar = NAME - 1; /* bar = value; - 1; */
baz = NAME & quux; /* baz = value; & quux; */
Reported on comp.lang.c,
Message-ID: <ab0d55fe-25e5-482b-811e-c475aa6065c3@c29g2000yqd.googlegroups.com>
Initial analysis of the dangers provided by Keith Thompson in that thread.
There are many more instances of more complicated macros having unnecessary
trailing semicolons, but this pile seems to be all of the cases of simple
values suffering from the problem. (Thus things that are likely to be found
in one of the contexts above, more complicated ones aren't.)
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kernel loadable module can use hard_smp_processor_id() if building with SMP
kernel. In order to make it work for UP kernels too, boot_cpuid_phys
symbol (which is what hard_smp_processor_id() macro resolves to
in non-SMP configuration) must be exported.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now we have the CFAR saved add it to the oops output.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
HFI creates interrupts each time a window is setup. This results in
a lot of messages in the kernel log buffer:
irq: irq 199007 on host null mapped to virtual irq 351
This box has over 3500 of them, causing more important kernel
messages to be overwritten. We can get at this information via
debugfs now so we may as well turn it into a pr_debug.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As a result of changes to Kconfig files, we no longer enable
the lockup and hung task detectors. Both are very light weight
and provide useful information in the event of a hang, so
reenable them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add mpt2sas driver to pseries and ppc64 defconfig.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The IRQs off tracer enables mcount which has a big impact on
performance. Disable it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The pseries defconfig had a number of drivers enabled and we may
as well add them to the ppc64 defconfig.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Return -EAGAIN when we get H_BUSY back from the hypervisor. This
makes the hvc console driver retry, avoiding dropped printks.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The existing code it pretty ugly. How about we clean it up even more
like this?
From: Anton Blanchard <anton@samba.org>
We check for timeout expiry in the outer loop, but we also need to
check it in the inner loop or we can lock up forever waiting for a
CPU to hit real mode.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 32bit platforms that support >= 4GB memory total_ram was truncated.
This creates a confusing printk:
Top of RAM: 0x100000000, Total RAM: 0x0
Fix that:
Top of RAM: 0x100000000, Total RAM: 0x100000000
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes the following warning:
WARNING: arch/powerpc/kernel/built-in.o(.text+0x29768): Section mismatch in reference from the function .register_power_pmu() to the function .cpuinit.text:.power_pmu_notifier()
The function .register_power_pmu() references
the function __cpuinit .power_pmu_notifier().
This is often because .register_power_pmu lacks a __cpuinit
annotation or the annotation of .power_pmu_notifier is wrong.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Callback based iteration is cumbersome and much less useful than
for_each_*() iterator. This patch implements for_each_mem_pfn_range()
which replaces work_with_active_regions(). All the current users of
work_with_active_regions() are converted.
This simplifies walking over early_node_map and will allow converting
internal logics in page_alloc to use iterator instead of walking
early_node_map directly, which in turn will enable moving node
information to memblock.
powerpc change is only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Instead of using ill-defined numbers (0, 1, and 2) for the monitor port, allow
the user to specify the port by name ("dvi", "lvds", or "dlvds"). This works
on the kernel command line, the module command-line, and the sysfs "monitor"
device.
Note that changing the monitor port does not currently work on the P1022DS,
because the code that talks to the PIXIS FPGA is broken.
Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Move the "Checking link..." printk to the function that actually checks the
linke.
Reported-by: Ayman El-Khashab <ayman@elkhashab.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Move separate microblaze and powerpc pci flag functions pci_set_flags,
pci_add_flags, and pci_has_flag into asm-generic/pci-bridge.h so other
archs can use them.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Michal Simek <monstr@monstr.eu>
The 44x code (which is shared by 47x) assumes the available physical memory
begins at 0x00000000. This is not necessarily the case in an AMP
environment.
Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be
loaded into a higher memory range.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This renames pci flags functions and enums in preparation for creating
generic version in asm-generic/pci-bridge.h. The following search and
replace is done:
s/ppc_pci_/pci_/
s/PPC_PCI_/PCI_/
Direct accesses to ppc_pci_flag variable are replaced with helper
functions.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Taishan (440GX) has the first PHY (EMAC2) mapped at PHY address 1
and the 2nd PHY (EMAC3) at PHY address 3. Use "phy-address" to
correctly describe this instead of "phy-map".
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
For AMP, different kernel instances load into separate memory regions.
Read the start of memory from the device tree and limit the memory to what's
specified in the device tree.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Since other OS's may be running on the other cores don't use tlbivax
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
All current pcie controllers unconditionally use SDR to check the link and
poll for reset. Refactor the code to include device reset in the
port_init_hw() op and add a new check_link() op.
This will make room fro new controllers that do not use SDR for these
operations.
Tested on 460ex.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Commit c8f729d408 (KVM: PPC: Deliver program interrupts right away instead
of queueing them) made away with all users of prog_flags, so we can just
remove it from the headers.
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds support for running KVM guests in supervisor mode on those
PPC970 processors that have a usable hypervisor mode. Unfortunately,
Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
1), but the YDL PowerStation does have a usable hypervisor mode.
There are several differences between the PPC970 and POWER7 in how
guests are managed. These differences are accommodated using the
CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
bits. Notably, on PPC970:
* The LPCR, LPID or RMOR registers don't exist, and the functions of
those registers are provided by bits in HID4 and one bit in HID0.
* External interrupts can be directed to the hypervisor, but unlike
POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
SRR0/1 not HSRR0/1.
* There is no virtual RMA (VRMA) mode; the guest must use an RMO
(real mode offset) area.
* The TLB entries are not tagged with the LPID, so it is necessary to
flush the whole TLB on partition switch. Furthermore, when switching
partitions we have to ensure that no other CPU is executing the tlbie
or tlbsync instructions in either the old or the new partition,
otherwise undefined behaviour can occur.
* The PMU has 8 counters (PMC registers) rather than 6.
* The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.
* The SLB has 64 entries rather than 32.
* There is no mediated external interrupt facility, so if we switch to
a guest that has a virtual external interrupt pending but the guest
has MSR[EE] = 0, we have to arrange to have an interrupt pending for
it so that we can get control back once it re-enables interrupts. We
do that by sending ourselves an IPI with smp_send_reschedule after
hard-disabling interrupts.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
indicate that we have a usable hypervisor mode, and another to indicate
that the processor conforms to PowerISA version 2.06. We also add
another bit to indicate that the processor conforms to ISA version 2.01
and set that for PPC970 and derivatives.
Some PPC970 chips (specifically those in Apple machines) have a
hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
is not useful in the sense that there is no way to run any code in
supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1
bits in HID4 are always 0, and we use that as a way of detecting that
hypervisor mode is not useful.
Where we have a feature section in assembly code around code that
only applies on POWER7 in hypervisor mode, we use a construct like
END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
The definition of END_FTR_SECTION_IFSET is such that the code will
be enabled (not overwritten with nops) only if all bits in the
provided mask are set.
Note that the CPU feature check in __tlbie() only needs to check the
ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
if we are running bare-metal, i.e. in hypervisor mode.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.
Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.
KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.
This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.
Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.
This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.
Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
the none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This improves I/O performance for guests using the PAPR
paravirtualization interface by making the H_PUT_TCE hcall faster, by
implementing it in real mode. H_PUT_TCE is used for updating virtual
IOMMU tables, and is used both for virtual I/O and for real I/O in the
PAPR interface.
Since this moves the IOMMU tables into the kernel, we define a new
KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The
ioctl returns a file descriptor which can be used to mmap the newly
created table. The qemu driver models use them in the same way as
userspace managed tables, but they can be updated directly by the
guest with a real-mode H_PUT_TCE implementation, reducing the number
of host/guest context switches during guest IO.
There are certain circumstances where it is useful for userland qemu
to write to the TCE table even if the kernel H_PUT_TCE path is used
most of the time. Specifically, allowing this will avoid awkwardness
when we need to reset the table. More importantly, we will in the
future need to write the table in order to restore its state after a
checkpoint resume or migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds the infrastructure for handling PAPR hcalls in the kernel,
either early in the guest exit path while we are still in real mode,
or later once the MMU has been turned back on and we are in the full
kernel context. The advantage of handling hcalls in real mode if
possible is that we avoid two partition switches -- and this will
become more important when we support SMT4 guests, since a partition
switch means we have to pull all of the threads in the core out of
the guest. The disadvantage is that we can only access the kernel
linear mapping, not anything vmalloced or ioremapped, since the MMU
is off.
This also adds code to handle the following hcalls in real mode:
H_ENTER Add an HPTE to the hashed page table
H_REMOVE Remove an HPTE from the hashed page table
H_READ Read HPTEs from the hashed page table
H_PROTECT Change the protection bits in an HPTE
H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table
H_SET_DABR Set the data address breakpoint register
Plus code to handle the following hcalls in the kernel:
H_CEDE Idle the vcpu until an interrupt or H_PROD hcall arrives
H_PROD Wake up a ceded vcpu
H_REGISTER_VPA Register a virtual processor area (VPA)
The code that runs in real mode has to be in the base kernel, not in
the module, if KVM is compiled as a module. The real-mode code can
only access the kernel linear mapping, not vmalloc or ioremap space.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
There are several fields in struct kvmppc_book3s_shadow_vcpu that
temporarily store bits of host state while a guest is running,
rather than anything relating to the particular guest or vcpu.
This splits them out into a new kvmppc_host_state structure and
modifies the definitions in asm-offsets.c to suit.
On 32-bit, we have a kvmppc_host_state structure inside the
kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
to get to them both with one pointer. On 64-bit they are separate
fields in the PACA. This means that on 64-bit we don't need to
copy the kvmppc_host_state in and out on vcpu load/unload, and
in future will mean that the book3s_hv code doesn't need a
shadow_vcpu struct in the PACA at all. That does mean that we
have to be careful not to rely on any values persisting in the
hstate field of the paca across any point where we could block
or get preempted.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
In hypervisor mode, the LPCR controls several aspects of guest
partitions, including virtual partition memory mode, and also controls
whether the hypervisor decrementer interrupts are enabled. This sets
up LPCR at boot time so that guest partitions will use a virtual real
memory area (VRMA) composed of 16MB large pages, and hypervisor
decrementer interrupts are disabled.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable()
calls in powerpc.c, this moves them down into the subarch-specific
book3s_pr.c and booke.c. This eliminates an extra local_irq_enable()
call in book3s_pr.c, and will be needed for when we do SMT4 guest
support in the book3s hypervisor mode code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This arranges for the top-level arch/powerpc/kvm/powerpc.c file to
pass down some of the calls it gets to the lower-level subarchitecture
specific code. The lower-level implementations (in booke.c and book3s.c)
are no-ops. The coming book3s_hv.c will need this.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Doing so means that we don't have to save the flags anywhere and gets
rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.
Doing so is OK because a program interrupt won't be generated at the
same time as any other synchronous interrupt. If a program interrupt
and an asynchronous interrupt (external or decrementer) are generated
at the same time, the program interrupt will be delivered, which is
correct because it has a higher priority, and then the asynchronous
interrupt will be masked.
We don't ever generate system reset or machine check interrupts to the
guest, but if we did, then we would need to make sure they got delivered
rather than the program interrupt. The current code would be wrong in
this situation anyway since it would deliver the program interrupt as
well as the reset/machine check interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Instead of branching out-of-line with the DO_KVM macro to check if we
are in a KVM guest at the time of an interrupt, this moves the KVM
check inline in the first-level interrupt handlers. This speeds up
the non-KVM case and makes sure that none of the interrupt handlers
are missing the check.
Because the first-level interrupt handlers are now larger, some things
had to be move out of line in exceptions-64s.S.
This all necessitated some minor changes to the interrupt entry code
in KVM. This also streamlines the book3s_32 KVM test.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
In preparation for adding code to enable KVM to use hypervisor mode
on 64-bit Book 3S processors, this splits book3s.c into two files,
book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
specific to running the guest in problem state (user mode) and book3s.c
contains code which should apply to all Book 3S processors.
In doing this, we abstract some details, namely the interrupt offset,
updating the interrupt pending flag, and detecting if the guest is
in a critical section. These are all things that will be different
when we use hypervisor mode.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This moves the slb field, which represents the state of the emulated
SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the
hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s.
This is in accord with the principle that the kvm_vcpu_arch struct
represents the state of the emulated CPU, and the kvmppc_vcpu_book3s
struct holds the auxiliary data structures used in the emulation.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
function exports") resulted in vcpu->arch.trampoline_lowmem and
vcpu->arch.trampoline_enter ending up with kernel virtual addresses
rather than physical addresses. This is OK on 64-bit Book3S machines,
which ignore the top 4 bits of the effective address in real mode,
but on 32-bit Book3S machines, accessing these addresses in real mode
causes machine check interrupts, as the hardware uses the whole
effective address as the physical address in real mode.
This fixes the problem by using __pa() to convert these addresses
to physical addresses.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Only look in the 4 entries that could possibly contain the
entry we're looking for.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS]. Use
both PID0 and PID1 so that the shadow PIDs for the right mode can be
selected, that correspond both to guest TID = zero and guest TID = guest
PID.
This allows us to significantly reduce the frequency of needing to
invalidate the entire TLB. When the guest mode or PID changes, we just
update the host PID0/PID1. And since the allocation of shadow PIDs is
global, multiple guests can share the TLB without conflict.
Note that KVM does not yet support the guest setting PID1 or PID2 to
a value other than zero. This will need to be fixed for nested KVM
to work. Until then, we enforce the requirement for guest PID1/PID2
to stay zero by failing the emulation if the guest tries to set them
to something else.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Instead of a fully separate set of TLB entries, keep just the
pfn and dirty status.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is a shared page used for paravirtualization. It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.
The physical address specified by the hypercall is not used, as
e500 does not have real mode.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows large pages to be used on guest mappings backed by things like
/dev/mem, resulting in a significant speedup when guest memory
is mapped this way (it's useful for directly-assigned MMIO, too).
This is not a substitute for hugetlbfs integration, but is useful for
configurations where devices are directly assigned on chips without an
IOMMU -- in these cases, we need guest physical and true physical to
match, and be contiguous, so static reservation and mapping via /dev/mem
is the most straightforward way to set things up.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is in line with what other architectures do, and will allow us to
map things other than ordinary, unreserved kernel pages -- such as
dedicated devices, or large contiguous reserved regions.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This avoids races. It also means that we use the shadow TLB way,
rather than the hardware hint -- if this is a problem, we could do
a tlbsx before inserting a TLB0 entry.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since TLB1 loading doesn't check the shadow TLB before allocating another
entry, you can get duplicates.
Once shadow PIDs are enabled in a later patch, we won't need to
invalidate the TLB on every switch, so this optimization won't be
needed anyway.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is done lazily. The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit. Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.
For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.
Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
kvmppc_set_msr().
While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Previously, these macros hardcoded THREAD_EVR0 as the base of the save
area, relative to the base register passed. This base offset is now
passed as a separate macro parameter, allowing reuse with other SPE
save areas, such as used by KVM.
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().
This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Up until now, Book3S KVM had variables stored in the kernel that a kernel module
or the kvm code in the kernel could read from to figure out where some real mode
helper functions are located.
This is all unnecessary. The high bits of the EA get ignore in real mode, so we
can just use the pointer as is. Also, it's a lot easier on relocations when we
use the normal way of resolving the address to a function, instead of jumping
through hoops.
This patch fixes compilation with CONFIG_RELOCATABLE=y.
Signed-off-by: Alexander Graf <agraf@suse.de>
When http://www.spinics.net/lists/kvm-ppc/msg02664.html
was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2,
the removal of the conversion in add_exit_timing was left out.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Just compiling pseries in the kernel causes it to override
memory_block_size_bytes() regardless of what is the runtime
platform.
This cleans up the implementation of that function, fixing
a bug or two while at it, so that it's harmless (and potentially
useful) for other platforms. Without this, bugs in that code
would trigger a WARN_ON() in drivers/base/memory.c when
booting some different platforms.
If/when we have another platform supporting memory hotplug we
might want to either move that out to a generic place or
make it a ppc_md. callback.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The only reason to require a dma_ops struct is to see if it has
implemented set_dma_mask. If not we can fall back to setting the mask
directly.
This resolves an issue with how to sequence the setting of a DMA mask
for platform devices. Before we had an issue in that we have no way of
setting the DMA mask before the various low level bus notifiers get
called that might check it (swiotlb).
So now we can do:
pdev = platform_device_alloc("foobar", 0);
dma_set_mask(&pdev->dev, DMA_BIT_MASK(37));
platform_device_add(pdev);
And expect the right thing to happen with the bus notifiers get called
via platform_device_add.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We have a long standing issues with platform devices not have a valid
dma_mask pointer. This hasn't been an issue to date as no platform
device has tried to set its dma_mask value to a non-default value.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This is used to round-robin TLBCAM entries.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On P1022DS both ethernet controllers are connected to RGMII PHYs
accessible via MDIO bus. Remove fixed-link property from ethernet
nodes as they only required when fixed link PHYs without MDIO bus
are used.
Signed-off-by: Felix Radensky <felix@embedded-sol.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Under the FSL Hypervisor we triggered a BUG_ON in mpc85xx_smp_init that
expected smp_ops.message_pass to be explicity set. However recent
changes allows smp_ops.message_pass to be NULL and handled by default
code. Thus the BUG_ON isn't relevant anymore.
Signed-off-by: Laurentiu TUDOR <Laurentiu.Tudor@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
P2040RDB Specification:
-----------------------
2Gbyte unbuffered DDR3 SDRAM SO-DIMM(64bit bus)
128 Mbyte NOR flash single-chip memory
256 Kbit M24256 I2C EEPROM
16 Mbyte SPI memory
SD connector to interface with the SD memory card
dTSEC1: connected to the Vitesse SGMII PHY (VSC8221)
dTSEC2: connected to the Vitesse SGMII PHY (VSC8221)
dTSEC3: connected to the Vitesse SGMII PHY (VSC8221)
dTSEC4: connected to the Vitesse RGMII PHY (VSC8641)
dTSEC5: connected to the Vitesse RGMII PHY (VSC8641)
I2C1: Real time clock, Temperature sensor
I2C2: Vcore Regulator, 256Kbit I2C Bus EEPROM
SATA: Lanes C and Land D of Bank2 are connected to two SATA connectors
UART: supports two UARTs up to 115200 bps for console
USB 2.0: connected via a internal UTMI PHY to two TYPE-A interfaces
PCIe:
- Lanes E, F, G and H of Bank1 are connected to one x4 PCIe SLOT1
- Lanes C and Land D of Bank2 are connected to one x4 PCIe SLOT2
Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
CONFIG_PPC_EPAPR_HV_BYTECHAN adds support for the Freescale hypervisor
byte channel tty driver.
CONFIG_VIRT_DRIVERS and CONFIG_FSL_HV_MANAGER add support for the Freescale
hypervisor management driver.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Split out common (non-board specific) parts of the SoC related device
tree into a stub so multiple board dts files can include it and we can
reduce duplication and maintenance effort.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Split out common (non-board specific) parts of the SoC related device
tree into a stub so multiple board dts files can include it and we can
reduce duplication and maintenance effort.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
GPIO drivers are getting consolidated into drivers/gpio. While at it,
change the driver name to mpc5200-gpio* to avoid collisions.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The patch a8b0ca17b8 ("perf: Remove the nmi parameter from the swevent
and overflow interface") missed a spot in the ppc hw_breakpoint code,
fix this up so things compile again.
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-09pfip95g88s70iwkxu6nnbt@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The perf_event overflow handler does not receive any caller-derived
argument, so many callers need to resort to looking up the perf_event
in their local data structure. This is ugly and doesn't scale if a
single callback services many perf_events.
Fix by adding a context parameter to perf_event_create_kernel_counter()
(and derived hardware breakpoints APIs) and storing it in the perf_event.
The field can be accessed from the callback as event->overflow_handler_context.
All callers are updated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..
The below needs filling out for !x86 (which I filled out with
unsupported events).
I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.
Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit e360adbe29 ("irq_work: Add generic hardirq context
callbacks") fouled up the ppc bit, not properly naming the
arch specific function that raises the 'self-IPI'.
Cc: Huang Ying <ying.huang@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: stable@kernel.org # 37+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
gcc 4.7 will be more strict about parsing the -mtraceback option:
gcc: error: unrecognized argument in option '-mtraceback=none'
gcc: note: valid arguments to '-mtraceback=' are: full no part
gcc used to do a 2 char compare so both "no" and "none" would
match. Switch to using -mtraceback=no should work everywhere.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for the new "jump label" feature.
Unlike x86 and sparc we just merrily patch the code with no locks etc,
as far as I know this is safe, but I'm not really sure what the x86/sparc
code is protecting against so maybe it's not.
I also don't see any reason for us to implement the poke_early() routine,
even though sparc does.
[BenH: Updated the patch to upstream generic changes]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A mix of think & mismerge on my side caused a problem where both the
new hvsi_lib and the old hvsi driver gets compiled and try to define
symbols with the same name.
This fixes it by renaming the hvsi_lib exported symbols.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
a9c0f41b3a breaks the build
on some platforms. The extern declaration must be shielded
against assembly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a printk companion to replace the udbg progress function
when initmem is freed.
Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The free_initmem function is basically duplicated in mm/init_32,
and init_64, and is moved to the common 32/64-bit mm/mem.c.
All other sections except init were removed in v2.6.15 by
6c45ab992e (powerpc: Remove section
free() and linker script bits), and therefore the bulk of the executed
code is identical.
This patch also removes updating ppc_md.progress to NULL in the powermac
late_initcall.
Suggested-by: Milton Miller <miltonm@bga.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
arch/powerpc: use printk_ratelimited instead of printk_ratelimit
powerpc/rtas-rtc: remove sideeffects of printk_ratelimit
powerpc/pseries: remove duplicate SCSI_BNX2_ISCSI in pseries_defconfig
powerpc/e500: fix breakage with fsl_rio_mcheck_exception
powerpc/p1022ds: fix audio-related properties in the device tree
powerpc/85xx: fix NAND_CMD_READID read bytes number
On pseries machines, consoles are provided by the hypervisor using
a low level get_chars/put_chars type interface. However, this is
really just a transport to the service processor which implements
them either as "raw" console (networked consoles, HMC, ...) or as
"hvsi" serial ports.
The later is a simple packet protocol on top of the raw character
interface that is supposed to convey additional "serial port" style
semantics. In practice however, all it does is provide a way to
read the CD line and set/clear our DTR line, that's it.
We currently implement the "raw" protocol as an hvc console backend
(/dev/hvcN) and the "hvsi" protocol using a separate tty driver
(/dev/hvsi0).
However this is quite impractical. The arbitrary difference between
the two type of devices has been a major source of user (and distro)
confusion. Additionally, there's an additional mini -hvsi implementation
in the pseries platform code for our low level debug console and early
boot kernel messages, which means code duplication, though that low
level variant is impractical as it's incapable of doing the initial
protocol negociation to establish the link to the FSP.
This essentially replaces the dedicated hvsi driver and the platform
udbg code completely by extending the existing hvc_vio backend used
in "raw" mode so that:
- It now supports HVSI as well
- We add support for hvc backend providing tiocm{get,set}
- It also provides a udbg interface for early debug and boot console
This is overall less code, though this will only be obvious once we
remove the old "hvsi" driver, which is still available for now. When
the old driver is enabled, the new code still kicks in for the low
level udbg console, replacing the old mini implementation in the platform
code, it just doesn't provide the higher level "hvc" interface.
In addition to producing generally simler code, this has several benefits
over our current situation:
- The user/distro only has to deal with /dev/hvcN for the hypervisor
console, avoiding all sort of confusion that has plagued us in the past
- The tty, kernel and low level debug console all use the same code
base which supports the full protocol establishment process, thus the
console is now available much earlier than it used to be with the
old HVSI driver. The kernel console works much earlier and udbg is
available much earlier too. Hackers can enable a hard coded very-early
debug console as well that works with HVSI (previously that was only
supported for the "raw" mode).
I've tried to keep the same semantics as hvsi relative to how I react
to things like CD changes, with some subtle differences though:
- I clear DTR on close if HUPCL is set
- Current hvsi triggers a hangup if it detects a up->down transition
on CD (you can still open a console with CD down). My new implementation
triggers a hangup if the link to the FSP is severed, and severs it upon
detecting a up->down transition on CD.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When CONFIG_PPC_EARLY_DEBUG is set, call register_early_udbg_console()
early from generic code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Embed the struct hvsi_header in the various packet definitions
rather than open coding it multiple times. Will help provide
stronger type checking.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves various HVSI protocol definitions from the hvsi.c
driver to a header file that can be used later on by a udbg
implementation
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently Maple setup code creates cpc925_edac device only on
Motorola ATCA-6101 blade. Make setup code check bridge revision
and enable EDAC on all U3H bridges.
Verified on Momentum MapleD (ppc970fx kit) board.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reconfiguration notifier call for device node may fail by several reasons,
but it always assumes kmalloc failures.
This enables reconfiguration notifier call chain to get the actual error
code rather than -ENOMEM by converting all reconfiguration notifier calls
to return encapsulate error code with notifier_from_errno().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This introduces pSeries_reconfig_notify() as a just wrapper of
blocking_notifier_call_chain() for pSeries_reconfig_chain.
This is a preparation to improvement of error code on reconfiguration
notifier failure.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enable functions used to access SCOM if PPC_MAPLE is defined: they are
used by cpufreq driver to control hardware.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This has been broken for a while but hasn't been an issue until
now because nobody was reserving regions at high addresses.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On MMUs such as FSL where we can guarantee the entire linear mapping is
bolted, we don't need to worry about linear TLB misses. If on top of
that we do a full table walk, we get rid of all recursive TLB faults, and
can dispense with some state saving. This gains a few percent on
TLB-miss-heavy workloads, and around 50% on a benchmark that had a high
rate of virtual page table faults under the normal handler.
While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and
EX_TLB_SRR1 as they're not used.
[BenH: Fixed build with 64K pages (wsp config)]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited.
Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Don't use printk_ratelimit() as an additional condition for returning
on an error. Because when the ratelimit is reached, printk_ratelimit
will return 0 and e.g. in rtas_get_boot_time won't check for an error
condition.
Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove duplicate assignment of SCSI_BNX2_ISCSI in pseries_defconfig
introduced by:
37e0c21e powerpc/pseries: Enable iSCSI support for a number of cards
causes warning:
arch/powerpc/configs/pseries_defconfig:151:warning: override: reassigning to symbol SCSI_BNX2_ISCSI
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This will allow the new HW RNG driver to bind on these boards
Signed-off-by: Mike Williams <mike@mikebwilliams.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The Sequoia board has a Security function IP block on it that contains a TRNG.
Add the crypto and rng portions of that IP block to the DTS.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
commit 21a3c96 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.
Then, we see
mm/page_cgroup.c: In function 'page_cgroup_init':
mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'
So, fixiing page_cgroup.c is an idea...
But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)
This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.
A result of macro expansion is here (mm/page_cgroup.c)
for !NUMA
start_pfn = ((&contig_page_data)->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
for NUMA (x86-64)
start_pfn = ((node_data[nid])->node_start_pfn);
end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
Changelog:
- fixed to avoid using "nid" twice in node_end_pfn() macro.
Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Freescale hypervisor does not allow guests to write to the timebase
registers (virtualizing the timebase register was deemed too complicated),
so don't try to synchronize the timebase registers when we're running
under the hypervisor.
This typically happens when kexec support is enabled.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
P1010RDB Overview
-----------------
1Gbyte DDR3 (on board DDR)
32Mbyte 16bit NOR flash
32Mbyte SLC NAND Flash
256 Kbit M24256 I2C EEPROM
128 Mbit SPI Flash memory
I2C Board 128x8 bit memory
SD/MMC connector to interface with the SD memory card
2 SATA interface
1 internal SATA connect to 2.5. 160G SATA2 HDD
1 eSATA connector to rear panel
USB 2.0
x1 USB 2.0 port: connected via a UTMI PHY to Mini-AB interface.
x1 USB 2.0 port: directly connected to Mini-AB interface Ethernet
eTSEC1: Connected to RGMII PHY VSC8641XKO
eTSEC2: Connected to SGMII PHY VSC8221
eTSEC3: Connected to SGMII PHY VSC8221 eCAN
Two DB-9 female connectors for Field bus interface UART
DUART interface: supports two UARTs up to 115200 bps for console display
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Enable framebuffer console support by default in the defconfig on the
Freescale MPC8610 HPCD reference board. This allows the boot messages to
be shown on the video display.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
mpc8610hpcd_set_pixel_clock() calculates the correct value of the PXCLK
bits in the CLKDVDR register for a given pixel clock rate. The code which
performs this calculation is overly complicated and includes an error
estimation routine that doesn't work most of the time anyway. Replace the
code with the simpler routine that's currently used on the P1022DS.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Enable framebuffer console support by default in the defconfigs for the
Freescale 85xx-based reference board. This allows the boot messages to
be shown on the video display on the P1022DS.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
To ensure that the DIU pixel clock will not be set to an invalid value,
clamp the PXCLK divider to the allowed range (2-255). This also acts as
a limiter for the pixel clock.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
e500mc cannot doze or nap due to an erratum (as well as having a
different mechanism than previous e500), but it has a "wait" instruction
that is similar to doze.
On 64-bit, due to the soft-irq-disable mechanism, the existing
book3e_idle should be used instead.
Signed-off-by: Vakul Garg <vakul@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Split out common (non-board specific) parts of the SoC related device
tree into a stub so multiple board dts files can include it and we can
reduce duplication and maintenance effort.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The platform file for the Freecale P1022DS reference board is not freeing
the ioremap() mapping of the PIXIS and global utilities nodes it creates.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
fsl-lbc driver requires an interrupt to bind to localbus device.
Populate 85xx boards' dts trees with lbc interrupt info.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
FSL PCIe controller can act as agent(EP) or host(RC). Under Agent(EP) mode
the controller will be configured by the host system. So its not required
to be registered with the PCI(e) sub-system. We only register the
controller if its configured in host(RC) mode.
Signed-off-by: Vivek Mahajan <vivek.mahajan@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add support for the ePAPR-compliant Freescale hypervisor (aka "Topaz") on
the Freescale P3041DS, P4080DS, and P5020DS reference boards.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add functions to restart and halt the current partition when running under
the Freescale hypervisor. These functions should be assigned to various
function pointers of the ppc_md structure during the .probe() function for
the board:
ppc_md.restart = fsl_hv_restart;
ppc_md.power_off = fsl_hv_halt;
ppc_md.halt = fsl_hv_halt;
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Freescale ePAPR reference hypervisor provides interrupt controller
services via a hypercall interface, instead of emulating the MPIC
controller. This is called the VMPIC.
The ePAPR "virtual interrupt controller" provides interrupt controller
services for external interrupts. External interrupts received by a
partition can come from two sources:
- Hardware interrupts - hardware interrupts come from external
interrupt lines or on-chip I/O devices.
- Virtual interrupts - virtual interrupts are generated by the hypervisor
as part of some hypervisor service or hypervisor-created virtual device.
Both types of interrupts are processed using the same programming model and
same set of hypercalls.
Signed-off-by: Ashish Kalra <ashish.kalra@freescale.com>
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
ePAPR hypervisors provide operating system services via a "hypercall"
interface. The following steps need to be performed to make an hcall:
1. Load r11 with the hcall number
2. Load specific other registers with parameters
3. Issue instrucion "sc 1"
4. The return code is in r3
5. Other returned parameters are in other registers.
To provide this service to the kernel, these steps are wrapped in inline
assembly functions. Standard ePAPR hcalls are in epapr_hcalls.h, and
Freescale extensions are in fsl_hcalls.h.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Move irq_choose_cpu() into arch/powerpc/kernel/irq.c so that it can be used
by other PIC drivers. The function is not MPIC-specific.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We need the FSL specific header fixup code on both 32-bit and 64-bit
platforms so just move the code into pci-common.c.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The P1023 processor is an e500v2 based SoC that utilizes the DPAA
networking architecture. This adds basic board support for non-DPAA
functionality (device tree, board file, etc).
Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We expect this is actually faster, and we end up needing more space than we
can get from the SPRGs in some instances. This is also useful when running
as a guest OS - SPRGs4-7 do not have guest versions.
8 slots are allocated in thread_info for this even though we only actually
use 4 of them - this allows space for future code to have more scratch
space (and we know we'll need it for things like hugetlb).
Signed-off-by: Ashish Kalra <Ashish.Kalra@freescale.com>
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In cases like when the platform is used under hypervisor we will NOT
have an MPIC controller but still want doorbells setup.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We fixup every FSL PCIe Root Complex we need to fixup a few things.
Rather than adding every device under the sun we move to just matching
only on the vendor (PCI_VENDOR_ID_FREESCALE) and than check that we are
a PCIe controller in host mode in the fixup.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Several changes on PCIe support on P3041DS/P4080DS/P5020DS boards:
* Add support for "fsl,qoriq-pcie-v2.2" needed by P3041 & P5020
* Removed support for setting primary_phb_addr as we have no ISA need
* Add PCI controller to of_platform_bus_probe (for EDAC)
* Cleanup building w/SWIOTLB off on P4080DS (not stricly PCIe related)
Signed-off-by: Kai.Jiang <Kai.Jiang@freescale.com>
Signed-off-by: Laurentiu TUDOR <Laurentiu.Tudor@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* Added BSD dual-license
* Moved mpic-parent to root so we dont need to duplicate everywhere
* Added next level cache from L2 to CPC
* Moved to 4-cell MPIC interrupt properties
* Added 3 MSI banks
* Added numerous missing nodes: soc-sram-error, guts, pins, clockgen,
rcpm, sfp, serdes, etc.
* Reworked PCIe interrupts to be at virtual bridge level
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add basic device tree for P3041DS board. This device tree excludes
support for DPAA and RapidIO nodes.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add basic device tree for P5020DS board. This device tree excludes
support for DPAA and RapidIO nodes.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The e500mc and e5500 based cores are only available on corenet based
SoCs. We use this name for the P204x, P3040, P4040, P4080, P50x0 SoCs
and any future processors in these families.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Rather than trying to use the core name we use corenet to distinquish
the platform/core combo. corenet64 will be a 64-bit kernel build and
we'll add a new defconfig for corenet32 for a 32-bit platforms.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The wrong MCSR bit was being used on e500mc. MCSR_BUS_RBERR only exists
on e500v1/v2. Use MCSR_LD on e500mc, and remove all MCSR checking
in fsl_rio_mcheck_exception as we now no longer call that function
if the appropriate bit in MCSR is not set.
If RIO support was enabled at compile-time, but was never probed, just
return from fsl_rio_mcheck_exception rather than dereference a NULL
pointer.
TODO: There is still a remaining, though comparitively minor, issue in
that this recovery mechanism will falsely engage if there's an unrelated
MCSR_LD event at the same time as a RIO error.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On the Freescale P1022DS reference board, the SSI audio controller is
connected in "asynchronous" mode to the codec's clocks, so the device tree
needs an "fsl,ssi-asynchronous" property.
Also remove the clock-frequency property from the wm8776 node, because
the clock is enabled only if U-Boot enables it, and U-Boot will set the
property if the clock is enabled. A future version of the P1022DS audio
driver will configure the clock itself, but for now, the driver should
not be told that the clock is running when it isn't.
Also fix the FIFO depth to 15, instead of 16.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Many stupid corrections of duplicated includes based on the output of
scripts/checkincludes.pl.
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
doorbell type is defined as bits 32:36 so should be shifted by 63-36 =
27 rather than 28.
We never noticed this bug as we've only every used type PPC_DBELL = 0.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Support compilation of mpic.c with DEBUG defined, as now we have irq_desc and
not irq number.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On many platforms (including pSeries), smp_ops->message_pass is always
smp_muxed_ipi_message_pass. This changes arch/powerpc/kernel/smp.c so
that if smp_ops->message_pass is NULL, it calls smp_muxed_ipi_message_pass
directly.
This means that a platform doesn't need to set both .message_pass and
.cause_ipi, only one of them. It is a slight performance improvement
in that it gets rid of an indirect function call at the expense of a
predictable conditional branch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
smp_release_cpus() waits for all cpus (including the bootcpu) due to an
off-by-one count on boot_cpu_count (which is all CPUs). This patch replaces
that with spinning_secondaries (which is all secondary CPUs).
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Before if we didn't support or enable HW table walk we'd get a messaage
like:
MMU: Book3E Page Tables Disabled
Which is a bit misleading. Now it will say:
MMU: Book3E HW tablewalk not supported
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Several fixes as well where the +1 was missing.
Done via coccinelle scripts like:
@@
struct resource *ptr;
@@
- ptr->end - ptr->start + 1
+ resource_size(ptr)
and some grep and typing.
Mostly uncompiled, no cross-compilers.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Lenghty lists of the kind "depends on ARCH1 || ARCH2 ... || ARCH123" are
usually either wrong or too coarse grained. Or plain an ugly sin.
[ tglx: Fixed up amigaone ]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Gerhard Pircher <gerhard_pircher@gmx.net>
Link: http://lkml.kernel.org/r/20110601180610.984881988@duck.linux-mips.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When using 64K pages with a separate cpio rootfs, U-Boot will align
the rootfs on a 4K page boundary. When the memory is reserved, and
subsequent early memblock_alloc is called, it will allocate memory
between the 64K page alignment and reserved memory. When the reserved
memory is subsequently freed, it is done so by pages, causing the
early memblock_alloc requests to be re-used, which in my case, caused
the device-tree to be clobbered.
This patch forces the reserved memory for initrd to be kernel page
aligned, and will move the device tree if it overlaps with the range
extension of initrd. This patch will also consolidate the identical
function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c,
and adds the same range extension when freeing initrd. free_initrd_mem()
is also moved to the __init section.
Many thanks to Milton Miller for his input on this patch.
[BenH: Fixed build without CONFIG_BLK_DEV_INITRD]
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dtc was moved and .gitignores have been added to the new location. So, we can
delete the old, forgotten ones.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The generic code always get the device-node in the right place now
so a single implementation will work for all archs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
All archs do more or less the same thing now, move it into
a single generic place.
I chose pci.h rather than of_pci.h to avoid having to change
all call-sites to include the later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
powerpc has two different ways of matching PCI devices to their
corresponding OF node (if any) for historical reasons. The ppc64 one
does a scan looking for matching bus/dev/fn, while the ppc32 one does a
scan looking only for matching dev/fn on each level in order to be
agnostic to busses being renumbered (which Linux does on some
platforms).
This removes both and instead moves the matching code to the PCI core
itself. It's the most logical place to do it: when a pci_dev is created,
we know the parent and thus can do a single level scan for the matching
device_node (if any).
The benefit is that all archs now get the matching for free. There's one
hook the arch might want to provide to match a PHB bus to its device
node. A default weak implementation is provided that looks for the
parent device device node, but it's not entirely reliable on powerpc for
various reasons so powerpc provides its own.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We are missing FPU feature bit that user space may require. In the
64-bit mode this gets set since we pull it in via COMMON_USER_PPC64. We
just explicitly set it so user space will be happy again.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
arch/powerpc/kernel/built-in.o: In function `machine_check_e500mc':
arch/powerpc/kernel/traps.c:429: undefined reference to `fsl_rio_mcheck_exception'
arch/powerpc/kernel/built-in.o: In function `machine_check_e500':
arch/powerpc/kernel/traps.c:519: undefined reference to `fsl_rio_mcheck_exception'
make: *** [.tmp_vmlinux1] Error 1
Reported-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Apple custom PIC only exist in some earlier machine models,
anything with an MPIC will crash on suspend if we register those
syscore ops unconditionally.
This is a regression caused by commit f5a592f7d7 ("PM / PowerPC: Use
struct syscore_ops instead of sysdevs for PM")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (45 commits)
ARM: 6945/1: Add unwinding support for division functions
ARM: kill pmd_off()
ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks
ARM: 6943/1: mm: use TTBR1 instead of reserved context ID
ARM: 6942/1: mm: make TTBR1 always point to swapper_pg_dir on ARMv6/7
ARM: 6941/1: cache: ensure MVA is cacheline aligned in flush_kern_dcache_area
ARM: add sendmmsg syscall
ARM: 6863/1: allow hotplug on msm
ARM: 6832/1: mmci: support for ST-Ericsson db8500v2
ARM: 6830/1: mach-ux500: force PrimeCell revisions
ARM: 6829/1: amba: make hardcoded periphid override hardware
ARM: 6828/1: mach-ux500: delete SSP PrimeCell ID
ARM: 6827/1: mach-netx: delete hardcoded periphid
ARM: 6940/1: fiq: Briefly document driver responsibilities for suspend/resume
ARM: 6938/1: fiq: Refactor {get,set}_fiq_regs() for Thumb-2
ARM: 6914/1: sparsemem: fix highmem detection when using SPARSEMEM
ARM: 6913/1: sparsemem: allow pfn_valid to be overridden when using SPARSEMEM
at91: drop at572d940hf support
at91rm9200: introduce at91rm9200_set_type to specficy cpu package
at91: drop boot_params and PLAT_PHYS_OFFSET
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Fix PM QOS's user mode interface to work with ASCII input
PM / Hibernate: Update kerneldoc comments in hibernate.c
PM / Hibernate: Remove arch_prepare_suspend()
PM / Hibernate: Update some comments in core hibernate code
By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:
* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
namespaces without falling in a exponential creation time
* we may want to create a namespace without creating a cgroup
The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.
This patch removes the ns_cgroup as suggested in the following thread:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
The 'cgroup_clone' function is removed because it is no longer used.
This is a userspace-visible change. Commit 45531757b4 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds MSI support for 440SPe, 460Ex, 460Sx and 405Ex.
Signed-off-by: Rupjyoti Sarmah <rsarmah@apm.com>
Signed-off-by: Tirumala R Marri <tmarri@apm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Instead of looping over each irq and checking against the irq array
bounds, adjust the bounds before looping.
The old code will not free any irq if the irq + count is above
irq_virq_count because the test in the loop is testing irq + count
instead of irq + i.
This code checks the limits to avoid unsigned integer overflows.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The radix-tree code uses call_rcu when freeing internal elements.
We must protect against the elements being freed while we traverse
the tree, even if the returned pointer will still be valid.
While preparing a patch to expand the context in which
irq_radix_revmap_lookup will be called, I realized that the
radix tree was not locked.
When asked
For a normal call_rcu usage, is it allowed to read the structure in
irq_enter / irq_exit, without additional rcu_read_lock? Could an
element freed with call_rcu advance with the cpu still between
irq_enter/irq_exit (and irq_disabled())?
Paul McKenney replied:
Absolutely illegal to do so. OK for call_rcu_sched(), but a
flaming bug for call_rcu().
And thank you very much for finding this!!!
Further analysis:
In the current CONFIG_TREE_RCU implementation. CONFIG_TREE_PREEMPT_RCU
(and CONFIG_TINY_PREEMPT_RCU) uses explicit counters.
These counters are reflected from per-CPU to global in the
scheduling-clock-interrupt handler, so disabling irq does prevent the
grace period from completing. But there are real-time implementations
(such as the one use by the Concurrent guys) where disabling irq
does -not- prevent the grace period from completing.
While an alternative fix would be to switch radix-tree to rcu_sched, I
don't want to audit the other users of radix trees (nor put alternative
freeing in the library). The normal overhead for rcu_read_lock and
unlock are a local counter increment and decrement.
This does not show up in the rcu lockdep because in 2.6.34 commit
2676a58c98 (radix-tree: Disable RCU lockdep checking in radix tree)
deemed it too hard to pass the condition of the protecting lock
to the library.
Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Look up the descriptor and check that it is found in handle_one_irq
before checking if we are on the irq stack, and call the handler
directly using the descriptor if we are on the stack.
We need check irq_to_desc finds the descriptor to avoid a NULL
pointer dereference. It could have failed because the number from
ppc_md.get_irq was above NR_IRQS, or various exceptional conditions
with sparse irqs (eg race conditions while freeing an irq if its was
not shutdown in the controller).
fe12bc2c99 (genirq: Uninline and sanity check generic_handle_irq())
moved generic_handle_irq out of line to allow its use by interrupt
controllers in modules. However, handle_one_irq is core arch code.
It already knows the details of struct irq_desc and handling irqs in
the nested irq case. This will avoid the extra stack frame to return
the value we don't check.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since kmem caches are allocated before init_IRQ as noted in 3af259d155
(powerpc: Radix trees are available before init_IRQ), we now call
kmalloc in all cases and can can always call kfree if we are asked
to allocate a duplicate or conflicting IRQ_HOST_MAP_LEGACY host.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The comment claims we will call host->ops->map() to update the flags if
we find a previously established mapping, but we never did. We used
to call remap, but that call was removed in da05198002 (powerpc: Remove
irq_host_ops->remap hook).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rename functions and arguments to reflect current usage. iic_cause_ipi
becomes iic_message_pass and iic_ipi_to_irq becomes iic_msg_to_irq,
and iic_request_ipi now takes a message (msg) instead of an ipi number.
Also mesg is renamed to msg.
Commit f1072939b6 (powerpc: Remove checks for MSG_ALL and
MSG_ALL_BUT_SELF) connected the smp_message_pass hook for cell to the
underlying iic_cause_IPI, a platform unique name. Later 23d72bfd8f
(powerpc: Consolidate ipi message mux and demux) added a cause_ipi
hook to the smp_ops, also used in message passing, but for controllers
that can not send 4 unique messages and require multiplexing. It is
even more confusing that the both take two arguments, but one is the
small message ordinal and the other is an opaque long data associated
with the cpu.
Since cell iic maps messages one to one to ipi irqs, rename the
function and argument to translate from ipi to message. Also make it
clear that iic_request_ipi takes a message number as the argument
for which ipi to create and request.
No functionional change, just renames to avoid future confusion.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The cell iic interrupt controller has enough software caused interrupts
to use a unique interrupt for each of the 4 messages powerpc uses.
This means each interrupt gets its own irq action/data combination.
Use the seperate, optimized, arch common ipi action functions
registered via the helper smp_request_message_ipi instead passing the
message as action data to a single action that then demultipexes to
the required acton via a switch statement.
smp_request_message_ipi will register the action as IRQF_PER_CPU
and IRQF_DISABLED, and WARN if the allocation fails for some reason,
so no need to print on that failure. It will return positive if
the message will not be used by the kernel, in which case we can
free the virq.
In addition to elimiating inefficient code, this also corrects the
error that a kernel built with kexec but without a debugger would
not register the ipi for kdump to notify the other cpus of a crash.
This also restores the debugger action to be static to kernel/smp.c.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When page coalescing support was added recently, the MAX_HCALL_OPCODE
define was not updated for the newly added H_GET_MPP_X hcall.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 0837e3242c fixes a situation on POWER7
where events can roll back if a specualtive event doesn't actually complete.
This can raise a performance monitor exception. We need to catch this to ensure
that we reset the PMC. In all cases the PMC will be less than 256 cycles from
overflow.
This patch lifts Anton's fix for the problem in perf and applies it to oprofile
as well.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Cc: <stable@kernel.org> # as far back as it applies cleanly
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch implements the raw syscall tracepoints on PowerPC and exports
them for ftrace syscalls to use.
To minimise reworking existing code, I slightly re-ordered the thread
info flags such that the new TIF_SYSCALL_TRACEPOINT bit would still fit
within the 16 bits of the andi. instruction's UI field. The instructions
in question are in /arch/powerpc/kernel/entry_{32,64}.S to and the
_TIF_SYSCALL_T_OR_A with the thread flags to see if system call tracing
is enabled.
In the case of 64bit PowerPC, arch_syscall_addr and
arch_syscall_match_sym_name are overridden to allow ftrace syscalls to
work given the unusual system call table structure and symbol names that
start with a period.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'timers-ptp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ptp: Fix dp83640 build warning when building statically
ptp: Added a clock driver for the National Semiconductor PHYTER.
ptp: Added a clock driver for the IXP46x.
ptp: Added a clock that uses the eTSEC found on the MPC85xx.
ptp: Added a brand new class driver for ptp clocks.
Most arches define CONFIG_DEBUG_STACK_USAGE exactly the same way. Move it
to lib/Kconfig.debug so each arch doesn't have to define it. This
obviously makes the option generic, but that's fine because the config is
already used in generic code.
It's not obvious to me that sysrq-P actually does anything caution by
keeping the most inclusive wording.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DEBUG_PER_CPU_MAPS is used in lib/cpumask.c as well as in
inlcude/linux/cpumask.h and thus it has outgrown its use within x86 and
powerpc alone. Any arch with SMP support may want to get some more
debugging, so make this option generic.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case other architectures require RCU freed page-tables to implement
gup_fast() and software filled hashes and similar things, provide the
means to do so by moving the logic into generic code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Requested-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix up powerpc to the new mmu_gather stuff.
PPC has an extra batching queue to RCU free the actual pagetable
allocations, use the ARCH extentions for that for now.
For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
hardware hash-table, keep using per-cpu arrays but flush on context switch
and use a TLF bit to track the lazy_mmu state.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures supporting hibernation define
arch_prepare_suspend() as an empty function, so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: Unify input section names
percpu: Avoid extra NOP in percpu_cmpxchg16b_double
percpu: Cast away printk format warning
percpu: Always align percpu output section to PAGE_SIZE
Fix up fairly trivial conflict in arch/x86/include/asm/percpu.h as per Tejun
The eTSEC includes a PTP clock with quite a few features. This patch adds
support for the basic clock adjustment functions, plus two external time
stamps, one alarm, and the PPS callback.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
b43: fix comment typo reqest -> request
Haavard Skinnemoen has left Atmel
cris: typo in mach-fs Makefile
Kconfig: fix copy/paste-ism for dell-wmi-aio driver
doc: timers-howto: fix a typo ("unsgined")
perf: Only include annotate.h once in tools/perf/util/ui/browsers/annotate.c
md, raid5: Fix spelling error in comment ('Ofcourse' --> 'Of course').
treewide: fix a few typos in comments
regulator: change debug statement be consistent with the style of the rest
Revert "arm: mach-u300/gpio: Fix mem_region resource size miscalculations"
audit: acquire creds selectively to reduce atomic op overhead
rtlwifi: don't touch with treewide double semicolon removal
treewide: cleanup continuations and remove logging message whitespace
ath9k_hw: don't touch with treewide double semicolon removal
include/linux/leds-regulator.h: fix syntax in example code
tty: fix typo in descripton of tty_termios_encode_baud_rate
xtensa: remove obsolete BKL kernel option from defconfig
m68k: fix comment typo 'occcured'
arch:Kconfig.locks Remove unused config option.
treewide: remove extra semicolons
...
* 'kvm-updates/2.6.40' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (131 commits)
KVM: MMU: Use ptep_user for cmpxchg_gpte()
KVM: Fix kvm mmu_notifier initialization order
KVM: Add documentation for KVM_CAP_NR_VCPUS
KVM: make guest mode entry to be rcu quiescent state
KVM: x86 emulator: Make jmp far emulation into a separate function
KVM: x86 emulator: Rename emulate_grpX() to em_grpX()
KVM: x86 emulator: Remove unused arg from emulate_pop()
KVM: x86 emulator: Remove unused arg from writeback()
KVM: x86 emulator: Remove unused arg from read_descriptor()
KVM: x86 emulator: Remove unused arg from seg_override()
KVM: Validate userspace_addr of memslot when registered
KVM: MMU: Clean up gpte reading with copy_from_user()
KVM: PPC: booke: add sregs support
KVM: PPC: booke: save/restore VRSAVE (a.k.a. USPRG0)
KVM: PPC: use ticks, not usecs, for exit timing
KVM: PPC: fix exit accounting for SPRs, tlbwe, tlbsx
KVM: PPC: e500: emulate SVR
KVM: VMX: Cache vmcs segment fields
KVM: x86 emulator: consolidate segment accessors
KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
...
Linux doesn't use USPRG0 (now renamed VRSAVE in the architecture, even
when Altivec isn't involved), but a guest might.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Convert to microseconds when displaying
(with fix from Bharat Bhushan <Bharat.Bhushan@freescale.com>).
This reduces rounding error with large quantities of short exits.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The exit type setting for mfspr/mtspr is moved from 44x to toplevel SPR
emulation. This enables it on e500, and makes sure that all SPRs
are covered.
Exit accounting for tlbwe and tlbsx is added to e500.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Return the actual host SVR for now, as we already do for PVR. Eventually
we may support Qemu overriding PVR/SVR if the situation is appropriate,
once we implement KVM_SET_SREGS on e500.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (45 commits)
crypto: caam - add support for sha512 variants of existing AEAD algorithms
crypto: caam - remove unused authkeylen from caam_ctx
crypto: caam - fix decryption shared vs. non-shared key setting
crypto: caam - platform_bus_type migration
crypto: aesni-intel - fix aesni build on i386
crypto: aesni-intel - Merge with fpu.ko
crypto: mv_cesa - make count_sgs() null-pointer proof
crypto: mv_cesa - copy remaining bytes to SRAM only when needed
crypto: mv_cesa - move digest state initialisation to a better place
crypto: mv_cesa - fill inner/outer IV fields only in HMAC case
crypto: mv_cesa - refactor copy_src_to_buf()
crypto: mv_cesa - no need to save digest state after the last chunk
crypto: mv_cesa - print a warning when registration of AES algos fail
crypto: mv_cesa - drop this call to mv_hash_final from mv_hash_finup
crypto: mv_cesa - the descriptor pointer register needs to be set just once
crypto: mv_cesa - use ablkcipher_request_cast instead of the manual container_of
crypto: caam - fix printk recursion for long error texts
crypto: caam - remove unused keylen from session context
hwrng: amd - enable AMD hw rnd driver for Maple PPC boards
hwrng: amd - manage resource allocation
...
Commit 69e3cea8d5 ("powerpc/smp: Make start_secondary_resume
available to all CPU variants") introduced start_secondary_resume to
misc_32.S, however it uses a 64-bit instruction which is not valid on
32-bit platforms. Use 'stw' instead.
Reported-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
macvlan: fix panic if lowerdev in a bond
tg3: Add braces around 5906 workaround.
tg3: Fix NETIF_F_LOOPBACK error
macvlan: remove one synchronize_rcu() call
networking: NET_CLS_ROUTE4 depends on INET
irda: Fix error propagation in ircomm_lmp_connect_response()
irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
be2net: Kill set but unused variable 'req' in lancer_fw_download()
irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
isdn: capi: Use pr_debug() instead of ifdefs.
tg3: Update version to 3.119
tg3: Apply rx_discards fix to 5719/5720
...
Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
as per Davem.
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (152 commits)
powerpc: Fix hard CPU IDs detection
powerpc/pmac: Update via-pmu to new syscore_ops
powerpc/kvm: Fix the build for 32-bit Book 3S (classic) processors
powerpc/kvm: Fix kvmppc_core_pending_dec
powerpc: Remove last piece of GEMINI
powerpc: Fix for Pegasos keyboard and mouse
powerpc: Make early memory scan more resilient to out of order nodes
powerpc/pseries/iommu: Cleanup ddw naming
powerpc/pseries/iommu: Find windows after kexec during boot
powerpc/pseries/iommu: Remove ddw property when destroying window
powerpc/pseries/iommu: Add additional checks when changing iommu mask
powerpc/pseries/iommu: Use correct return type in dupe_ddw_if_already_created
powerpc: Remove unused/obsolete CONFIG_XICS
misc: Add CARMA DATA-FPGA Programmer support
misc: Add CARMA DATA-FPGA Access Driver
powerpc: Make IRQ_NOREQUEST last to clear, first to set
powerpc: Integrated Flash controller device tree bindings
powerpc/85xx: Create dts of each core in CAMP mode for P1020RDB
powerpc/85xx: Fix PCIe IDSEL for Px020RDB
powerpc/85xx: P2020 DTS: re-organize dts files
...
Commit e66eed651f ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h, which
uncovered several cases that had apparently relied on that rather
obscure header file dependency.
So this fixes things up a bit, using
grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')
to guide us in finding files that either need <linux/prefetch.h>
inclusion, or have it despite not needing it.
There are more of them around (mostly network drivers), but this gets
many core ones.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sRIO controller reports errors to the core with one signal, it uses
register EPWISR to provides the core quick access to where the error
occurred. The EPWISR indicates that there are 4 interrupts sources,
port1, port2, message unit and port write receive, but the sRIO driver
does not support port2 for now, still the handler takes care of port2.
Currently the handler only clear error status without any recovery.
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add support for machine_check support into machine_check_e500 and
machine_check_e500mc.
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Simultaneous FCM and GPCM or UPM operation may erroneously trigger
bus monitor timeout.
Set the local bus monitor timeout value to the maximum by setting
LBCR[BMT] = 0 and LBCR[BMTPS] = 0xF.
Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
commit 9d07bc841c
"powerpc: Properly handshake CPUs going out of boot spin loop"
Would cause a miscalculation of the hard CPU ID. It removes breaking
out of the loop when finding a match with a processor, thus the "i"
used as an index in the intserv array is always incorrect
This broke interrupt on my PowerMac laptop.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi()
call to arch/powerpc/platforms/cell/interrupt.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commits a5d4f3ad3a ("powerpc: Base support for exceptions using
HSRR0/1") and 673b189a2e ("powerpc: Always use SPRN_SPRG_HSCRATCH0
when running in HV mode") cause compile and link errors for 32-bit
classic Book 3S processors when KVM is enabled. This fixes these
errors.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The vcpu->arch.pending_exceptions field is a bitfield indexed by
interrupt priority number as returned by kvmppc_book3s_vec2irqprio.
However, kvmppc_core_pending_dec was using an interrupt vector shifted
by 7 as the bit index. Fix it to use the irqprio value for the
decrementer interrupt instead. This problem was found by code
inspection.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (44 commits)
debugfs: Silence DEBUG_STRICT_USER_COPY_CHECKS=y warning
sysfs: remove "last sysfs file:" line from the oops messages
drivers/base/memory.c: fix warning due to "memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION"
memory hotplug: Speed up add/remove when blocks are larger than PAGES_PER_SECTION
SYSFS: Fix erroneous comments for sysfs_update_group().
driver core: remove the driver-model structures from the documentation
driver core: Add the device driver-model structures to kerneldoc
Translated Documentation/email-clients.txt
RAW driver: Remove call to kobject_put().
reboot: disable usermodehelper to prevent fs access
efivars: prevent oops on unload when efi is not enabled
Allow setting of number of raw devices as a module parameter
Introduce CONFIG_GOOGLE_FIRMWARE
driver: Google Memory Console
driver: Google EFI SMI
x86: Better comments for get_bios_ebda()
x86: get_bios_ebda_length()
misc: fix ti-st build issues
params.c: Use new strtobool function to process boolean inputs
debugfs: move to new strtobool
...
Fix up trivial conflicts in fs/debugfs/file.c due to the same patch
being applied twice, and an unrelated cleanup nearby.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
sched: Fix and optimise calculation of the weight-inverse
sched: Avoid going ahead if ->cpus_allowed is not changed
sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
sched: Remove unused parameters from sched_fork() and wake_up_new_task()
sched: Shorten the construction of the span cpu mask of sched domain
sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
sched: Remove unused 'this_best_prio arg' from balance_tasks()
sched: Remove noop in alloc_rt_sched_group()
sched: Get rid of lock_depth
sched: Remove obsolete comment from scheduler_tick()
sched: Fix sched_domain iterations vs. RCU
sched: Next buddy hint on sleep and preempt path
sched: Make set_*_buddy() work on non-task entities
sched: Remove need_migrate_task()
sched: Move the second half of ttwu() to the remote cpu
sched: Restructure ttwu() some more
sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
sched: Remove rq argument from ttwu_stat()
sched: Remove rq->lock from the first half of ttwu()
sched: Drop rq->lock from sched_exec()
...
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix rt_rq runtime leakage bug
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (34 commits)
PM: Introduce generic prepare and complete callbacks for subsystems
PM: Allow drivers to allocate memory from .prepare() callbacks safely
PM: Remove CONFIG_PM_VERBOSE
Revert "PM / Hibernate: Reduce autotuned default image size"
PM / Hibernate: Add sysfs knob to control size of memory for drivers
PM / Wakeup: Remove useless synchronize_rcu() call
kmod: always provide usermodehelper_disable()
PM / ACPI: Remove acpi_sleep=s4_nonvs
PM / Wakeup: Fix build warning related to the "wakeup" sysfs file
PM: Print a warning if firmware is requested when tasks are frozen
PM / Runtime: Rework runtime PM handling during driver removal
Freezer: Use SMP barriers
PM / Suspend: Do not ignore error codes returned by suspend_enter()
PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
OMAP1 / PM: Use generic clock manipulation routines for runtime PM
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
...
It seems that Adrian is getting old. He removed almost everything of
GEMINI in commit c53653130 ("[POWERPC] Remove the broken Gemini
support") except this piece.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[See http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-October/086424.html
and followups. Part of the commit message is directly copied from that.]
Commit 540c6c392f tries to find i8042 IRQs in
the device-tree but doesn't fall back to the old hardcoded 1 and 12 in all
failure cases.
Specifically, the case where the device-tree contains nothing matching
pnpPNP,303 or pnpPNP,f03 doesn't seem to be handled well. It sort of falls
through to the old code, but leaves the IRQs set to 0.
Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We keep track of the size of the lowest block of memory and call
setup_initial_memory_limit() only after we've parsed them all
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Milton Miller <miltonm@bga.com>
When using a property refering to the availibily of dynamic dma windows
call it ddw_avail not ddr_avail.
dupe_ddw_if_already_created does not dupilcate anything, it only finds
and reuses the windows we already created, so rename it to
find_existing_ddw. Also, it does not need the pci device node, so
remove that argument.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the discovery of windows previously setup from when the pci driver
calls set_dma_mask to an arch_initcall.
When kexecing into a kernel with dynamic dma windows allocated, we need
to find the windows early so that memory hot remove will be able to
delete the tces mapping the to be removed memory and memory hotplug add
will map the new memory into the window. We should not wait for the
driver to be loaded and the device to be probed. The iommu init hooks
are before kmalloc is setup, so defer to arch_initcall.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we destroy the window, we need to remove the property recording that
we setup the window. Otherwise the next kernel we kexec will be
confused.
Also we should remove the property if even if we don't find the
ibm,ddw-applicable window or if one of the property sizes is unexpected;
presumably these came from a prior kernel via kexec, and we will not be
maintaining the window with respect to memory hotplug.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Do not check dma supported until we have chosen the right dma ops.
Check that the device is pci before treating it as such.
Check the mask is supported by the selected dma ops before
committing it.
We only need to set iommu ops if it is not the current ops; this
avoids searching the tree for the iommu table unnecessarily.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Otherwise we get silent truncations.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Milton Miller <miltonm@bga.com>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When creating an irq, don't allow a concurent driver request until
we have caled map, which will likley call set_chip_and_handler to
change the irq_chip and its operations.
Similarly, when tearing down an IRQ, make sure no new uses come
along while we change the irq back to the nop chip and then reset
the descriptor to freed status.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create the dts files for each core and splits the devices between the two
cores for P1020RDB.
Core0 has core0 to have memory, l2, i2c, spi, gpio, tdm, dma, usb, eth1,
eth2, sdhc, crypto, global-util, message, pci0, pci1, msi.
Core1 has l2, eth0, crypto.
MPIC is shared between two cores but each core will protect its interrupts
from other core by using "protected-sources" of mpic.
Fix compatible property for global-util node of P1020si.dtsi.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
PCIe device in legacy mode can trigger interrupts using the wires #INTA,
#INTB ,#INTC and #INTD. PCI devices are obligated to use #INTx for
interrupts under legacy mode. Each PCI slot or device is typically wired
to different inputs on the interrupt controller.
So, Define interrupt-map and interrupt-map-mask properties for device tree
to of map each PCI interrupt signal to the inputs of the interrupt
controller.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Creates P2020si.dtsi, containing information for P2020 SoC. Modifies dts
files for P2020 based systems to use dtsi file.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Creates P1020si.dtsi, containing information for the P1020 SoC. Modifies dts
files for P1020 based systems to use dtsi file
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Acked-by: Grant Likely <grant.likelY@secretlab.ca>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This debug option has no overhead other than a slight increase in
kernel size, and makes bug reports more useful. While some end users
may prefer to save the space, as a default on a kernel config aimed
primarily at development on reference boards, it should be enabled.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Even though support for the p5020's on-chip ethernet is not yet upstream,
it is not appropriate to disable all networking support (including
loopback, unix domain sockets, external ethernet devices, etc) in the
defconfig. The networking settings are taken from mpc85xx_smp_defconfig,
minus the drivers for ethernet devices not found on any current e5500
chip.
The other changes are the result of running "make savedefconfig".
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add support for MPIC timers as requestable interrupt sources.
Based on http://patchwork.ozlabs.org/patch/20941/ by Dave Liu.
Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
There is no hardware interrupt 0xf7. But now we can express the timer
interrupt using 4-cell interrupts. This requires converting all of the
other interrupt specifiers in the tree as well.
Also add the second timer group, and fix the reg property to only
describe the timer registers.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
If the video mode is set to 16-, 24-, or 32-bit pixels, then the pixel data
contains actual levels of red, blue, and green. However, if the video mode
is set to 8-bit pixels, then the 8-bit value represents an index into color
table. This is called "palette mode" on the Freescale DIU video controller.
The DIU driver does not currently support palette mode, but the MPC8610 HPCD
board file returned a non-zero (although incorrect) pixel format value for
8-bit mode.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
It may trigger a warning in fs/proc/generic.c:__xlate_proc_name() when
trying to add an entry for the interrupt handler to sysfs.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Without this, we attempt to use doorbells for IPIs, and end up
branching to some bad address. Plus, even for the exceptions
we don't implement, it's good to handle it and get a message out.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The only references to the irq_map[].host field are internal to
arch/powerpc/kernel/irq.c
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some irq_host implementations are using virq_to_host to check if
they are the irq_host for a virtual irq. To allow us to make space
versus time tradeoffs, replace this usage with an assertive
virq_is_host that confirms or denies the irq is associated with the
given irq_host.
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Instead of checking for rogue msi numbers via the irq_map host field
set the chip_data to h.host_data (which is the msic struct pointer)
at map and compare it in get_irq.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Building on Grant's efforts to remove the irq_map array, this patch
moves spider-pics use of virq_to_host() to use irq_data_get_chip_data
and sets the irq chip data in the map call, like most other interrupt
controllers in powerpc.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It was called from irq_create_mapping if that was called for a host
and hwirq that was previously mapped, "to update the flags". But the
only implementation was in beat_interrupt and all it did was repeat a
hypervisor call without error checking that was performed with error
checking at the beginning of the map hook. In addition, the comment on
the beat remap hook says it will only called once for a given mapping,
which would apply to map not remap.
All flags should be known by the time the match hook is called, before
we call the map hook. Removing this mostly unused hook will simpify
the requirements of irq_domain concept.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a dummy irq_host using the generic dummy irq chip for the secondary
cpus to use. Create a direct irq mapping for the ipi and register the
ipi action handler against it. If for some unlikely reason part of this
fails then don't detect the secondary cpus.
This removes another instance of NO_IRQ_IGNORE, records the ipi stats
for the secondary cpus, and runs the ipi on the interrupt stack.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If none of irq category bits were set mpc52xx_get_irq() would pass
NO_IRQ_IGNORE (-1) to irq_linear_revmap, which does an unsigned compare
and declares the interrupt above the linear map range. It then punts
to irq_find_mapping, which performs a linear search of all irqs,
which will likely miss and only then return NO_IRQ.
If no status bit is set, then we should return NO_IRQ directly.
The interrupt should not be suppressed from spurious counting, in fact
that is the definition of supurious.
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As NO_IRQ_IGNORE is only used between the static function cpld_pic_get_irq
and its caller cpld_pic_cascade, and cpld_pic_cascade only uses it to
suppress calling handle_generic_irq, we can change these uses to NO_IRQ
and remove the extra tests and pathlength in cpld_pic_cascade.
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
handler_data should be reserved for flow handlers on the dependent
irq, not consumed by the parent irq code that is part of the irq_chip
code. The msi_data pointer was already set in msidesc->irqhost->hostdata
and being copied to irq_data->chipdata in the msidesc->irqhost->map()
method called via create_irq_mapping, so we can obtain the pointer
from there and free the instance it in teardown_msi_irqs.
Also remove the unnecessary cast of irq_get_handler_data in the
cascade handler, which is the demux flow handler of the parent
msi interrupt. (This is the expected usage for handler_data).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The msi platform device driver was abusing dev.platform_data for its
platform_driver_data. Use the correct pointer for storage.
Platform_data is supposed to be for platforms to communicate to drivers
parameters that are not otherwise discoverable. Its lifetime matches
the platform_device not the platform device driver. It is generally
not needed for drivers that only support systems with device trees.
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It was never called because the host is always IRQ_HOST_MAP_LEGACY.
And what it purported to do was mask the interrupt (which will already
have happend if we shutdown the interrupt), then synchronise_irq and
clear the chip pointer, both of which will have been be done by the
caller were we to call unmap on a legacy irq.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These all just clear chip or chipdata fields, which will be done
by the generic code when we call irq_free_descs.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If for some reason the code incrorectly calls the wrong function to
manage the revmap, not only should we warn, we should take action.
However, in the paths we expect to be taken every delivered interrupt
change to WARN_ON_ONCE. Use the if (WARN_ON(x)) format to get the
unlikely for free.
Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since the generic irq code uses a radix tree for sparse interrupts,
the initcall ordering has been changed to initialize radix trees before
irqs. We no longer need to defer creating revmap radix trees to the
arch_initcall irq_late_init.
Also, the kmem caches are allocated so we don't need to use
zalloc_maybe_bootmem.
Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since we already have a special case in map to set the ipi handler, use
the desired flow.
If we don't find an ics to handle the interrupt complain instead of
returning 0 without having set a chip or handler.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since there are only 4 messages, we can replace the atomic bit set
(which uses atomic load reserve and store conditional sequence) with
a byte stores to seperate bytes. We still have to perform a load
reserve and store conditional sequence to avoid loosing messages on
reception but we can do that with a single call to xchg.
The do {} while and __BIG_ENDIAN specific mask testing was chosen by
looking at the generated asm code. On gcc-4.4, the bit masking becomes
a simple bit mask and test of the register returned from xchg without
storing and loading the value to the stack like attempts with a union
of bytes and an int (or worse, loading single bit constants from the
constant pool into non-voliatle registers that had to be preseved on
the stack). The do {} while avoids an unconditional branch to the
end of the loop to test the entry / repeat condition of a while loop
and instead optimises for the expected single iteration of the loop.
We have a full mb() at the beginning to cover ordering between send,
ipi, and receive so we can use xchg_local and forgo the further
acquire and release barriers of xchg.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Compile the new smp ipi mux and demux code only if a platform
will make use of it. The new config is selected as required.
The new cause_ipi smp op is only available conditionally to point out
configs where the select is required; this makes setting the op an
immediate fail instead of a deferred unresolved symbol at link.
This also creates a new config for power surge powermac upgrade support
that can be disabled in expert mode but is default on.
I also removed the depends / default y on CONFIG_XICS since it is selected
by PSERIES.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Consolidate the mux and demux of ipi messages into smp.c and call
a new smp_ops callback to actually trigger the ipi.
The powerpc architecture code is optimised for having 4 distinct
ipi triggers, which are mapped to 4 distinct messages (ipi many, ipi
single, scheduler ipi, and enter debugger). However, several interrupt
controllers only provide a single software triggered interrupt that
can be delivered to each cpu. To resolve this limitation, each smp_ops
implementation created a per-cpu variable that is manipulated with atomic
bitops. Since these lines will be contended they are optimialy marked as
shared_aligned and take a full cache line for each cpu. Distro kernels
may have 2 or 3 of these in their config, each taking per-cpu space
even though at most one will be in use.
This consolidation removes smp_message_recv and replaces the single call
actions cases with direct calls from the common message recognition loop.
The complicated debugger ipi case with its muxed crash handling code is
moved to debug_ipi_action which is now called from the demux code (instead
of the multi-message action calling smp_message_recv).
I put a call to reschedule_action to increase the likelyhood of correctly
merging the anticipated scheduler_ipi() hook coming from the scheduler
tree; that single required call can be inlined later.
The actual message decode is a copy of the old pseries xics code with its
memory barriers and cache line spacing, augmented with a per-cpu unsigned
long based on the book-e doorbell code. The optional data is set via a
callback from the implementation and is passed to the new cause-ipi hook
along with the logical cpu number. While currently only the doorbell
implemntation uses this data it should be almost zero cost to retrieve and
pass it -- it adds a single register load for the argument from the same
cache line to which we just completed a store and the register is dead
on return from the call. I extended the data element from unsigned int
to unsigned long in case some other code wanted to associate a pointer.
The doorbell check_self is replaced by a call to smp_muxed_ipi_resend,
conditioned on the CPU_DBELL feature. The ifdef guard could be relaxed
to CONFIG_SMP but I left it with BOOKE for now.
Also, the doorbell interrupt vector for book-e was not calling irq_enter
and irq_exit, which throws off cpu accounting and causes code to not
realize it is running in interrupt context. Add the missing calls.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I can't see any reason these functions are needed by machdep.h
and they are all hidden by CONFIG_SMP with no UP alternative.
Also move the declarations for the fallback timebase ops, which
are used to fill in the smp ops.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have no idea if the beat hypervisor supports multiple cpus in
a partition, but the code has not been touched since these stubs
were added in February of 2007 except to move them in April of 2008.
These are stubs: start_cpu always returns fail (which is dropped),
the message passing and reciving are empty functions, and the top
of file comment says "Incomplete".
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Replace all remaining callers of alloc_maybe_bootmem with
zalloc_maybe_bootmem. The callsite in pci_dn is followed with a
memset to clear the memory, and not zeroing at the other callsites
in the celleb fake pci code could lead to following uninitialized
memory as pointers or even freeing said pointers on error paths.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Its unused, and of the three declarations, one is duplicated in pmac.h,
the second is static and the third is renamed and static.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that MSG_ALL and MSG_ALL_BUT_SELF have been eliminated,
smp_mpic_mesage_pass no longer needs to lookup the cpumask just to
have mpic_send_ipi extract part of it and recode it in a NR_CPUS loop
by mpic_physmask.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that smp_ops->smp_message_pass is always called with an (online) cpu
number for the target remove the checks for MSG_ALL and MSG_ALL_BUT_SELF.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The only user of MSG_ALL_BUT_SELF in the whole kernel tree is powerpc,
and it only uses it to start the debugger. Both debuggers always call
smp_send_debugger_break with MSG_ALL_BUT_SELF, and only mpic can do
anything more optimal than a loop over all online cpus, but all message
passing implementations have to code for this special delivery target.
Convert smp_send_debugger_break to take void and loop calling the smp_ops
message_pass function for each of the other cpus in the online cpumask.
Use raw_smp_processor_id() because we are either entering the debugger
or trying to start kdump and the additional warning it not useful were
it to trigger.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
mpic_set_affinity is allocating and freeing a cpumask var even though
it was breaking the cpumask abstraction when passing the mask to
mpic_physmask. It also didn't have any check for allocatin failure.
Break the cpumask abstraction earlier and use simple bitwise and of the
bits from the mask with the bits of cpu_online_mask.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
mpic_physmask was looping NR_CPUS times over a mask that was passed as
a u32. Since mpic is architecturaly limited to 32 physical cpus, clamp
the logical cpus to 32 when compiling (we could also clamp at runtime
to nr_cpu_ids).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
c1854e0072 (powerpc: Set nr_cpu_ids early
and use it to free PACAs) copied the formerly static setup_nr_cpu_ids
from init/main.c but 34db18a054 (smp:
move smp setup functions to kernel/smp.c) moved it to kernel/smp.c
with a declaration in include/linux/smp.h, so we can call it instead of
replicating it.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we never set a cpu above nr_cpu_ids possible we can
limit our initial paca allocation to nr_cpu_ids. We can then
clamp the number of cpus in platforms/iseries/setup.c.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We should not set cpus above nr_cpu_ids to possible. While we
will trigger a warning with CONFIG_CPUMASK_DEBUG, even then the mask
initializers will set the bits beyond what the iterators check and cause
nr_cpu_ids to increase.
Respecting nr_cpu_ids during setup will allow us to use it in our initial
paca allocation. It can be reduced from NR_CPUS by the existing early param
nr_cpus=, which was added in 2b633e3fac (smp:
Use nr_cpus= to set nr_cpu_ids early). We already call parse_early_parms
between finding the command line and allocating the pacas.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
9cb82f2f46 (Make iSeries spin on
__secondary_hold_spinloop, like pSeries) added a load of current_set
but this load was repeated later and we don't even have the paca yet.
It also checked __secondary_hold_spinloop with a 32 bit compare instead
of a 64 bit compare.
b6f6b98a4e (Don't spin on sync instruction
at boot time) missed the copy of the startup code in iseries.
1426d5a3bd (Dynamically allocate pacas)
doesn't allow for pacas to be less than lppacas and recalculated the paca
location from the cpu id in r0 every time through the secondary loop.
Various revisions over time made the comments on conditional branches
confusing with respect to being a hold loop or forward progress
Mostly in-order description of the changes:
Replicate the few lines of code saved by the ugly scoped ifdef CONFIG_SMP
in the secondary loop between yielding on UP and marking time with the
hypervisor on SMP. Always compile the iseries_secondary_yield loop and
use it if the cpu id is above nr_cpu_ids. Change all forward progress
paths to be forward branches to the next numerical label. Assign a
label to all loops. Move all sync instructions from the loops to the
forward progress path. Wait to load current_set until paca is set to go.
Move the iseries_secondary_smp_loop label to cover the whole spin loop.
Add HMT_MEDIUM when we make forward progress.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Starting with 1426d5a3bd (powerpc:
Dynamically allocate pacas) the space for pacas beyond cpu_possible
is freed, but we failed to update the loop in crash.c.
Since c1854e0072 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) the number of pacas allocated is
always nr_cpu_ids.
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Starting with 1426d5a3bd (powerpc:
Dynamically allocate pacas) we free the memory for pacas beyond
cpu_possible, but we failed to update the loop the secondary cpus use
to find their paca. If the system has running cpu threads for which
the kernel did not allocate a paca for they will search the memory that
was freed. For instance this could happen when the device tree for
a kdump kernel was not updated after a cpu hotplug, or the kernel is
running with more cpus than the kernel was configured.
Since c1854e0072 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) we set nr_cpu_ids before telling the
cpus to advance, so use that to limit the search.
We can't reference nr_cpu_ids without CONFIG_SMP because it is defined
as 1 instead of a memory location, but any extra threads should be sent
to kexec_wait in that case anyways, so make that explicit and remove
the search loop for UP.
Note to stable: The fix also requires
c1854e0072 (powerpc: Set
nr_cpu_ids early and use it to free PACAs) to function. Also
9d07bc841c (Properly handshake CPUs going
out of boot spin loop) affects the second chunk, specifically the branch
target was 3b before and is 4b after that patch, and there was a blank
line before the #ifdef CONFIG_SMP that was removed
Cc: <stable@kernel.org> # .34.x: c1854e0072 powerpc: Set nr_cpu_ids early
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 1fc711f7ff (powerpc/kexec: Fix race
in kexec shutdown) moved the write to signal the cpu had exited the kernel
from before the transition to real mode in kexec_smp_wait to kexec_wait.
Unfornately it missed that kexec_wait is used both by cpus leaving the
kernel and by secondary slave cpus that were not allocated a paca for
what ever reason -- they could be beyond nr_cpus or not described in
the current device tree for whatever reason (for example, kexec-load
was not refreshed after a cpu hotplug operation). Cpus coming through
that path they will write to paca[NR_CPUS] which is beyond the space
allocated for the paca data and overwrite memory not allocated to pacas
but very likely still real mode accessable).
Move the write back to kexec_smp_wait, which is used only by cpus that
found their paca, but after the transition to real mode.
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # (1fc711f was backported to 2.6.32)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I have a report of an FWNMI with an r3 value that we think is
corrupt, but since we don't print r3 we have no idea what was
wrong with it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we swtich to direct dma ops, we set the dma data union to have the
dma offset. When we switch back to iommu table ops because of a later
dma_set_mask, we need to restore the iommu table pointer. Without this
change, crashes have been observed on kexec where (for reasons still
being investigated) we fall back to a 32-bit dma mask on a particular
device and then panic because the table pointer is not valid.
The easiset way to find this value is to call
pci_dma_dev_setup_pSeriesLP which will search up the pci tree until it
finds the node with the table.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a confusing number of ioremap functions. Make things just a
bit simpler by merging ioremap_flags and ioremap_prot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add ioremap_wc so drivers can request write combining on kernel
mappings.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After looking at our system call path, Mary Brown suggested that we
should put all mfspr SRR* instructions before any mtspr SRR*.
To test this I used a very simple null syscall (actually getppid)
testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c
I tested with the following changes against the pseries_defconfig:
CONFIG_VIRT_CPU_ACCOUNTING=n
CONFIG_AUDIT=n
to remove the overhead of virtual CPU accounting and syscall
auditing.
POWER6:
baseline: mean = 757.2 cycles sd = 2.108
modified: mean = 759.1 cycles sd = 2.020
POWER7:
baseline: mean = 411.4 cycles sd = 0.138
modified: mean = 404.1 cycles sd = 0.109
So we have 1.77% improvement on POWER7 which looks significant. The
POWER6 suggest a 0.25% slowdown, but the results are within 1
standard deviation and may be in the noise.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A static branch hint will override dynamic branch prediction on
recent POWER CPUs. Since we are about to use more altivec in the
kernel remove the static hint in giveup_altivec that assumes
a userspace task is using altivec.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To make it easier to add optimised versions of copy_page, remove
the 4kB loop for 64kB pages and just do all the work in copy_page.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enable iSCSI support for a number of cards. We had the base
networking devices enabled but forgot to enable iSCSI.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enable the Qlogic and Emulex 10Gbit adapters.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The variable 'old' is set but not used in the wrprotect functions in
arch/powerpc/include/asm/pgtable-ppc64.h, which can trigger a compiler warning.
Remove the variable, since it's not used anyway.
Signed-off-by: Stratos Psomadakis <psomas@ece.ntua.gr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Future releases of fimrware will enforce a requirement that DTL buffers
do not cross a 4k boundary. Commit
127493d5dc satisfies this requirement for
CONFIG_VIRT_CPU_ACCOUNTING=y kernels, but if !CONFIG_VIRT_CPU_ACCOUNTING
&& CONFIG_DTL=y, the current code will fail at dtl registration time.
Fix this by making the kmem cache from
127493d5dc visible outside of setup.c and
using the same cache in both dtl.c and setup.c. This requires a bit of
reorganization to ensure ordering of the kmem cache and buffer
allocations.
Note: Since firmware now limits the size of the buffer, I made
dtl_buf_entries read-only in debugfs.
Tested with upcoming firmware with the 4 combinations of
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_DTL.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we kexec we look for a particular property added by the first
kernel, "linux,direct64-ddr-window-info", per-device where we already
have set up dynamic dma windows. The current code, though, wasn't
initializing the size of this property and thus when we kexec'd, we
would find the property but read uninitialized memory resulting in
garbage ddw values for the kexec'd kernel and panics. Fix this by
setting the size at enable_ddw() time and ensuring that the size of the
found property is valid at dupe_ddw_if_kexec() time.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit erros while doing so..
(gcc is not setup to cross compile) and am unsure if anymore needs to be done.
Please have a look if/when anybody has free time.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The timestamps recorded in the .gz files add no value.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit c56e58537d breaks SMP support in PPC_47x chip.
secondary_ti must be set to current thread info before callin kick_cpu or else
start_secondary_47x will jump into void when trying to return to c-code.
In the current setup secondary_ti is initialized before the CPU idle task is started
and only the boot core will start. I am not sure this is the correct solution, but it
makes SMP possible in my chip.
Note! The HOTPLUG support probably need some fixing to, There is no trampoline code
available in head_44x.S - start_secondary_resume?
Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit b987812b3f left
crash_kexec_wait_realmode() undefined for UP.
Commit 7c7a81b53e defined it for UP but
left it undefined for 32-bit SMP.
Seems like people are getting confused by nested #ifdef's, so move the
definitions of crash_kexec_wait_realmode() after the #ifdef CONFIG_SMP
section.
Compile-tested with 32-bit UP, 32-bit SMP and 64-bit SMP configurations.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time. This was unsafe
because matching is not an atomic operation with probing a driver. If
two or more drivers are attempted to be matched to a driver at the
same time, then the cached matching entry pointer could get
overwritten.
This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* syscore:
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
ARM / Samsung: Use struct syscore_ops for "core" power management
ARM / PXA: Use struct syscore_ops for "core" power management
ARM / SA1100: Use struct syscore_ops for "core" power management
ARM / Integrator: Use struct syscore_ops for core PM
ARM / OMAP: Use struct syscore_ops for "core" power management
ARM: Use struct syscore_ops instead of sysdevs for PM in common code
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.
This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.
So it's time to delete the line. This is good as we need all the space
we can get for oops messages at times on consoles.
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make some PowerPC architecture's code use struct syscore_ops
objects for power management instead of sysdev classes and sysdevs.
This simplifies the code and reduces the kernel's memory footprint.
It also is necessary for removing sysdevs from the kernel entirely in
the future.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch drops the reference to a global 'cmd_line' variable from
early_init_dt_scan_chosen, and instead passes the pointer to the command
line string via the *data argument. Each architecture does something
slightly different with the initial command line, so it makes sense for
the architecture to be able to specify the variable name.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Following dump is observed on host when clearing the exit timing counters
[root@p1021mds kvm]# echo -n 'c' > vm1200_vcpu0_timing
INFO: task echo:1276 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
echo D 0ff5bf94 0 1276 1190 0x00000000
Call Trace:
[c2157e40] [c0007908] __switch_to+0x9c/0xc4
[c2157e50] [c040293c] schedule+0x1b4/0x3bc
[c2157e90] [c04032dc] __mutex_lock_slowpath+0x74/0xc0
[c2157ec0] [c00369e4] kvmppc_init_timing_stats+0x20/0xb8
[c2157ed0] [c0036b00] kvmppc_exit_timing_write+0x84/0x98
[c2157ef0] [c00b9f90] vfs_write+0xc0/0x16c
[c2157f10] [c00ba284] sys_write+0x4c/0x90
[c2157f40] [c000e320] ret_from_syscall+0x0/0x3c
The vcpu->mutex is used by kvm_ioctl_* (KVM_RUN etc) and same was
used when clearing the stats (in kvmppc_init_timing_stats()). What happens
is that when the guest is idle then it held the vcpu->mutx. While the
exiting timing process waits for guest to release the vcpu->mutex and
a hang state is reached.
Now using seprate lock for exit timing stats.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We make use of ptrace_get_breakpoints() / ptrace_put_breakpoints() to
protect ptrace_set_debugreg() even if CONFIG_HAVE_HW_BREAKPOINT if off.
However in this case, these APIs are not implemented.
To fix this, push the protection down inside the relevant ifdef.
Best would be to export the code inside
CONFIG_HAVE_HW_BREAKPOINT into a standalone function to cleanup
the ifdefury there and call the breakpoint ref API inside. But
as it is more invasive, this should be rather made in an -rc1.
Fixes this build error:
arch/powerpc/kernel/ptrace.c:1594: error: implicit declaration of function 'ptrace_get_breakpoints' make[2]: ***
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LPPC <linuxppc-dev@lists.ozlabs.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: v2.6.33.. <stable@kernel.org>
Link: http://lkml.kernel.org/r/1304639598-4707-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add a platform for the Wire Speed Processor, based on the PPC A2.
This includes code for the ICS & OPB interrupt controllers, as well
as a SCOM backend, and SCOM based cpu bringup.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For adapters which have devices under a PCIe switch/bridge it is informative
to display information for both the PCIe switch/bridge and the device on
which the bus error was detected.
rebased to powerpc-next
Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
slb0_limit() wasn't a very descriptive name. This changes it along with
a comment explaining what it's used for, and provides a 64-bit BookE
implementation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for handling IO Event interrupts which come
through at the /event-sources/ibm,io-events device tree node.
The interrupts come through ibm,io-events device tree node are generated
by the firmware to report IO events. The firmware uses the same interrupt
to report multiple types of events for multiple devices. Each device may
have its own event handler. This patch implements a plateform interrupt
handler that is triggered by the IO event interrupts come through
ibm,io-events device tree node, pull in the IO events from RTAS and call
device event handlers registered in the notifier list.
Device event handlers are expected to use atomic_notifier_chain_register()
and atomic_notifier_chain_unregister() to register/unregister their
event handler in pseries_ioei_notifier_list list with IO event interrupt.
Device event handlers are responsible to identify if the event belongs
to the device event handler. The device event handle should return NOTIFY_OK
after the event is handled if the event belongs to the device event handler,
or NOTIFY_DONE otherwise.
Signed-off-by: Tseng-Hui (Frank) Lin <thlin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds definitions of non-IBM specific v6 extended log
definitions to rtas.h.
Signed-off-by: Tseng-Hui (Frank) Lin <tsenglin@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and
Anton's patch.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.
I wrote a microbenchmark to test the performance gains of using
this new syscall:
http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.
64B UDP
batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)
64B raw socket
batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)
We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.
[ Add sparc syscall entries. -DaveM ]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fundamental reset is an optional reset type supported only by PCIe adapters.
Handle the unexpected case where a non-PCIe device has requested a
fundamental reset. Try hot-reset as a fallback to handle this case.
Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For multifunction adapters with a PCI bridge or switch as the device
at the Partitionable Endpoint(PE), if one or more devices below PE
sets dev->needs_freset, that value will be set for the PE device.
In other words, if any device below PE requires a fundamental reset
the PE will request a fundamental reset.
Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>