Commit d015fe995 'powerpc/cell/axon-msi: Retry on missing interrupt'
has turned a rare failure to kexec on QS22 into a reproducible
error, which we have now analysed.
The problem is that after a kexec, the MSIC hardware still points
into the middle of the old ring buffer. We set up the ring buffer
during reboot, but not the offset into it. On older kernels, this
would cause a storm of thousands of spurious interrupts after a
kexec, which would most of the time get dropped silently.
With the new code, we time out on each interrupt, waiting for
it to become valid. If more interrupts come in that we time
out on, this goes on indefinitely, which eventually leads to
a hard crash.
The solution in this commit is to read the current offset from
the MSIC when reinitializing it. This now works correctly, as
expected.
Reported-by: Dirk Herrendoerfer <d.herrendoerfer@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An earlier patch from Jens Osterkamp attempted to fix GDB
watchpoints by enabling the DABRX register at boot time.
Unfortunately, this did not work on SMP setups, where
secondary CPUs were still using the power-on DABRX value.
This introduces the same change for secondary CPUs on cell
as well.
Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Tested-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The MSI capture logic for the axon bridge can sometimes
lose interrupts in case of high DMA and interrupt load,
when it signals an MSI interrupt to the MPIC interrupt
controller while we are already handling another MSI.
Each MSI vector gets written into a FIFO buffer in main
memory using DMA, and that DMA access is normally flushed
by the actual interrupt packet on the IOIF. An MMIO
register in the MSIC holds the position of the last
entry in the FIFO buffer that was written. However,
reading that position does not flush the DMA, so that
we can observe stale data in the buffer.
In a stress test, we have observed the DMA to arrive
up to 14 microseconds after reading the register.
This patch works around this problem by retrying the
access to the FIFO buffer.
We can reliably detect the conditioning by writing
an invalid MSI vector into the FIFO buffer after
reading from it, assuming that all MSIs we get
are valid. After detecting an invalid MSI vector,
we udelay(1) in the interrupt cascade for up to
100 times before giving up.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we can end up in an infinite loop if we get a signal
while the kernel has faulted in spufs_ps_fault. Eg:
alarm(1);
write(fd, some_spu_psmap_register_address, 4);
- the write's copy_from_user will fault on the ps mapping, and
signal_pending will be non-zero. Because returning from the fault
handler will never clear TIF_SIGPENDING, so we'll just keep faulting,
resulting in an unkillable process using 100% of CPU.
This change returns VM_FAULT_SIGBUS if there's a fatal signal pending,
letting us escape the loop.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
This gets rid of this build warning:
arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'
This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes this error on Cell when CONFIG_KEXEC = n:
arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register'
We have to include <asm/kexec.h> because it contains the dummy
definition of crash_shutdown_register that is used when
CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in
that case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
Revert "powerpc: Sync RPA note in zImage with kernel's RPA note"
powerpc: Fix compile errors with CONFIG_BUG=n
powerpc: Fix format string warning in arch/powerpc/boot/main.c
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
powerpc: Remove duplicate DMA entry from mpc8313erdb device tree
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
powerpc/mpic: Fix regression caused by change of default IRQ affinity
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
powerpc/pci: Fix unmapping of IO space on 64-bit
powerpc/pci: Properly allocate bus resources for hotplug PHBs
OF-device: Don't overwrite numa_node in device registration
powerpc: Fix swapcontext system for VSX + old ucontext size
powerpc: Fix compiler warning for the relocatable kernel
powerpc: Work around ld bug in older binutils
powerpc/ppc64/kdump: Better flag for running relocatable
powerpc: Use is_kdump_kernel()
powerpc: Kexec exit should not use magic numbers
powerpc/44x: Update 44x defconfigs
powerpc/40x: Update 40x defconfigs
powerpc: enable heap randomization for linkstations
...
The Freescale implementation of MPIC only allows a single CPU destination
for non-IPI interrupts. We add a flag to the mpic_init to distinquish
these variants of MPIC. We pull in the irq_choose_cpu from sparc64 to
select a single CPU as the destination of the interrupt.
This is to deal with the fact that the default smp affinity was
changed by commit 1840475676 ("genirq:
Expose default irq affinity mask (take 3)") to be all CPUs.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rahter than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no affect on drivers because the dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops and similarly the dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which oops on boot without this
because they call dma_direct_ops.map_single, which is NULL.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.
This updates the just added powerpc code to use it. This is needed
for the next commit, which will remove __kdump_flag.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The i2c bus defn is broken on linkstation / kurobox machines since at
least 2.6.27. Fix it. Also remove CONFIG_SERIAL_OF_PLATFORM, which, if
enabled, breaks the serial console after the
"console handover: boot [udbg0] -> real [ttyS1]" message.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Fix the HCU4 Kconfig option to 'default n'. We don't want the
board to always be enabled for other board defconfigs.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.
The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.
CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.
This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Most of the platforms were printing the size of the memory
in their show_cpuinfo implementations. This moves that to
the common show_cpuinfo, so that all 32-bit platforms will
now print the size of memory. I also update the code
to deal with the fact that total_memory is now a phys_addr_t.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We used to assume that even numbered threads were the primary
threads, ie those that would be listed and started as a cpu from
open firmware. Replace a left over is even (% 2) check with a check
for it being a primary thread and update the comments.
Tested with a debug print on pseries, identical code found for cell.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The pfn of the memory to be removed should be validated prior to
attempting to remove the memory. In cases where the probe of a
memory section fails during hotplug add, the pfn for the lmb may
not be valid.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a comment to clarify why atomic_dec_if_positive is being used
to decrement gang's aff_sched_count on SPU context unbind.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
This patch improves redability of the code responsible for trying to find
a node with enough SPUs not committed to other affinity gangs.
An additional check is also added, to avoid taking into account gangs that
have no SPU affinity.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
With most file readers (eg cat, dd), reading a context's regs file will
result in two reads: the first to read the data, and the second to
return EOF. Because each read performs a spu_acquire_saved, we end up
descheduling and re-scheduling the context twice.
This change does a simple check to see if we'd return EOF before
calling spu_acquire_saved(), saving the extra schedule operation.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace log will block until the read buffer
is full. This makes it difficult to retrieve the end of the buffer, as
the user will need to read with the right-sized buffer.
In a similar method as 91553a1b5e0df006a3573a88d98ee7cd48a3818a, this
change makes the switch_log return if there has already been data
read.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we use ctx->mapping_lock and ctx->switch_log->lock for the
context switch log. The mapping lock only prevents concurrent open()s,
so we require the switch_lock->lock for reads.
Since writes to the switch log buffer occur on context switches, we're
better off synchronising with the state_mutex, which is held during a
switch. Since we're serialised througout the buffer reads and writes,
we can use the state mutex to protect open and release too, and
can now kfree() the log buffer on release. This allows us to perform
the switch log notify without taking any extra locks.
Because the buffer is only present while the file is open, we can use
it to prevent multiple simultaneous openers.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, read() on the sputrace buffer will only return data when
the user buffer is exhausted. This may mean that we never see the
end of the event log, unless we read() with exactly the right-sized
buffer.
This change makes sputrace_read not block if we have data ready to
return.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, sputrace will start logging to the event buffer before the
log buffer has been open()ed. This results in a heap of "lost samples"
warnings if the sputrace file hasn't yet been opened.
Since the buffer is reset on open() anyway, there's no need to enable
logging when no-one has opened the log.
Because open clears the log, make it return EBUSY for mutliple open
calls.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.
This patch was generated mostly by script, and partially by hand.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds support for the GPIO functions of PPC40x and PPC44x
SOCs.
Signed-off-by: Steve Falco <sfalco@harris.com>
Acked-by: Stefan Roese <sr@denx.de>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Adds support for a HCU4 PPC405GPr based board from Netstal Maschinen AG.
Signed-off-by: Niklaus Giger <niklaus.giger@member.fsf.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds a common board file for almost all of the "simple" PowerPC 40x
boards that exist today. This is intended to be a single place to add
support for boards that do not differ in platform support from most of the
evaluation boards that are used as reference platforms. Boards that have
specific requirements or custom hardware setup should still have their own
board.c file.
The first board ported to this is the AMCC PowerPC 405EZ Acadia board.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
MPC5200 needs to have pipelining disabled for ATA to work. MPC5200B does not.
So, for the latter, don't touch the original setting from the bootloader.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Recently, indirect_pci was changed to test if the bus number requested
is the one hanging straight off the PHB, then it substitutes the bus
number with another one contained in a new "self_busno" field of the
pci_controller structure.
However, this breaks CHRP which didn't initialize this new field, and
which relies on having the right bus number passed to the hardware.
This fixes it by initializing this variable properly for all CHRP bridges
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The detection of the IBM "Python" PCI host bridge on IBM CHRP
machines such as old RS6000 was broken when we changed
of_device_is_compatible() from strncasecmp to strcasecmp (dropped
the "n" variant) due to the way IBM encodes the chip version.
We fix that by instead doing a match on the model property like
we do for others bridges in that file. It should be good enough
for those machines. If yours is still broken, let me know.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need a marker_synchronize_unregister() before the end of exit() to make sure
every probe callers have exited the non preemptible section and thus are not
executing the probe code anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>