For SCHED_RR tasks we can do some really trivial timeslicing. Basically
we fire up a time for every scheduler tick that searches for a higher
or same priority thread that is on the runqueue and if there is one
context switches to it. Because we can't lock spus from timer context
we actually run this from a delayed runqueue instead of a timer.
A nice optimization would be to skip the actual priority bitmap search
when there are less contexts than physical spus available. To implement
this I need a so far unpublished patch from Andre, and it will be added
after we have that patch in.
Note that right now we only do the time slicing for SCHED_RR tasks.
The code would work for SCHED_OTHER tasks aswell, but their prio
value is defered from the one the PPU thread has at time of spu_run,
and using this for spu scheduling decisions would make the code very
unfair. SCHED_OTHER support will be enabled once we the spu scheduler
knows how to calculcate cpu_context.prio (very soon)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
use DECLARE_BITMAP in the spu scheduler instead of reimplementing it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
If we start a spu context with realtime priority we want it to run
immediately and not wait until some other lower priority thread has
finished. Try to find a suitable victim and use it's spu in this
case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Give spu_yield a kerneldoc comment and remove the old comment
documenting spu_activate, spu_deactive and spu_yield as all of them
now have descriptive kerneldoc comments of their own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
If we call spu_remove_from_active_list that spu is always guaranteed
to be on the active list and in runnable state, so we can simply
do a list_del to remove it and unconditionally take the was_active
codepath.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no need to directly wake up contexts in spu_activate when
called from spu_run, so add a flag to surpress this wakeup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This is the biggest patch in this series, and it reworks the guts of
the spu scheduler runqueue mechanism:
- instead of embedding a waitqueue in the runqueue there is now a
simple doubly-linked list, the actual wakeups happen by reusing
the stop_wq in the spu context (maybe we should rename it one day)
- spu_free and spu_prio_wakeup are merged into a single spu_reschedule
function
- various functionality is split out into small helpers, and kerneldoc
comments are added in various places to document what's going on.
- spu_activate is rewritten into a tight loop by removing test for
various impossible conditions and using the infrastructure in this
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It doesn't make any sense to have a priority field in the physical spu
structure. Move it into the spu context instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups in code surrounding the state semaphore:
- inline spu_acquire/spu_release
- cleanup spu_acquire_* and add kerneldoc comments to these functions
- remove spu_release_exclusive and replace it with spu_release
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The r/w semaphore to lock the spus was overkill and can be replaced
with a mutex to make it faster, simpler and easier to debug. It also
helps to allow making most spufs interruptible in future patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups to sched.c that don't change the global control flow:
- add kerneldoc comments to various functions
- add spu_ prefixes to various functions
- add/remove context from the runqueue in bind/unbind_context as
it's part of the logical operation
- add a call to put_active_spu to spu_unbind_contex as it's logically
part of the unbind operation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Only bind_context/unbind_context change the spu context state. Thus
we can move all assignents of SPU_STATE_RUNNABLE into bind_context,
which parallels the unbind side aswell.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
unbind_context already sets the context state to SPU_STATE_SAVED, thus
the spu_deactivate callers don't need to do it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the empty last line in arch/powerpc/platforms/cell/spufs/run.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the SPU_CONTEXT_PREEMPT define. It's unused and won't be used
in this form after the scheduler rework.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It looks like we've had some serious bitrot there mostly due to tracking
of address_space's of mmap'ed files getting out of sync with the actual
mmap code. The mfc, mss and psmap were not tracked properly and thus
not invalidated on context switches (oops !)
I also removed the various file->f_mapping = inode->i_mapping;
assignments that were done in the other open() routines since that
is already done for us by __dentry_open.
One improvement we might want to do later is to assign the various
ctx-> fields at mmap time instead of file open/close time so that we
don't call unmap_mapping_range() on thing that have not been mmap'ed
Finally, I added some smp_wmb's after assigning the ctx-> fields to make
sure they are visible to other CPUs. I don't think this is really
necessary as I suspect locking in the fs layer will make that happen
anyway but better safe than sorry.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the need for struct page for SPE local store
and registers from spufs. It also makes the locking much more
obvious and no longer relying on the truncate logic black magic
for protecting against races between unmap_mapping_range() and
new pages faulted in. It does so by switching to a nopfn() handler
and using the new vm_insert_pfn() to setup the PTEs itself while
holding a lock on the SPE.
The nice thing is that this patch actually removes a lot more code
than it adds :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spu management ops in arch/platforms/cell/spu_priv1_mmio.h can be used
commonly in of based platform. This patch separates spu management ops
from native cell code and uses on celleb platform.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu->register_lock should be held before accessing registers.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The difference between 'nid' and 'node' fields in an
spu structure was used incorrectly. The common 'node'
number now reflects the NUMA node, and it is used
in other places in the code as well.
The 'nid' value is meaningful only in one place, namely
the computation of the interrupt numbers based on the
physical location of an spu. Consequently, we look it
up directly in the place where it is used now.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Some SPU code was a bit too convoluted and broke when adding support for
the new style device-tree, most notably the struct pages for SPEs no
longer get created. oops...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Don't limit spider I/O workarounds to the first two buses.
The IBM Cell blade has three of them (one PCI, two PCIe)
and we want to handle them all.
Signed-off-by: Jens Osterkamp <jens@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits)
[POWERPC] Generic BUG for powerpc
[PPC] Fix compile failure do to introduction of PHY_POLL
[POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
[POWERPC] Remove old dcr.S
[POWERPC] Fix SPU coredump code for max_fdset removal
[POWERPC] Fix irq routing on some 32-bit PowerMacs
[POWERPC] ps3: Add vuart support
[POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
[POWERPC] dont allow pSeries_probe to succeed without initialising MMU
[POWERPC] micro optimise pSeries_probe
[POWERPC] Add SPURR SPR to sysfs
[POWERPC] Add DSCR SPR to sysfs
[POWERPC] Fix 440SPe CPU table entry
[POWERPC] Add support for FP emulation for the e300c2 core
[POWERPC] of_device_register: propagate device_create_file return code
[POWERPC] Fix mmap of PCI resource with hack for X
[POWERPC] iSeries: head_64.o needs to depend on lparmap.s
[POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group
[POWERPC] Remove QE header files from lite5200.c
[POWERPC] of_platform_make_bus_id(): make `magic' int
...
Commit bbea9f6966 removed the max_fdset
element of struct fdtable. It appears that checking max_fds is
sufficient now.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds NULL to the initialization of the attribute_groups.
The spu_attributes and ppe_attributes arrays are arrays of pointers
that need to be terminated with a NULL entry.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make sure that the pmu is not initialised unless we are running on a cell.
Also make the init routine static.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, we only send a sigtrap if the current task is being ptraced.
This is somewhat inconsistant, and it breaks utrace support in fedora.
Removing the check should do the right thing in all cases.
Cc: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This changes the spu_create system call to return an error (-ENODEV) if
and isolated spu context is requested on hardware that doesn't support
isolated mode.
Tested on systemsim with and without isolation support
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This adds a platform specific spu management abstraction and the coresponding
routines to support the IBM Cell Blade. It also removes the hypervisor only
resources that were included in struct spu.
Three new platform specific routines are introduced, spu_enumerate_spus(),
spu_create_spu() and spu_destroy_spu(). The underlying design uses a new
type, struct spu_management_ops, to hold function pointers that the platform
setup code is expected to initialize to instances appropriate to that platform.
For the IBM Cell Blade support, I put the hypervisor only resources that were
in struct spu into a platform specific data structure struct spu_pdata.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
With soft-disabled interrupts in power_save, we can
still get external exceptions on Cell, even if we are
in pause(0) a.k.a. sleep state.
When the CPU really wakes up through the 0x100 (system reset)
vector, while we have already started processing the 0x500
(external) exception, we get a panic in unrecoverable_exception()
because of the lost state.
This occurred in Systemsim for Cell, but as far as I can see,
it can theoretically occur on any machine that uses the
system reset exception to get out of sleep state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds SPU elf notes to the coredump. It creates a separate note
for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1,
/signal1_type, /signal2, /signal2_type, /event_mask, /event_status,
/mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id.
A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to
specify they have extra elf core notes.
A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the
additional notes could be calculated and added to the notes phdr entry.
A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes
would be written after the existing notes.
The SPU coredump code resides in spufs. Stub functions are provided in the
kernel which are hooked into the spufs code which does the actual work via
register_arch_coredump_calls().
A new set of __spufs_<file>_read/get() functions was provided to allow the
coredump code to read from the spufs files without having to lock the
SPU context for each file read from.
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Add PPU event-based and cycle-based profiling support to Oprofile for Cell.
Oprofile is expected to collect data on all CPUs simultaneously.
However, there is one set of performance counters per node. There are
two hardware threads or virtual CPUs on each node. Hence, OProfile must
multiplex in time the performance counter collection on the two virtual
CPUs.
The multiplexing of the performance counters is done by a virtual
counter routine. Initially, the counters are configured to collect data
on the even CPUs in the system, one CPU per node. In order to capture
the PC for the virtual CPU when the performance counter interrupt occurs
(the specified number of events between samples has occurred), the even
processors are configured to handle the performance counter interrupts
for their node. The virtual counter routine is called via a kernel
timer after the virtual sample time. The routine stops the counters,
saves the current counts, loads the last counts for the other virtual
CPU on the node, sets interrupts to be handled by the other virtual CPU
and restarts the counters, the virtual timer routine is scheduled to run
again. The virtual sample time is kept relatively small to make sure
sampling occurs on both CPUs on the node with a relatively small
granularity. Whenever the counters overflow, the performance counter
interrupt is called to collect the PC for the CPU where data is being
collected.
The oprofile driver relies on a firmware RTAS call to setup the debug bus
to route the desired signals to the performance counter hardware to be
counted. The RTAS call must set the routing registers appropriately in
each of the islands to pass the signals down the debug bus as well as
routing the signals from a particular island onto the bus. There is a
second firmware RTAS call to reset the debug bus to the non pass thru
state when the counters are not in use.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The following routines are added to arch/powerpc/platforms/cell/pmu.c:
cbe_clear_pm_interrupts()
cbe_enable_pm_interrupts()
cbe_disable_pm_interrupts()
cbe_query_pm_interrupts()
cbe_pm_irq()
cbe_init_pm_irq()
This also adds a routine in arch/powerpc/platforms/cell/interrupt.c and
some macros in cbe_regs.h to manipulate the IIC_IR register:
iic_set_interrupt_routing()
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move some PMU-related macros and function prototypes from cbe_regs.h
and pmu.h in arch/powerpc/platforms/cell/ to a new header at
include/asm-powerpc/cell-pmu.h
This is cleaner to use from the oprofile code, since that sits in
arch/powerpc/oprofile, not in the cell platform directory.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
More macros for manipulating bits in the Cell PMU control registers.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add symbol-exports for the new routines in arch/powerpc/platforms/cell/pmu.c.
They are needed for Oprofile, which can be built as a module.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order to fit with the "don't-run-spus-outside-of-spu_run" model, this
patch starts the isolated-mode loader in spu_run, rather than
spu_create. If spu_run is passed an isolated-mode context that isn't in
isolated mode state, it will run the loader.
This fixes potential races with the isolated SPE app doing a
stop-and-signal before the PPE has called spu_run: bugzilla #29111.
Also (in conjunction with a mambo patch), this addresses #28565, as we
always set the runcntrl register when entering spu_run.
It is up to libspe to ensure that isolated-mode apps are cleaned up
after running to completion - ie, put the app through the "ISOLATE EXIT"
state (see Ch11 of the CBEA).
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change adds a read accessor for the SPE problem-state run control
register.
This is required for for applying (userspace) changes made to the run
control register while the SPE is stopped - simply asserting the master
run control bit is not sufficient. My next patch for isolated-mode
setup requires this.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When the user changes the runcontrol register, an SPU might be
running without a process being attached to it and waiting for
events. In order to prevent this, make sure we always disable
the priv1 master control when we're not inside of spu_run.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When fixing spufs to map the 'mem' file backing store cacheable,
I incorrectly set the physical mapping to use both cache-inhibited
and guarded mapping, which resulted in a serious performance
degradation.
Debugged-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When one of the spufs files is mapped into a process address
space, regular users can use ptrace to attempt accessing
them with access_process_vm(). With the way that the
mappings currently work, this likely causes an oops.
Setting the vm_flags to VM_IO makes sure that ptrace can
not access them but returns an error code. This is not
the perfect solution in case of the local store mapping,
but it fixes the oops in a well-defined way.
Also remove leftover VM_RESERVED flags in spufs. The
VM_RESERVED flag is on it's way out and not checked by
the memory managment code anymore.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <chellwig@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When there is pending signals, current spufs_run_spu() always returns
-ERESTARTSYS and it is called again automatically.
But, if spe already stopped by stop-and-signal or halt instruction,
returning -ERESTARTSYS makes stop-and-signal/halt lost and
spu run over the end-point.
For your convenience, I attached a sample code to restage this bug.
If there is no bug, printed NPC will be 0x4000.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we attempt an MFC DMA to an unmapped address, the event
returned from spu_run should be SPE_EVENT_SPE_DATA_STORAGE,
not SPE_EVENT_INVALID_DMA.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace the use of the platform specific variable spu.nid with the
platform independednt variable spu.node.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to check the channel count of the signal notification registers
before reading them, because it can be undefined when the count is
zero. In order to read count and data atomically, we read from the
saved context.
This patch uses spu_acquire_saved() to force a context save before a
/signal1 or /signal2 read. Because of this it is no longer necessary to
have backing_ops and hw_ops versions of this function so they have been
removed.
Regular applications should not rely on reading this register
to be fast, as it's conceptually a write-only file from the PPE
perspective.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch implements read only access to
/mbox_info - SPU Write Outbound Mailbox
/ibox_info - SPU Write Outbound Interrupt Mailbox
/wbox_info - SPU Read Inbound Mailbox
These files are used by gdb in order to look into the current mailbox
queues without changing the contents at the same time. They are
not meant for general programming use, since the access requires
a context save and is therefore rather slow.
It would be good to complement this patch with one that adds
write support as well.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the /spu_tag_mask file from spufs. The data provided by
this file is also available from the /dma_info file in the dma_info_mask
of the spu_dma_info struct.
The file was intended to be used by gdb, but that never used it, and
now it has been replaced with the more verbose dma_info file.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The /lslr file gives read access to the SPU_LSLR register in hex; 0x3fff
for example The /dma_info file provides read access to the SPU Command
Queue in a binary format. The /proxydma_info files provides read access
access to the Proxy Command Queue in a binary format. The spu_info.h
file provides data structures for interpreting the binary format of
/dma_info and /proxydma_info.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patches changes /npc, /decr, /decr_status, /spu_tag_mask,
/event_mask, /event_status, and /srr0 files to provide output according to
the format string "0x%llx" instead of "%llx".
Before this patch some files used "0x%llx" and other used "%llx" which is
inconsistent and potentially confusing. A user might assume "%llx" numbers
were decimal if they happened to not contain any a-f digits. This change
will break any code cannot tolerate a leading 0x in the file contents. The
only known users of these files are the libspe but there might also be
some scripts which access these files. This risk is deemed acceptable for
future consistency.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds full cell iommu support (and iommu disabled mode).
It implements mapping/unmapping of iommu pages on demand using the
standard powerpc iommu framework. It also supports running with
iommu disabled for machines with less than 2GB of memory. (The
default is off in that case, though it can be forced on with the
kernel command line option iommu=force).
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that the direct DMA ops supports an offset, we use that instead
of defining our own.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch implements a workaround for a Spider PCI host bridge bug
where it doesn't enforce some of the PCI ordering rules unless some
manual manipulation of a special register is done. In order to be
fully compliant with the PCI spec, I do this on every MMIO read
operation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes the Cell DMA code work on both the Spider and the Axon
south bridges by turning cell_dma_valid into a variable instead of a
constant. This is a temporary patch until we have full iommu support.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When enabled in Kconfig, it will pick up any of_platform_device
matching it's match list (currently type "pci", "pcix", "pcie",
or "ht" and setup a PHB for it.
Platform must provide a ppc_md.pci_setup_phb() for it to work
(for doing the necessary initialisations specific to a given PHB
like setting up the config space ops).
It's currently only available on 64 bits as the 32 bits PCI code
can't quite cope with it in it's current form. I will fix that
later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a bus device notifier to the of_platform bus type on
cell to setup the DMA operations for of_platform_devices. We currently
use the PCI operations as Cell use a special version of them that
happens to be suitable for our needs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch completely refactors DMA operations for 64 bits powerpc. 32 bits
is untouched for now.
We use the new dev_archdata structure to add the dma operations pointer
and associated data to struct device. While at it, we also add the OF node
pointer and numa node. In the future, we might want to look into merging
that with pci_dn as well.
The old vio, pci-iommu and pci-direct DMA ops are gone. They are now replaced
by a set of generic iommu and direct DMA ops (non PCI specific) that can be
used by bus types. The toplevel implementation is now inline.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Hook up of_platform_bus_probe with the cell platform in order to publish
the non-PCI devices in the device-tree of cell blades as of_platform_device(s)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support for southbridges using the MPIC interrupt controller to
the native cell platforms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch reworks the way IRQs are fixed up on PCI for arch powerpc.
It makes pci_read_irq_line() called by default in the PCI code for
devices that are probed, and add an optional per-device fixup in
ppc_md for platforms that really need to correct what they obtain
from pci_read_irq_line().
It also removes ppc_md.irq_bus_setup which was only used by pSeries
and should not be needed anymore.
I've also removed the pSeries s7a workaround as it can't work with
the current interrupt code anyway. I'm trying to get one of these
machines working so I can test a proper fix for that problem.
I also haven't updated the old-style fixup code from 85xx_cds.c
because it's actually buggy :) It assigns pci_dev->irq hard coded
numbers which is no good with the new IRQ mapping code. It should
at least use irq_create_mapping(NULL, hard_coded_number); and possibly
also set_irq_type() to set them as level low.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a typo in the "new style" code for mapping SPE resources,
which causes it to try to map the same resource 4 times.
It also adds some pr_debug's that are useful to track down issues with
the firmware when bringinh up new machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a cpufreq backend driver to enable frequency scaling on cell.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for stopping, and restarting, spus
from xmon. We use the spu master runcntl bit to stop execution,
this is apparently the "right" way to control spu execution and
spufs will be changed in the future to use this bit.
Testing has shown that to restart execution we have to turn the
master runcntl bit on and also rewrite the spu runcntl bit, even
if it is already set to 1 (running).
Stopping spus is triggered by the xmon command 'ss' - "spus stop"
perhaps. Restarting them is triggered via 'sr'. Restart doesn't
start execution on spus unless they were running prior to being
stopped by xmon.
Walking the spu->full_list in xmon after a panic, would mean
corruption of any spu struct would make all the others
inaccessible. To avoid this, and also to make the next patch
easier, we cache pointers to all spus during boot.
We attempt to catch and recover from errors while stopping and
restarting the spus, but as with most xmon functionality there are
no guarantees that performing these operations won't crash xmon
itself.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the cell idle function to use the default cpu_idle
with a special power_save callback, like all other platforms
except iSeries already do.
It also makes it possible to disable this power_save function
with a new powerpc-specific boot option "powersave=off".
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a module that registers sysfs attributes to CPU and SPU
containing the temperature of the CBE.
They can be found under
/sys/devices/system/spu/cpuX/thermal/temperature[0|1]
/sys/devices/system/spu/spuX/thermal/temperature
The temperature is read from the on-chip temperature sensors.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order to add sysfs attributes to all spu's, there is a
need for a list of all available spu's. Adding the device_node
makes also sense, as it is needed for proper register access.
This patch also adds two functions to create and remove sysfs
attributes and attribute_groups to all spus.
That allows to group spu attributes in a subdirectory like:
/sys/devices/system/spu/spuX/group_name/what_ever
This will be used by cbe_thermal to group all attributes dealing with
thermal support in one directory.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add routines for accessing the registers and counters in the performance
monitoring unit.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Many of the registers in the performance monitoring unit are write-only.
We need to save a "shadow" copy when we write to those registers so we
can retrieve the values if we need them later.
The new cbe_pmd_shadow_regs structure is added to the cbe_regs_map structure
so we have the appropriate per-node copies of these shadow values.
Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are a few definitions that are required by subsequent patches,
so add them here.
The original patch is from David Erb, but is significantly cleaned
up by Kevon Corry.
Cc: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When in isolated mode, SPEs have access to an area of persistent
storage, which is per-SPE. In order for isolated-mode apps to
communicate arbitrary data through this storage, we need to ensure that
isolated physical SPEs can be reused for subsequent applications.
Add a file ("recycle") in a spethread dir to enable isolated-mode
recycling. By writing to this file, the kernel will reload the
isolated-mode loader kernel, allowing a new app to be run on the same
physical SPE.
This requires the spu_acquire_exclusive function to enforce exclusive
access to the SPE while the loader is initialised.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds general support for isolated mode SPE apps.
Isolated apps are started indirectly, by a dedicated loader "kernel".
This patch starts the loader when spe_create is invoked with the
ISOLATE flag. We do this at spe_create time to allow libspe to pass the
isolated app in before calling spe_run.
The loader is read from the device tree, at the location
"/spu-isolation/loader". If the loader is not present, an attempt to
start an isolated SPE binary will fail with -ENODEV.
Update: loader needs to be correctly aligned - copy to a kmalloced buf.
Update: remove workaround for systemsim/spurom 'L-bit' bug, which has
been fixed.
Update: don't write to runcntl on spu_run_init: SPU is already running.
Update: do spu_setup_isolated earlier
Tested on systemsim.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds two new flags to spu_create:
SPU_CREATE_NONSCHED: create a context that is never moved
away from an SPE once it has started running. This flag
can only be used by tasks with the CAP_SYS_NICE capability.
SPU_CREATE_ISOLATED: create a nonschedulable context that
enters isolation mode upon first run. This requires the
SPU_CREATE_NONSCHED flag.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the mostly unused variable isrc from struct spu and a forgotten
function declaration.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SPRN_SDR1 and the SPE's MFC SDR are hypervisor resources and
are not accessible from a logical partition. This change adds an
access wrapper.
When running on bare H/W, the spufs needs to only set the SPE's MFC SDR
to the value of the PPE's SPRN_SDR1 once at SPE initialization, so this
change renames mfc_sdr_set() to mfc_sdr_setup() and moves the
access of SPRN_SDR1 into the mmio wrapper. It also removes the now
unneeded member mfc_sdr_RW from struct spu_priv1_collapsed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, spufs_mbox_read transfers more bytes than requested on a
read. If you ask for four bytes, you get eight. This fixes it to
transfer the largest multiple of four bytes that is less than or equal
to the number you asked for.
Note: one nasty property of this file in spufs is that you can only
read multiples of four bytes in the first place, since there is no way
to atomically put back a few bytes into the hardware register. Thus,
reading less than four bytes returns -EINVAL. Asking for more than
four returns the largest possible multiple of four.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the /signal2 file to actually give signal2 data.
Signed-off-by: Dwayne Grant Mcconnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a memory leak introduced by "spufs: add support
for read/write oncntl", which was missing a call to simple_attr_close.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPU code will crash if CONFIG_NUMA is not set and SPUs are found on
a non-0 node. This workaround will ignore those SPEs and just print an
message in the kernel log.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove struct pt_regs * from all handlers.
Also remove the regs argument from get_irq() functions.
Compile tested with arch/powerpc/config/* and
arch/ppc/configs/prep_defconfig
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix up some of the buildbreaks from the irq handler changes.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
- Some long constants should be marked 'ul'.
- When using desc->handler_data to pass an __iomem
register area, we need to add casts to and from
__iomem.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This enables support for new firmware test releases.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds an 'object-id' file that the spe library can
use to store a pointer to its ELF object. This was
originally meant for use by oprofile, but is now
also used by the GNU debugger, if available.
In order for oprofile to find the location in an spu-elf
binary where an event counter triggered, we need a way
to identify the binary in the first place.
Unfortunately, that binary itself can be embedded in a
powerpc ELF binary. Since we can assume it is mapped into
the effective address space of the running process,
have that one write the pointer value into a new spufs
file.
When a context switch occurs, pass the user value to
the profiler so that can look at the mapped file (with
some care).
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The properties we used traditionally in the device tree are somewhat
nonstandard. This adds support for a more conventional format using
'interrupts' and 'reg' properties.
The interrupts are specified in three cells (class 0, 1 and 2) and
registered at the interrupt-parent.
The reg property contains either three or four register areas in the
order 'local-store', 'problem', 'priv2', and 'priv1', so the priv1 one
can be left out in case of hypervisor driven systems that access these
through hcalls.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Writing to cntl can be used to stop execution on the
spu and to restart it, reading from cntl gives the
contents of the current status register.
The access is always in ascii, as for most other files.
This was always meant to be there, but we had a little
problem with writing to runctl so it was left out so
far.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Any firmware that still uses the 'spc' nodes already
stopped running for other reasons, so let's get rid of this.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since libspe2 will provide a function that can read/write
multiple mailbox elements at once, the kernel should handle
that efficiently.
read/write on the three mailbox files can now access the
spe context multiple times to operate on any number of
mailbox data elements.
If the spu application keeps writing to its outbound
mailbox, the read call will pick up all the data in a
single system call.
Unfortunately, if the user passes an invalid pointer,
we may lose a mailbox element on read, since we can't
put it back. This probably impossible to solve, if the
user also accesses the mailbox through direct register
access.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This hopefully fixes a long-standing bug in the spu file system.
An spu context comes with local memory that can be either saved
in kernel pages or point directly to a physical SPE.
When mapping the physical SPE, that mapping needs to be cache-inhibited.
For simplicity, we used to map the kernel backing memory that way
too, but unfortunately that was not only inefficient, but also incorrect
because the same page could then be accessed simultaneously through
a cacheable and a cache-inhibited mapping, which is not allowed
by the powerpc specification and in our case caused data inconsistency
for which we did a really ugly workaround in user space.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the concept of a gang to spufs as a new type of object.
So far, this has no impact whatsover on scheduling, but makes
it possible to add that later.
A new type of object in spufs is now a spu_gang. It is created
with the spu_create system call with the flags argument set
to SPU_CREATE_GANG (0x2). Inside of a spu_gang, it
is then possible to create spu_context objects, which until
now was only possible at the root of spufs.
There is a new member in struct spu_context pointing to
the spu_gang it belongs to, if any. The spu_gang maintains
a list of spu_context structures that are its children.
This information can then be used in the scheduler in the
future.
There is still a bug that needs to be resolved in this
basic infrastructure regarding the order in which objects
are removed. When the spu_gang file descriptor is closed
before the spu_context descriptors, we leak the dentry
and inode for the gang. Any ideas how to cleanly solve
this are appreciated.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This tries to fix spufs so we have an interface closer to what is
specified in the man page for events returned in the third argument of
spu_run.
Fortunately, libspe has never been using the returned contents of that
register, as they were the same as the return code of spu_run (duh!).
Unlike the specification that we never implemented correctly, we now
require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
order to get the new behavior. When this flag is not passed, spu_run
will simply ignore the third argument now.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For better explanation, I break down the page fault handling into steps:
1) There is a page fault caused by DMA operation initiated by SPU and
DMA is suspended.
2) The interrupt handler 'spu_irq_class_1()/__spu_trap_data_map()' is
called and it just wakes up the sleeping spe-manager thread.
3) by PPE scheduler, the corresponding bottom half,
spu_irq_class_1_bottom() is called in process context and DMA is
restarted.
There can be a quite large time gap between 2) and 3) and I found
the following problem:
Between 2) and 3) If the context becomes unbound, 3) is not executed
because when the spe-manager thread is awaken, the context is already
saved. (This situation can happen, for example, when a high priority spe
thread newly started in that time gap)
But the actual problem is that the corresponding SPU context does not
work even if it is bound again to a SPU.
Besides I can see the following warning in mambo simulator when the
context becomes
unbound(in save_mfc_cmd()), i.e. when unbind() is called for the
context after step 2) before 3) :
'WARNING: 61392752237: SPE2: MFC_CMD_QUEUE channel count of 15 is
inconsistent with number of available DMA queue entries of 16'
After I go through available documents, I found that the problem is
because the suspended DMA is not restarted when it is bound again.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds NUMA support to the the spufs scheduler.
The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
simplified, in an attempt to reduce complexity while adding
support for NUMA scheduler domains. SPUs are allocated starting
from the calling thread's node, moving to others as supported by
current->cpus_allowed. Preemption is gone as it was buggy, but
should be re-enabled in another patch when stable.
The new arch/powerpc/platforms/cell/spu_base.c maintains idle
lists on a per-node basis, and allows caller to specify which
node(s) an SPU should be allocated from, while passing -1 tells
spu_alloc() that any node is allowed.
Since the patch removes the currently implemented preemptive
scheduling, it is technically a regression, but practically
all users have since migrated to this version, as it is
part of the IBM SDK and the yellowdog distribution, so there
is not much point holding it back while the new preemptive
scheduling patch gets delayed further.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new "psmap" file to spufs that allows mmap of all of
the problem state mapping of SPEs. It is compatible with 64k pages. In
addition, it removes mmap ability of individual files when using 64k
pages, with the exception of signal1 and signal2 which will both map the
entire 64k page holding both registers. It also removes
CONFIG_SPUFS_MMAP as there is no point in not building mmap support in
spufs.
It goes along a separate patch to libspe implementing usage of that new
file to access problem state registers.
Another patch will follow up to fix races opened up by accessing
the 'runcntl' register directly, which is made possible with this
patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/configh:
Remove all inclusions of <linux/config.h>
Manually resolved trivial path conflicts due to removed files in
the sound/oss/ subdirectory.
This patch reworks the cell iic interrupt handling so that:
- Node ID is back in the interrupt number (only one IRQ host is created
for all nodes). This allows interrupts from sources on another node to
be routed non-locally. This will allow possibly one day to fix maxcpus=1
or 2 and still get interrupts from devices on BE 1. (A bit more fixing
is needed for that) and it will allow us to implement actual affinity
control of external interrupts.
- Added handling of the IO exceptions interrupts (badly named, but I
re-used the name initially used by STI). Those are the interrupts
exposed by IIC_ISR and IIC_IRR, such as the IOC translation exception,
performance monitor, etc... Those get their special numbers in the IRQ
number space and are internally implemented as a cascade on unit 0xe,
class 1 of each node.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (29 commits)
[POWERPC] Fix rheap alignment problem
[POWERPC] Use check_legacy_ioport() for ISAPnP
[POWERPC] Avoid NULL pointer in gpio1_interrupt
[POWERPC] Enable generic rtc hook for the MPC8349 mITX
[POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class
[POWERPC] Create a "wrapper" script and use it in arch/powerpc/boot
[POWERPC] fix spin lock nesting in hvc_iseries
[POWERPC] EEH failure to mark pci slot as frozen.
[POWERPC] update powerpc defconfig files after libata kconfig breakage
[POWERPC] enable sysrq in pmac32_defconfig
[POWERPC] UPIO_TSI cleanup
[POWERPC] rewrite mkprep and mkbugboot in sane C
[POWERPC] maple/pci iomem annotations
[POWERPC] powerpc oprofile __user annotations
[POWERPC] cell spufs iomem annotations
[POWERPC] NULL noise removal: spufs
[POWERPC] ppc math-emu needs -fno-builtin-fabs for math.c and fabs.c
[POWERPC] update mpc8349_itx_defconfig and remove some debug settings
[POWERPC] Always call cede in pseries dedicated idle loop
[POWERPC] Fix loop logic in irq_alloc_virt()
...
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The rtas console doesn't have to be Cell specific. If we get both
RTAS tokens, we should just enabled the console then and there.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cleanup CPU inits a bit more, Geoff Levand already did some earlier.
* Move CPU state save to cpu_setup, since cpu_setup is only ever done
on cpu 0 on 64-bit and save is never done more than once.
* Rename __restore_cpu_setup to __restore_cpu_ppc970 and add
function pointers to the cputable to use instead. Powermac always
has 970 so no need to check there.
* Rename __970_cpu_preinit to __cpu_preinit_ppc970 and check PVR before
calling it instead of in it, it's too early to use cputable.
* Rename pSeries_secondary_smp_init to generic_secondary_smp_init since
everyone but powermac and iSeries use it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that get_property() returns a void *, there's no need to cast its
return value. Also, treat the return value as const, so we can
constify get_property later.
cell platform changes.
Built for cell_defconfig
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch slightly reworks the new irq code to fix a small design error. I
removed the passing of the trigger to the map() calls entirely, it was not a
good idea to have one call do two different things. It also fixes a couple of
corner cases.
Mapping a linux virtual irq to a physical irq now does only that. Setting the
trigger is a different action which has a different call.
The main changes are:
- I no longer call host->ops->map() for an already mapped irq, I just return
the virtual number that was already mapped. It was called before to give an
opportunity to change the trigger, but that was causing issues as that could
happen while the interrupt was in use by a device, and because of the
trigger change, map would potentially muck around with things in a racy way.
That was causing much burden on a given's controller implementation of
map() to get it right. This is much simpler now. map() is only called on
the initial mapping of an irq, meaning that you know that this irq is _not_
being used. You can initialize the hardware if you want (though you don't
have to).
- Controllers that can handle different type of triggers (level/edge/etc...)
now implement the standard irq_chip->set_type() call as defined by the
generic code. That means that you can use the standard set_irq_type() to
configure an irq line manually if you wish or (though I don't like that
interface), pass explicit trigger flags to request_irq() as defined by the
generic kernel interfaces. Also, using those interfaces guarantees that
your controller set_type callback is called with the descriptor lock held,
thus providing locking against activity on the same interrupt (including
mask/unmask/etc...) automatically. A result is that, for example, MPIC's
own map() implementation calls irq_set_type(NONE) to configure the hardware
to the default triggers.
- To allow the above, the irq_map array entry for the new mapped interrupt
is now set before map() callback is called for the controller.
- The irq_create_of_mapping() (also used by irq_of_parse_and_map()) function
for mapping interrupts from the device-tree now also call the separate
set_irq_type(), and only does so if there is a change in the trigger type.
- While I was at it, I changed pci_read_irq_line() (which is the helper I
would expect most archs to use in their pcibios_fixup() to get the PCI
interrupt routing from the device tree) to also handle a fallback when the
DT mapping fails consisting of reading the PCI_INTERRUPT_PIN to know wether
the device has an interrupt at all, and the the PCI_INTERRUPT_LINE to get an
interrupt number from the device. That number is then mapped using the
default controller, and the trigger is set to level low. That default
behaviour works for several platforms that don't have a proper interrupt
tree like Pegasos. If it doesn't work for your platform, then either
provide a proper interrupt tree from the firmware so that fallback isn't
needed, or don't call pci_read_irq_line()
- Add back a bit that got dropped by my main rework patch for properly
clearing pending IPIs on pSeries when using a kexec
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the new irq remapper core and removes the old one. Because
there are some fundamental conflicts with the old code, like the value
of NO_IRQ which I'm now setting to 0 (as per discussions with Linus),
etc..., this commit also changes the relevant platform and driver code
over to use the new remapper (so as not to cause difficulties later
in bisecting).
This patch removes the old pre-parsing of the open firmware interrupt
tree along with all the bogus assumptions it made to try to renumber
interrupts according to the platform. This is all to be handled by the
new code now.
For the pSeries XICS interrupt controller, a single remapper host is
created for the whole machine regardless of how many interrupt
presentation and source controllers are found, and it's set to match
any device node that isn't a 8259. That works fine on pSeries and
avoids having to deal with some of the complexities of split source
controllers vs. presentation controllers in the pSeries device trees.
The powerpc i8259 PIC driver now always requests the legacy interrupt
range. It also has the feature of being able to match any device node
(including NULL) if passed no device node as an input. That will help
porting over platforms with broken device-trees like Pegasos who don't
have a proper interrupt tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adapts the generic powerpc interrupt handling code, and all of
the platforms except for the embedded 6xx machines, to use the new
genirq framework.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(Only fails with -Werror-implicit-function-declaration, but
it should still be fixed).
arch/powerpc/platforms/cell/setup.c:145: error: implicit declaration of function 'udbg_init_rtas_console'
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the new IRQF_ constants and remove the SA_INTERRUPT define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits)
[POWERPC] Use little-endian bit from firmware ibm,pa-features property
[POWERPC] Make sure smp_processor_id works very early in boot
[POWERPC] U4 DART improvements
[POWERPC] todc: add support for Time-Of-Day-Clock
[POWERPC] Make lparcfg.c work when both iseries and pseries are selected
[POWERPC] Fix idr locking in init_new_context
[POWERPC] mpc7448hpc2 (taiga) board config file
[POWERPC] Add tsi108 pci and platform device data register function
[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform
[POWERPC] Correct the MAX_CONTEXT definition
powerpc: minor cleanups for mpc86xx
[POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set
[POWERPC] Simplify the code defining the 64-bit CPU features
[POWERPC] powerpc: kconfig warning fix
[POWERPC] Consolidate some of kernel/misc*.S
[POWERPC] Remove unused function call_with_mmu_off
[POWERPC] update asm-powerpc/time.h
[POWERPC] Clean up it_lp_queue.h
[POWERPC] Skip the "copy down" of the kernel if it is already at zero.
[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
...
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.
While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.
The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.
This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.
As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.
The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.
We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.
This patch:
rename desc->handler to desc->chip.
Originally i did not want to do this, because it's a big patch. But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.
I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.
So lets get over with this quickly. The conversion was done automatically
via scripts and converts all the code in the kernel.
This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.
[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The class zero interrupt handling for spus was confusing alignment and
error interrupts, so swap them.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_base.c calls __add_pages, which depends on CONFIG_MEMORY_HOTPLUG.
Moved the selection of CONFIG_MEMORY_HOTPLUG from CONFIG_SPUFS_MMAP
to CONFIG_SPU_FS.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In the context save/restore code, the SPU MFC command queue purge
code has a bug:
static inline void wait_purge_complete(struct spu_state *csa, struct
spu *spu)
{
struct spu_priv2 __iomem *priv2 = spu->priv2;
/* Save, Step 28:
* Poll MFC_CNTL[Ps] until value '11' is
* read
* (purge complete).
*/
POLL_WHILE_FALSE(in_be64(&priv2->mfc_control_RW)
& MFC_CNTL_PURGE_DMA_COMPLETE);
}
This will exit as soon as _one_ of the 2 bits that compose
MFC_CNTL_PURGE_DMA_COMPLETE is set, and one of them happens to be
"purge in progress"... which means that we'll happily continue
restoring the MFC while it's being purged at the same time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a bug where we don't properly map SPE MMIO space as guarded,
causing various test cases to fail, probably due to write combining and other
niceties caused by the lack of the G bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable the RTAS udbg console on IBM Cell Blade, this allows xmon
to work.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Initialise the ppc_md htab callbacks earlier, in the probe routines. This
allows us to call htab_finish_init() from htab_initialize(), and makes it
private to hash_utils_64.c. Move htab_finish_init() and make_bl() above
htab_initialize() to avoid forward declarations.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
locking init cleanups:
- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner
this patch was generated automatically.
Motivation:
- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's going away.
I wonder if this code really meant to iterate across not-present, not-online
CPUs.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid duplication of the syscall table for the cell platform. Based on an
idea from David Woodhouse.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPU context save/restore code is currently built
for a 4k page size and we provide a _shipped version
of it since most people don't have the spu toolchain
that is needed to rebuild that code.
This patch hardcodes the data structures to a 64k
page alignment, which also guarantees 4k alignment
but unfortunately wastes 60k of memory per SPU
context that is created in the running system.
We will follow up on this with another patch to
reduce that overhead or maybe redo the context
save/restore logic to do this part entirely different,
but for now it should make experimental systems
work with either page size.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At this time, all flags are invalid. Since we are
planning to actually add valid flags in the future,
we better check if any were passed by the user.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SPU interrupt status must be cleared before handle it.
Otherwise, kernel may drop some interrupt packet.
Currently, class2 interrupt treated like:
1) call callback to wake up waiting process
2) mask raised mailbox interrupt
3) clear interrupt status
I changed like:
1) mask raised mailbox interrupt
2) clear interrupt status
3) call callback to wake up waiting process
Clearing status before masking will make spurious interrupt.
Thus, it is necessary to hold by steps I described above, I think.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch remove 'stop_code' -- discarded member of struct spu.
It is written at initialize and interrupt, but never read
in current implementation.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the hypervisor abstraction of setting cpu affinity to a
higher level to avoid platform dependent interrupt controller
routines. I replaced spu_priv1_ops:spu_int_route_set() with a
new routine spu_priv1_ops:spu_cpu_affinity_set().
As a by-product, this change eliminated what looked like an
existing bug in the set affinity code where spu_int_route_set()
mistakenly called int_stat_get().
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To support muti-platform binaries the spu hypervisor accessor
routines must have runtime binding.
I removed the existing statically linked routines in spu.h
and spu_priv1_mmio.c and created new accessor routines in spu_priv1.h
that operate indirectly through an ops struct spu_priv1_ops.
spu_priv1_mmio.c contains the instance of the accessor routines
for running on raw hardware.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Creates new config variables PPC_CELL_NATIVE and PPC_IBM_CELL_BLADE.
The existing CONFIG_PPC_CELL is now used to denote the generic
Cell processor support.
PPC_CELL = make descends into platforms/cell
PPC_CELL_NATIVE = add bare metal support
PPC_IBM_CELL_BLADE = add blade device drivers, etc.
Also renames spu_priv1.c to spu_priv1_mmio.c.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The save/restore sequence for SPE contexts currently attempts to save
and restore the channel count for SPE channel 1 (the SPU_WriteEventMask
channel. But the CBE architecture (section 9.11.2) clearly states
that this channel does not have an associated count. Hardware simply
ignores the attempt to write this count, but the simulator generates
a warning message.
WARNING: 279721590: SPE7: Attempt to write channel count for CH 1 with
no associated count is ignored.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up create_spu() a little by using kzalloc instead of kmalloc +
assignments.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The wbox channel count of an spu is now initialized
to four for the saved context. This makes it possible
to write to the mailbox right away without waiting
for the SPE to become scheduled first.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For performance analysis, it is often interesting to know
which physical SPE a thread is currently running on, and,
more importantly, if it is running at all.
This patch adds a simple attribute to each SPU directory
with that information.
The attribute is read-only and called 'phys-id'. It contains
an ascii string with the number of the physical SPU (e.g.
"0x5"), or alternatively the string "0xffffffff" (32 bit -1)
when it is not running at all at the time that the file
is read.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs currently knows only 4k pages and 16M hugetlb
pages. Make it use the regular methods for deciding on
the SLB bits.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_rmdir tries to acquire the spufs root
i_mutex, which is already held by spufs_create_thread.
This was tracked as Bug #H9512.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A recent change to the way that the mfc file gets mapped made it
impossible to map the SPE Multi-Source Synchronization register
into user space, but that may be needed by some applications.
This restores the missing functionality.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spu_base module is rather deeply intermixed with the
core kernel, so it makes sense to have that built-in.
This will let us extend the base in the future without
having to export more core symbols just for it.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SPUs are registered as system devices, exposing attributes through
sysfs. Since the sysdev includes a kref, we can remove the one in
struct spu (it isn't used at the moment anyway).
Currently only the interrupt source and numa node attributes are added.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Checking the priority field to test for irq validity is
completely bogus and breaks with future external interrupt
controllers.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is a first version of support for the Cell BE "Reliability,
Availability and Serviceability" features.
It doesn't yet handle some of the RAS interrupts (the ones described in
iic_is/iic_irr), I'm still working on a proper way to expose these. They
are essentially a cascaded controller by themselves (sic !) though I may
just handle them locally to the iic driver. I need also to sync with
David Erb on the way he hooked in the performance monitor interrupt.
So that's all for 2.6.17 and I'll do more work on that with my rework of
the powerpc interrupt layer that I'm hacking on at the moment.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For pseries IOMMU bypass I want to be able to fall back to the regular
IOMMU ops. Do this by creating a dma_mapping_ops struct, and convert
the others while at it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The IBM Cell blade firmware might confuse the kernel to think it's a
pSeries machine. This fixes it for now. With a bit of luck, the firmware
will be updated to avoid that in the future but currently that patch is
needed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Syscall number 224 was absent from the table, which I believe means that
the SPU can cause an oops by attempting to use it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an nid member to the spu structure, and store the numa id of the spu there
on creation.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on an older patch from Mike Kravetz <kravetz@us.ibm.com>
We need to have a mem_map for high addresses in order to make fops->no_page
work on spufs mem and register files. So far, we have used the
memory_present() function during early bootup, but that did not work when
CONFIG_NUMA was enabled.
We now use the __add_pages() function to add the mem_map when loading the
spufs module, which is a lot nicer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use kzalloc when allocating a new spu context, rather than kmalloc +
zeroing.
Booted & tested on cell.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch disables and saves local interrupts during
hash_page processing for SPE contexts.
We have to do it explicitly in the spu_irq_class_1_bottom
function. For the interrupt handlers, we get the behaviour
implicitly by using SA_INTERRUPT to disable interrupts while
in the handler.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Wire up *at syscalls.
This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
Every time a new syscall gets added, a BUILD_BUG_ON in
arch/powerpc/platforms/cell/spu_callbacks.c gets triggered.
Since the addition of a new syscall is rather harmless,
the error should just be removed.
While we're here, add sys_tee to the list and add a comment
to systbl.S to remind people that there is another list
on powerpc.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We found that when the 'decrementer' is saved, the PPE saves the current
time 'csa->suspend_time'. When restoring the 'decrementer', (Step 34)
decrementer seems to be adjusted with the number of cycles th= at a spu
thread has not been running.
In that code it is missing a substract ('-') because 'delta_time' is
assigned a not substracted(see bellow).
Acked-by: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Missing include for __NR_syscalls, and missing sys_splice() that
causes build-time failure due to compile-time bounds check on
spu_syscall_table.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.
We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_init and spufs_exit should be marked correctly so
they can be removed when not needed.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If an SPE attempts a DMA put to a local store after already doing
a get, the kernel must update the HW PTE to allow the write access.
This case was not being handled correctly.
From: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Mike Kistler <mkistler@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I'm not sure where the information came from, but I assumed
that doing cache-inhibited mappings for mmio regions was
sufficient.
It seems we also need the guarded bit set, like everyone
else, which is the default for ioremap.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noticed by Milton Miller, setting the initial affinity in
spider-pic can go wrong if the target node field was not orinally
empty.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
the mfc member of a new context was not initialized to zero,
which potentially leads to wild memory accesses.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.
This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.
The signal1 & signal2 files may be mmap()'d by regular user
processes. The cntl and mfc file, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which can not be guaranteed with user mmaps,
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.
This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.
I'm still undecided wether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.
This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Apparently we have found a bug in the CPU that causes
external interrupts to sometimes get disabled indefinitely.
This adds a workaround for the problem.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current interrupt controller setup on Cell is done
in a rather ad-hoc way with device tree properties
that are not standardized at all.
In an attempt to do something that follows the OF standard
(or at least the IBM extensions to it) more closely,
we have now come up with this patch. It still provides
a fallback to the old behaviour when we find older firmware,
that hack can not be removed until the existing customer
installations have upgraded.
Cc: hpenner@de.ibm.com
Cc: stk@de.ibm.com
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Cc: benh@kernel.crashing.org
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A small bug crept in the iommu driver when we made it more
generic. This patch is needed for boards that have a dma
window that does not start at bus address zero.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
set_page_count usage outside mm/ is limited to setting the refcount to 1.
Remove set_page_count from outside mm/, and replace those users with
init_page_count() and set_page_refcounted().
This allows more debug checking, and tighter control on how code is allowed
to play around with page->_count.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a small fix to get the spufs init sequence right.
init_spu_base() in spu_base.c should be called (via
module_init(init_spu_base)) before spufs_init() (via
module_init(spufs_init)) in spufs/inode.c gets called.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These symbols are only used in the file that they are defined in,
so they should not be in the global namespace.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPE Book IV indicates that MFC DMA operations must be
suspended and restored on SPU context switch (in Step 8).
This patch adds that operation, which is missing from the
current spufs implementation.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For far, all SPU triggered interrupts always end up on
the first SMT thread, which is a bad solution.
This patch implements setting the affinity to the
CPU that was running last when entering execution on
an SPU. This should result in a significant reduction
in IPI calls and better cache locality for SPE thread
specific data.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>