This patch provides the spu affinity placement logic for the spufs scheduler.
Each time a gang is about to be scheduled, the placement of a reference
context is determined first. The placement of all other contexts with
affinity in the gang is then derived from this reference context's
location and a precomputed displacement offset.
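To illustrate the idea, a minimal sketch (hypothetical function, assuming
the spus of a node are linked in a ring via an aff_list; this is not the
actual scheduler code):

/* Hypothetical sketch: derive a context's spu by walking the ring of
 * spus from the reference spu by the precomputed offset. */
static struct spu *place_by_offset(struct spu *ref, int offset)
{
	struct spu *spu = ref;

	while (offset > 0) {
		spu = list_entry(spu->aff_list.next, struct spu, aff_list);
		offset--;
	}
	while (offset < 0) {
		spu = list_entry(spu->aff_list.prev, struct spu, aff_list);
		offset++;
	}
	return spu;
}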
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for additional flags at spu_create, which relate
to the establishment of affinity between contexts and contexts to memory.
A fourth, optional, parameter is supported. This parameter represents
an affinity neighbor of the context being created, and is used when
defining SPU-SPU affinity.
Affinity is represented as a doubly linked list of spu_contexts.
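For illustration, a hedged userspace sketch of the extended call (no glibc
wrapper exists, so syscall() is used; the flag value, the paths, and the
assumption that the gang directory already exists are all illustrative):

#include <sys/syscall.h>
#include <unistd.h>

#define SPU_CREATE_AFFINITY_SPU 0x0010	/* assumed flag value */

int main(void)
{
	int ref = syscall(__NR_spu_create, "/spu/gang/ref", 0, 0755);
	/* fourth parameter: open fd of the affinity neighbor */
	int ctx = syscall(__NR_spu_create, "/spu/gang/next",
			  SPU_CREATE_AFFINITY_SPU, 0755, ref);
	return ref < 0 || ctx < 0;
}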
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namely:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non-schedulable) spus.
The SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of schedulable spus (n_spus - non_sched_spus).
However, having this more general structure can be useful for other
functionality, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spu_sched->bitmap is MAX_PRIO (=140) bits wide. However, since
ff80a77f20, sched_find_first_bit()
only supports 100-bit bitmaps.
Thus, spu_sched->bitmap should be searched with the generic find_first_bit().
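The change amounts to the following (sketch; spu_prio stands for the
scheduler's priority array):

#include <linux/bitops.h>

static int spu_find_best_prio(struct spu_prio_array *spu_prio)
{
	/* was: return sched_find_first_bit(spu_prio->bitmap);  (100 bits max) */
	return find_first_bit(spu_prio->bitmap, MAX_PRIO);
}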
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Sebastian Siewior <cbe-oss-dev@ml.breakpoint.cc>
The 'file' argument is unused in spufs_run_spu(). This change removes
it.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The SPU decrementer should be restored after the LSCSA DMA has
completed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
No need to halt the SPE decrementer at context restore step 47, it will
be done in step 7.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
At save step 8, the mfc control register in the CSA should be written
_only_ with the Sc and Sm bits (at least MFC_CNTL[Dh] should be set to 0).
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is valid only during the context restore
sequence. Thus, it makes no sense to read or write it through spufs.
This patch changes decr_status node to access MFC_CNTL[Ds] in the CSA.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is confusingly used with two meanings:
* the SPU decrementer was running
* the SPU decrementer wrapped as a result of adjustment
and the code to set decr_status is missing.
This patch fixes these problems by using the decr_status argument as a
set of flags. This requires a rebuild of the shipped spu_restore code.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The following steps are not needed in the SPE context save/restore
paths:
Save Step 12: save_mfc_decr()
save suspend_time to CSA (It will be done by step 14)
save ch 7 (decrementer value will be saved in LSCSA by spe-side step 10)
Restore Step 59: restore_ch_part1()
restore ch 1 (it will be done by spe-side step 15)
This change removes the unnecessary steps.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Based on a fix from Masato Noguchi <Masato.Noguchi@jp.sony.com>.
Remove the (incorrect) array size declarations in the spufs channel
arrays, and use ARRAY_SIZE rather than hardcoded values.
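The pattern, with illustrative names rather than the exact spufs hunk:

/* before: static const int channel_sizes[42] = { ... };
 *         for (i = 0; i < 42; i++) ... */
static const int channel_sizes[] = { 4, 4, 16, 4 };	/* illustrative */

static void init_channels(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(channel_sizes); i++)
		setup_channel(i, channel_sizes[i]);	/* hypothetical helper */
}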
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Currently a process is removed from the physical spu when spu_acquire_saved
is called, but never put back. This patch adds a new spu_release_saved
that is to be paired with spu_acquire_saved and puts the process back if
it was in RUNNABLE state before.
Neither Jeremy nor I are entirely happy about this exact patch because
it adds another spu_activate call outside of the owner thread, but I
feel this is the best short-term fix we can come up with.
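The intended pairing then looks like this (sketch only; read_from_csa is a
made-up helper):

static ssize_t spufs_example_get(struct spu_context *ctx, u64 *val)
{
	spu_acquire_saved(ctx);		/* unschedule and save the context */
	*val = read_from_csa(ctx);	/* read from the saved state */
	spu_release_saved(ctx);		/* re-activate if it was RUNNABLE */
	return 0;
}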
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch exports per-context statistics in spufs, as well as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should be, as we ended up integrating spu and ctx statistics
accounting much more tightly than in the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
In 6cbf93960e64f313f6e247cbca7afaa50e3ee2c we added a WARN_ON for
calling spu_deactivate on contexts created with the SPU_CREATE_NOSCHED
flag. However, all NOSCHED contexts will need to be deactivated when
the context is destroyed, so this gives a spurious warning when any
NOSCHED context is closed.
This change removes the WARN_ON.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Reading from the signal{1,2} files requires a spu_acquire_saved, so
make these files write-only for contexts created with
SPU_CREATE_NOSCHED.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The current SPU context saving procedure in SPUFS unexpectedly
restarts the MFC when halting the decrementer, because MFC_CNTL[Dh] is set
without MFC_CNTL[Sm]. This bug causes, for example, broken DMA
queues to be saved. Here is a patch to fix the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
WARNING: arch/powerpc/platforms/cell/spufs/spufs.o(.init.text+0x158): Section
mismatch: reference to .exit.text:.spu_sched_exit (between '.init_module' and
'.spu_sched_init')
was introduced by c99c1994a2
This patch removes the warning.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
83c54070ee broke spufs by incorrectly
updating the code, this patch gets it to compile again.
It's probably still broken due to the scheduler changes, but this
at least makes sure cell kernels can still be built.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function backing_ops->read_mfc_tagstatus() doesn't return a
correct value because the dma_tagstatus_R register isn't saved in
CSA. This fixes the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When waiting for I/O events on the mfc file in an SPU context using the
poll/epoll syscalls, some events can be lost because of the wrong
order of poll_wait and MFC status checks in the spufs_mfc_poll
function, and because of a non-atomic update of tagwait. This fixes the
problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_activate can be called from multiple threads at the same time on
behalf of the same spu context. We need to make sure to only add it
once to avoid runqueue corruption.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Only enable the scheduler tick if we have any context waiting to be
scheduled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're currently too permissive with counting libassist calls - fix the
check on the SPE stop-and-signal status.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Provide load average information for spu contexts. The format
is identical to /proc/loadavg, which is also where a lot of the code
and concepts are borrowed from.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The new tid file contains the ID of the thread currently running the
context, if any. This is used so that the new spu-top and spu-ps
tools can find the thread in /proc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove redundant whitespace in arch/powerpc/platforms/cell/spufs/
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_dir_inode_operations is exactly the same as
simple_dir_inode_operations. Use that instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
And last but not least we need to make sure the scheduler tick never
preempts a nosched context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_deactivate should never be called for nosched contexts. Put in
a check so we can print a stacktrace and exit early in case it
happens erroneously.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a cpus_allowed field to struct spu_context so that we always
use the cpu mask of the owning thread instead of the one that happens to
call into the scheduler. Also use this information in
grab_runnable_context to avoid spurious wakeups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update scheduling information on every spu_run to allow for setting
threads to realtime priority just before running them. This requires
some slightly ugly code in spufs_run_spu because we can just update
the information unlocked if the spu is not runnable, but we need to
acquire the active_mutex when it is runnable to protect against
find_victim. This locking scheme requires opencoding
spu_acquire_runnable in spufs_run_spu which actually is a nice cleanup
all by itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Print out a few scheduler tuning parameters when we've compiled
with DEBUG defined.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current timeslice code mixes 'jiffies' up with 'spesched ticks'. This
change correctly defines the number of time slices each SPE context is
given, and clarifies the comment.
This brings the default timeslice for SPE contexts into a reasonable
range.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Enable preemptive scheduling for non-RT contexts.
We use the same algorithms as the CPU scheduler to calculate the time
slice length, and for now we also use the same timeslice length as the
CPU scheduler. This might not be enough for good performance and can be
changed after some benchmarking.
Note that currently we do not boost the priority for contexts waiting
on the runqueue for a long time, so contexts with a higher nice value
could starve lower-priority ones. This could easily be fixed once
the rework of the spu lists that Luke and I discussed is done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace the scheduler workqueues, which complicated things a lot, with
a dedicated spu scheduler thread that gets woken by a traditional
scheduler tick. By default this scheduler tick runs at HZ / 10, aka
one spu scheduler tick for every 10 cpu ticks.
Currently the tick is not disabled when we have fewer contexts than
available spus, but I will implement this later.
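The shape of the new mechanism, sketched (the tick value in jiffies and the
tick body are illustrative, not the exact sched.c code):

#include <linux/kthread.h>
#include <linux/sched.h>

#define SPUSCHED_TICK	10	/* one spu tick every 10 cpu ticks */

static int spusched_thread(void *unused)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_timeout(SPUSCHED_TICK);
		spusched_tick_all();	/* hypothetical: timeslice contexts */
	}
	return 0;
}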
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a bit definition from the book, and replace one hex number with a
symbol, for clarity.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently it fails with gcc from sdk 2.1 because of a spec change [1].
Maybe we should start using the definitions from spu_mfcio.h.
[1] http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01598.html
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a "capabilities" file to spu contexts consisting of a
list of linefeed separated capability names. The current exposed
capabilities are "sched" (the context is scheduleable) and
"step" (the context supports single stepping).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for SPU single stepping. The single
step bit is set in the SPU when the current process is
being single-stepped via ptrace. The spu then stops and
returns with a specific flag set and the syscall exit code
will generate the SIGTRAP.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The error path in spufs_fill_dir() is broken. If d_alloc_name() or
spufs_new_file() fails, spufs_prune_dir() is called. At this time
dir->inode is not set, and a NULL pointer is dereferenced by mutex_lock().
This bugfix replaces spufs_prune_dir() with a shorter version that does
not touch dir->inode but simply removes all children.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Nosched contexts should never be scheduled out, thus we must never
deactivate them in spu_yield.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix the race between checking for contexts on the runqueue and actually
waking them in spu_deactivate and spu_yield.
The guts of spu_reschedule are split into a new helper called
grab_runnable_context, which checks whether there is a runnable thread
below a specified priority and, if so, removes it from the runqueue and
uses it. This function is used by the new __spu_deactivate helper, shared
by preemption and spu_yield, to grab a new context before deactivating
the old one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make sure the mapping_lock also protects access to the various address_space
pointers used for tearing down the ptes on a spu context switch.
Because unmap_mapping_range can sleep we need to turn mapping_lock from
a spinlock into a sleeping mutex.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In case spufs_fill_dir() fails, only put_spu_context() gets called for
cleanup, and the acquired mm_struct never gets freed.
Signed-off-by: Sebastian Siewior <bigeasy@linux.vnet.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Previously, closing a SPE gang that still has contexts would trigger
a WARN_ON, and leak the allocated gang.
This change fixes the problem by using the gang's reference counts to
destroy the gang instead. The gangs will persist until their last
reference (be it context or open file handle) is gone.
Also, avoid using statements with side-effects in a WARN_ON().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the mem file doesn't have any release method hooked up,
leading to a leak every time it is used; hook up spufs_mem_release
to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noticed by David Woodhouse, it's currently possible to mount
spufs on any machine, which means that it actually will get
mounted by fedora.
This refuses to load the module on platforms that have no
support for SPUs.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds an option to spufs, when the kernel is configured for
4K pages, to give it the ability to use 64K pages for SPE local store
mappings.
Currently, we are optimistic and try order 4 allocations when creating
contexts. If that fails, the code will fallback to 4K automatically.
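A sketch of the optimistic path (names illustrative; local store is larger
than 64K, so the real code works per 64K chunk):

static struct page *spu_alloc_ls_chunk(void)
{
	struct page *page;

	/* optimistic: one order-4 (64K) block on a 4K-page kernel */
	page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, 4);
	if (page)
		return page;
	/* automatic fallback to plain 4K pages */
	return alloc_page(GFP_KERNEL);
}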
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code that
manipulates the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there were code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG, otherwise it would just be dead code. But there is no such code
in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change fixes the case where spu_base and spufs are initialised on a
system with no SPEs - unconditionally create the spu_lists so spu_alloc
doesn't explode, and check for spu_management ops before starting spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
arch/powerpc/platforms/cell/spu_base.c | 7 ++++---
arch/powerpc/platforms/cell/spufs/inode.c | 5 +++++
2 files changed, 9 insertions(+), 3 deletions(-)
- remove the spu_acquire_runnable from spu_run_init. I need to
opencode it in spufs_run_spu in the next patch
- remove various inline attributes, we don't really want to inline
long functions with multiple callsites
- clean up return values and runcntl_write calls in spu_run_init
- use normal kernel coding style in spu_reacquire_runnable
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The dynamically allocated read/write buffer in spufs_arch_write_note()
is never freed. Fix the leak, and convert the allocation to
get_free_page at the same time.
Cc: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Change the loop in spu_wait to be a little more straightforward.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Add a 'mode=' option to spufs mount arguments. This allows more
control over access to the top-level spufs directory.
Tested on Cell.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
GCC may generate an inline copy loop for memcpy() instead of calling
the kernel-defined memcpy(). This inlined version causes an alignment
interrupt when copying from local store.
This patch uses memcpy_fromio() and memcpy_toio() to copy local store,
preventing memcpy() from being inlined.
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
We now have proper locking around assignments of the mapping pointers,
and the spin_unlock implies enough of a barrier to get rid of the
explicit one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
When SPU isolation mode is enabled, the isolated_loader buffer is
allocated by spufs_init_isolated_loader() at module_init(), but nobody
ever frees it.
This patch introduces spufs_exit_isolated_loader(), the opposite of
spufs_init_isolated_loader(), called at module_exit().
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
spufs module_init forgot to call a few cleanup functions
on the error path. This patch also includes cosmetic changes in
spu_sched_init() (indentation fix and returning an error code).
[modified by hch to apply ontop of the latest schedule changes]
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch checks return value of spu_acquire_runnable() in
spufs_mfc_write().
Signed-off-by: Akinobu Mita <mita@fixstars.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no reason for run_sema to be a struct semaphore. Change
it to a mutex and rename it accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This change populates a siginfo struct for SPE application exceptions
(ie, invalid DMAs and illegal instructions).
Tested on an IBM Cell Blade.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Until now, we have always entered the spu page fault handler
with a mutex for the spu context held. This has multiple
bad side-effects:
- it becomes impossible to suspend the context during
page faults
- if an spu program attempts to access its own mmio
areas through DMA, we get an immediate livelock when
the nopage function tries to acquire the same mutex
This patch makes the page fault logic operate on a
struct spu_context instead of a struct spu, and moves it
from spu_base.c to a new file fault.c inside of spufs.
We now also need to copy the dar and dsisr contents
of the last fault into the saved context to have it
accessible in case we schedule out the context before
activating the page fault handler.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition to stop_wq needs to happen before adding to the runqueue and
under the same lock, so that we don't have a race window for a lost
wake up in the spu scheduler.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
For quite a while now spu state is protected by a simple mutex instead
of the old rw_semaphore, and this means we can simplify the locking
around spu_setup_isolated a lot.
Instead of doing an spu_release before entering spu_setup_isolated and
then calling the complicated spu_acquire_exclusive, we can now simply
enter the function locked and in a guaranteed runnable state, so that the
only bit of spu_acquire_exclusive that's left is the call to
spu_unmap_mappings.
Similarly there's no more need to unlock and reacquire the state_mutex
when spu_setup_isolated is done; we can always return with the
lock held and only drop it in spu_run_init in the failure case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
A single context should only be woken once, and we should not have
more wakeups for a given priority than the number of contexts on
that runqueue position.
Also add some asserts to trap future problems in this area more
easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
set_bit does not guarantee ordering on powerpc, so using it
for communication between threads requires explicit
mb() calls.
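The resulting pattern (the flag name is illustrative):

set_bit(SPU_SCHED_WAKE, &ctx->sched_flags);	/* no barrier on powerpc */
mb();	/* order the bit store before the wakeup that publishes it */
wake_up_all(&ctx->stop_wq);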
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
To not lose a spu thread we need to make sure it always gets put back
on the runqueue, in find_victim as well as in the scheduler tick as done
in the previous patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
To not lose a spu thread we need to make sure it always gets put back
on the runqueue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Make sure the pointers to the various mappings are cleared once the last
user has stopped using them. This avoids accessing freed memory when
tearing down the gang directory, as well as optimizing away
pte invalidations if no one uses these mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The scheduler workqueue may rearm itself and deadlock when we try to stop
it. Put a flag in place to skip the work if we're tearing down
the context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is no reason to yield the CPU in spu_yield - if the backing
thread reenters spu_run it gets added to the end of the runqueue for
its priority. So the yield is just a slowdown for the case where
we have higher priority contexts waiting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPU code doesn't properly invalidate SPUs' SLBs when necessary,
for example when changing a segment size from the hugetlbfs code. In
addition, it saves and restores the SLB content on context switches,
which makes it harder to properly handle those invalidations.
This patch removes the saving & restoring for now; something more
efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
that can be used by the core mm code to flush the SLBs of all SPEs that
are running a given mm at the time of the flush.
In order to do that, it adds a spinlock to the list of all SPEs and moves
some bits & pieces from spufs to spu_base.c.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Due to a buggy unsigned comparison, it was possible to write
beyond the end of the local store file in spufs under some
circumstances.
This rewrites the buggy function to look more like
simple_copy_from_buffer.
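The shape of the fix (a sketch; LS_SIZE is the local store size):

static ssize_t ls_write(char *ls, loff_t pos, const char __user *buf,
			size_t len)
{
	if (pos > LS_SIZE)
		return -EFBIG;
	if (len > LS_SIZE - pos)	/* no unsigned wrap: pos <= LS_SIZE */
		len = LS_SIZE - pos;
	if (copy_from_user(ls + pos, buf, len))
		return -EFAULT;
	return len;
}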
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
I found an exploitable bug in the current kernel.
Currently, there is no range check on mmapping the "/mem" node in
spufs. Thus, an application can access privileged memory regions.
In case this kernel is already running on a public server, I am sending
this information only here.
If there are such servers anywhere, please update them ASAP.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
For SCHED_RR tasks we can do some really trivial timeslicing. Basically
we fire up a timer on every scheduler tick that searches for a higher-
or same-priority thread that is on the runqueue, and if there is one,
context switches to it. Because we can't lock spus from timer context
we actually run this from a delayed workqueue instead of a timer.
A nice optimization would be to skip the actual priority bitmap search
when there are fewer contexts than physical spus available. To implement
this I need a so far unpublished patch from Andre, and it will be added
after we have that patch in.
Note that right now we only do the time slicing for SCHED_RR tasks.
The code would work for SCHED_OTHER tasks as well, but their prio
value is derived from the one the PPU thread has at the time of spu_run,
and using this for spu scheduling decisions would make the code very
unfair. SCHED_OTHER support will be enabled once the spu scheduler
knows how to calculate spu_context.prio (very soon).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Use DECLARE_BITMAP in the spu scheduler instead of reimplementing it.
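For reference, the macro replaces the hand-rolled sizing:

/* before: unsigned long bitmap[(MAX_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG]; */
DECLARE_BITMAP(bitmap, MAX_PRIO);	/* same array, sized by the macro */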
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
If we start a spu context with realtime priority, we want it to run
immediately and not wait until some other lower priority thread has
finished. Try to find a suitable victim and use its spu in this
case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Give spu_yield a kerneldoc comment, and remove the old comment
documenting spu_activate, spu_deactivate and spu_yield, as all of them
now have descriptive kerneldoc comments of their own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
When we call spu_remove_from_active_list, the spu is always guaranteed
to be on the active list and in runnable state, so we can simply
do a list_del to remove it and unconditionally take the was_active
codepath.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no need to directly wake up contexts in spu_activate when
called from spu_run, so add a flag to suppress this wakeup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This is the biggest patch in this series, and it reworks the guts of
the spu scheduler runqueue mechanism:
- instead of embedding a waitqueue in the runqueue there is now a
simple doubly-linked list, the actual wakeups happen by reusing
the stop_wq in the spu context (maybe we should rename it one day)
- spu_free and spu_prio_wakeup are merged into a single spu_reschedule
function
- various functionality is split out into small helpers, and kerneldoc
comments are added in various places to document what's going on.
- spu_activate is rewritten into a tight loop by removing tests for
various impossible conditions and using the infrastructure in this
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It doesn't make any sense to have a priority field in the physical spu
structure. Move it into the spu context instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups in code surrounding the state semaphore:
- inline spu_acquire/spu_release
- cleanup spu_acquire_* and add kerneldoc comments to these functions
- remove spu_release_exclusive and replace it with spu_release
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The r/w semaphore to lock the spus was overkill and can be replaced
with a mutex to make it faster, simpler and easier to debug. It also
helps to allow making most spufs operations interruptible in future
patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Various cleanups to sched.c that don't change the global control flow:
- add kerneldoc comments to various functions
- add spu_ prefixes to various functions
- add/remove context from the runqueue in bind/unbind_context as
it's part of the logical operation
- add a call to put_active_spu to spu_unbind_context as it's logically
part of the unbind operation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Only bind_context/unbind_context change the spu context state. Thus
we can move all assignments of SPU_STATE_RUNNABLE into bind_context,
which parallels the unbind side as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
unbind_context already sets the context state to SPU_STATE_SAVED, thus
the spu_deactivate callers don't need to do it again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the empty last line in arch/powerpc/platforms/cell/spufs/run.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Remove the SPU_CONTEXT_PREEMPT define. It's unused and won't be used
in this form after the scheduler rework.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
It looks like we've had some serious bitrot there mostly due to tracking
of address_space's of mmap'ed files getting out of sync with the actual
mmap code. The mfc, mss and psmap were not tracked properly and thus
not invalidated on context switches (oops !)
I also removed the various file->f_mapping = inode->i_mapping;
assignments that were done in the other open() routines since that
is already done for us by __dentry_open.
One improvement we might want to do later is to assign the various
ctx-> fields at mmap time instead of file open/close time, so that we
don't call unmap_mapping_range() on things that have not been mmap'ed.
Finally, I added some smp_wmb's after assigning the ctx-> fields to make
sure they are visible to other CPUs. I don't think this is really
necessary, as I suspect locking in the fs layer will make that happen
anyway, but better safe than sorry.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the need for struct page for SPE local store
and registers from spufs. It also makes the locking much more
obvious and no longer relying on the truncate logic black magic
for protecting against races between unmap_mapping_range() and
new pages faulted in. It does so by switching to a nopfn() handler
and using the new vm_insert_pfn() to setup the PTEs itself while
holding a lock on the SPE.
The nice thing is that this patch actually removes a lot more code
than it adds :-)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits)
[POWERPC] Generic BUG for powerpc
[PPC] Fix compile failure do to introduction of PHY_POLL
[POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set
[POWERPC] Remove old dcr.S
[POWERPC] Fix SPU coredump code for max_fdset removal
[POWERPC] Fix irq routing on some 32-bit PowerMacs
[POWERPC] ps3: Add vuart support
[POWERPC] Support ibm,dynamic-reconfiguration-memory nodes
[POWERPC] dont allow pSeries_probe to succeed without initialising MMU
[POWERPC] micro optimise pSeries_probe
[POWERPC] Add SPURR SPR to sysfs
[POWERPC] Add DSCR SPR to sysfs
[POWERPC] Fix 440SPe CPU table entry
[POWERPC] Add support for FP emulation for the e300c2 core
[POWERPC] of_device_register: propagate device_create_file return code
[POWERPC] Fix mmap of PCI resource with hack for X
[POWERPC] iSeries: head_64.o needs to depend on lparmap.s
[POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group
[POWERPC] Remove QE header files from lite5200.c
[POWERPC] of_platform_make_bus_id(): make `magic' int
...
Commit bbea9f6966 removed the max_fdset
element of struct fdtable. It appears that checking max_fds is
sufficient now.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, we only send a sigtrap if the current task is being ptraced.
This is somewhat inconsistent, and it breaks utrace support in fedora.
Removing the check should do the right thing in all cases.
Cc: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This changes the spu_create system call to return an error (-ENODEV) if
an isolated spu context is requested on hardware that doesn't support
isolated mode.
Tested on systemsim with and without isolation support.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds SPU elf notes to the coredump. It creates a separate note
for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1,
/signal1_type, /signal2, /signal2_type, /event_mask, /event_status,
/mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id.
A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to
specify they have extra elf core notes.
A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the
additional notes could be calculated and added to the notes phdr entry.
A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes
would be written after the existing notes.
The SPU coredump code resides in spufs. Stub functions are provided in the
kernel which are hooked into the spufs code which does the actual work via
register_arch_coredump_calls().
A new set of __spufs_<file>_read/get() functions was provided to allow the
coredump code to read from the spufs files without having to lock the
SPU context for each file read.
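Roughly, the hooks slot into the generic ELF coredump path like this
(macro bodies sketched from the description above; arch_notes_size and
arch_write_notes are assumed names for the powerpc stubs that dispatch
into spufs):

#ifdef ARCH_HAVE_EXTRA_NOTES
#define ELF_CORE_EXTRA_NOTES_SIZE	arch_notes_size()
#define ELF_CORE_WRITE_EXTRA_NOTES	arch_write_notes(file)
#endif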
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
In order to fit with the "don't-run-spus-outside-of-spu_run" model, this
patch starts the isolated-mode loader in spu_run, rather than
spu_create. If spu_run is passed an isolated-mode context that isn't in
isolated mode state, it will run the loader.
This fixes potential races with the isolated SPE app doing a
stop-and-signal before the PPE has called spu_run: bugzilla #29111.
Also (in conjunction with a mambo patch), this addresses #28565, as we
always set the runcntrl register when entering spu_run.
It is up to libspe to ensure that isolated-mode apps are cleaned up
after running to completion - ie, put the app through the "ISOLATE EXIT"
state (see Ch11 of the CBEA).
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change adds a read accessor for the SPE problem-state run control
register.
This is required for applying (userspace) changes made to the run
control register while the SPE is stopped - simply asserting the master
run control bit is not sufficient. My next patch for isolated-mode
setup requires this.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When the user changes the runcontrol register, an SPU might be
running without a process being attached to it and waiting for
events. In order to prevent this, make sure we always disable
the priv1 master control when we're not inside of spu_run.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When fixing spufs to map the 'mem' file backing store cacheable,
I incorrectly set the physical mapping to use both cache-inhibited
and guarded mapping, which resulted in a serious performance
degradation.
Debugged-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When one of the spufs files is mapped into a process address
space, regular users can use ptrace to attempt to access
them with access_process_vm(). With the way that the
mappings currently work, this likely causes an oops.
Setting the vm_flags to VM_IO makes sure that ptrace can
not access them and instead returns an error code. This is not
the perfect solution in the case of the local store mapping,
but it fixes the oops in a well-defined way.
Also remove leftover VM_RESERVED flags in spufs. The
VM_RESERVED flag is on its way out and not checked by
the memory management code anymore.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <chellwig@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When there are pending signals, the current spufs_run_spu() always returns
-ERESTARTSYS and is called again automatically.
But if the spe has already stopped due to a stop-and-signal or halt
instruction, returning -ERESTARTSYS makes the stop-and-signal/halt get lost
and the spu run past the end point.
For your convenience, I attached a sample program to reproduce this bug.
If there is no bug, the printed NPC will be 0x4000.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When we attempt an MFC DMA to an unmapped address, the event
returned from spu_run should be SPE_EVENT_SPE_DATA_STORAGE,
not SPE_EVENT_INVALID_DMA.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to check the channel count of the signal notification registers
before reading them, because it can be undefined when the count is
zero. In order to read count and data atomically, we read from the
saved context.
This patch uses spu_acquire_saved() to force a context save before a
/signal1 or /signal2 read. Because of this it is no longer necessary to
have backing_ops and hw_ops versions of this function so they have been
removed.
Regular applications should not rely on reading this register
to be fast, as it's conceptually a write-only file from the PPE
perspective.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch implements read only access to
/mbox_info - SPU Write Outbound Mailbox
/ibox_info - SPU Write Outbound Interrupt Mailbox
/wbox_info - SPU Read Inbound Mailbox
These files are used by gdb in order to look into the current mailbox
queues without changing the contents at the same time. They are
not meant for general programming use, since the access requires
a context save and is therefore rather slow.
It would be good to complement this patch with one that adds
write support as well.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes the /spu_tag_mask file from spufs. The data provided by
this file is also available from the /dma_info file in the dma_info_mask
of the spu_dma_info struct.
The file was intended to be used by gdb, but that never used it, and
now it has been replaced with the more verbose dma_info file.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The /lslr file gives read access to the SPU_LSLR register in hex; 0x3fff,
for example. The /dma_info file provides read access to the SPU Command
Queue in a binary format. The /proxydma_info file provides read
access to the Proxy Command Queue in a binary format. The spu_info.h
file provides data structures for interpreting the binary format of
/dma_info and /proxydma_info.
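As a hedged reminder of what spu_info.h declares for /dma_info (field list
reproduced from memory; consult the header for the authoritative layout):

struct spu_dma_info {
	__u64 dma_info_type;
	__u64 dma_info_mask;
	__u64 dma_info_status;
	__u64 dma_info_stall_and_notify;
	__u64 dma_info_atomic_command_status;
	struct mfc_cq_sr dma_info_command_data[16];	/* the command queue */
};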
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch changes the /npc, /decr, /decr_status, /spu_tag_mask,
/event_mask, /event_status, and /srr0 files to provide output according to
the format string "0x%llx" instead of "%llx".
Before this patch some files used "0x%llx" and others used "%llx", which is
inconsistent and potentially confusing. A user might assume "%llx" numbers
were decimal if they happened to not contain any a-f digits. This change
will break any code that cannot tolerate a leading 0x in the file contents.
The only known user of these files is libspe, but there might also be
some scripts which access these files. This risk is deemed acceptable for
future consistency.
Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When in isolated mode, SPEs have access to an area of persistent
storage, which is per-SPE. In order for isolated-mode apps to
communicate arbitrary data through this storage, we need to ensure that
isolated physical SPEs can be reused for subsequent applications.
Add a file ("recycle") in a spethread dir to enable isolated-mode
recycling. By writing to this file, the kernel will reload the
isolated-mode loader kernel, allowing a new app to be run on the same
physical SPE.
This requires the spu_acquire_exclusive function to enforce exclusive
access to the SPE while the loader is initialised.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds general support for isolated mode SPE apps.
Isolated apps are started indirectly, by a dedicated loader "kernel".
This patch starts the loader when spu_create is invoked with the
ISOLATE flag. We do this at spu_create time to allow libspe to pass the
isolated app in before calling spu_run.
The loader is read from the device tree, at the location
"/spu-isolation/loader". If the loader is not present, an attempt to
start an isolated SPE binary will fail with -ENODEV.
Update: loader needs to be correctly aligned - copy to a kmalloced buf.
Update: remove workaround for systemsim/spurom 'L-bit' bug, which has
been fixed.
Update: don't write to runcntl on spu_run_init: SPU is already running.
Update: do spu_setup_isolated earlier
Tested on systemsim.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds two new flags to spu_create:
SPU_CREATE_NONSCHED: create a context that is never moved
away from an SPE once it has started running. This flag
can only be used by tasks with the CAP_SYS_NICE capability.
SPU_CREATE_ISOLATED: create a nonschedulable context that
enters isolation mode upon first run. This requires the
SPU_CREATE_NONSCHED flag.
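A userspace sketch (the merged header spells these SPU_CREATE_NOSCHED and
SPU_CREATE_ISOLATE; the values and the path are assumptions):

#include <sys/syscall.h>
#include <unistd.h>

#define SPU_CREATE_NOSCHED 0x0004	/* assumed values */
#define SPU_CREATE_ISOLATE 0x0008

int main(void)
{
	/* requires CAP_SYS_NICE, per the text above */
	int fd = syscall(__NR_spu_create, "/spu/iso-app",
			 SPU_CREATE_ISOLATE | SPU_CREATE_NOSCHED, 0700);
	return fd < 0;
}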
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SPRN_SDR1 and the SPE's MFC SDR are hypervisor resources and
are not accessible from a logical partition. This change adds an
access wrapper.
When running on bare H/W, the spufs needs to only set the SPE's MFC SDR
to the value of the PPE's SPRN_SDR1 once at SPE initialization, so this
change renames mfc_sdr_set() to mfc_sdr_setup() and moves the
access of SPRN_SDR1 into the mmio wrapper. It also removes the now
unneeded member mfc_sdr_RW from struct spu_priv1_collapsed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, spufs_mbox_read transfers more bytes than requested on a
read. If you ask for four bytes, you get eight. This fixes it to
transfer the largest multiple of four bytes that is less than or equal
to the number you asked for.
Note: one nasty property of this file in spufs is that you can only
read multiples of four bytes in the first place, since there is no way
to atomically put back a few bytes into the hardware register. Thus,
reading less than four bytes returns -EINVAL. Asking for more than
four returns the largest possible multiple of four.
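Sketched, the resulting rule (spu_read_one_mbox is a hypothetical helper):

static ssize_t mbox_read(struct spu_context *ctx, char __user *buf,
			 size_t len)
{
	size_t copied = 0;
	u32 data;

	if (len < 4)
		return -EINVAL;		/* cannot split a mailbox entry */
	while (copied + 4 <= len && spu_read_one_mbox(ctx, &data)) {
		if (copy_to_user(buf + copied, &data, 4))
			return copied ? copied : -EFAULT;
		copied += 4;
	}
	return copied;
}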
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the /signal2 file to actually give signal2 data.
Signed-off-by: Dwayne Grant Mcconnell <decimal@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a memory leak introduced by "spufs: add support
for read/write on cntl", which was missing a call to simple_attr_close.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds an 'object-id' file that the spe library can
use to store a pointer to its ELF object. This was
originally meant for use by oprofile, but is now
also used by the GNU debugger, if available.
In order for oprofile to find the location in an spu-elf
binary where an event counter triggered, we need a way
to identify the binary in the first place.
Unfortunately, that binary itself can be embedded in a
powerpc ELF binary. Since we can assume it is mapped into
the effective address space of the running process,
have that process write the pointer value into a new spufs
file.
When a context switch occurs, pass the user value to
the profiler so that it can look at the mapped file (with
some care).
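Assuming the file accepts an ascii hex value like the other spufs
attribute files, userspace usage would look roughly like this (fd and
pointer names hypothetical):

	char buf[32];
	int len = snprintf(buf, sizeof(buf), "0x%lx",
			   (unsigned long)spu_elf_image);

	write(object_id_fd, buf, len);	/* advertise the ELF object address */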
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Writing to cntl can be used to stop execution on the
spu and to restart it, reading from cntl gives the
contents of the current status register.
The access is always in ascii, as for most other files.
This was always meant to be there, but we had a little
problem with writing to runcntl, so it was left out so
far.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since libspe2 will provide a function that can read/write
multiple mailbox elements at once, the kernel should handle
that efficiently.
read/write on the three mailbox files can now access the
spe context multiple times to operate on any number of
mailbox data elements.
If the spu application keeps writing to its outbound
mailbox, the read call will pick up all the data in a
single system call.
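For example, a single read can now drain several entries at once
(sketch; ibox_fd is hypothetical):

	uint32_t entries[16];
	ssize_t n = read(ibox_fd, entries, sizeof(entries));
	/* n is a multiple of 4; n / 4 mailbox entries were transferred */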
Unfortunately, if the user passes an invalid pointer,
we may lose a mailbox element on read, since we can't
put it back. This is probably impossible to solve if the
user also accesses the mailbox through direct register
access.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This hopefully fixes a long-standing bug in the spu file system.
An spu context comes with local memory that can be either saved
in kernel pages or point directly to a physical SPE.
When mapping the physical SPE, that mapping needs to be cache-inhibited.
For simplicity, we used to map the kernel backing memory that way
too, but unfortunately that was not only inefficient, but also incorrect
because the same page could then be accessed simultaneously through
a cacheable and a cache-inhibited mapping, which is not allowed
by the powerpc specification and in our case caused data inconsistency
for which we did a really ugly workaround in user space.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the concept of a gang to spufs as a new type of object.
So far, this has no impact whatsoever on scheduling, but makes
it possible to add that later.
A new type of object in spufs is now a spu_gang. It is created
with the spu_create system call with the flags argument set
to SPU_CREATE_GANG (0x2). Inside of a spu_gang, it
is then possible to create spu_context objects, which until
now was only possible at the root of spufs.
There is a new member in struct spu_context pointing to
the spu_gang it belongs to, if any. The spu_gang maintains
a list of spu_context structures that are its children.
This information can then be used in the scheduler in the
future.
There is still a bug that needs to be resolved in this
basic infrastructure regarding the order in which objects
are removed. When the spu_gang file descriptor is closed
before the spu_context descriptors, we leak the dentry
and inode for the gang. Any ideas how to cleanly solve
this are appreciated.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This tries to fix spufs so we have an interface closer to what is
specified in the man page for events returned in the third argument of
spu_run.
Fortunately, libspe has never been using the returned contents of that
register, as they were the same as the return code of spu_run (duh!).
Unlike the specification that we never implemented correctly, we now
require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
order to get the new behavior. When this flag is not passed, spu_run
will simply ignore the third argument now.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For better explanation, I break down the page fault handling into steps:
1) There is a page fault caused by DMA operation initiated by SPU and
DMA is suspended.
2) The interrupt handler 'spu_irq_class_1()/__spu_trap_data_map()' is
called and it just wakes up the sleeping spe-manager thread.
3) By the PPE scheduler, the corresponding bottom half,
spu_irq_class_1_bottom(), is called in process context, and DMA is
restarted.
There can be quite a large time gap between 2) and 3), and I found
the following problem:
If the context becomes unbound between 2) and 3), then 3) is not executed,
because by the time the spe-manager thread is woken, the context has
already been saved. (This situation can happen, for example, when a high
priority spe thread is newly started in that time gap.)
But the actual problem is that the corresponding SPU context does not
work even if it is bound again to a SPU.
Besides, I can see the following warning in the mambo simulator when the
context becomes unbound (in save_mfc_cmd()), i.e. when unbind() is called
for the context after step 2) and before step 3):
'WARNING: 61392752237: SPE2: MFC_CMD_QUEUE channel count of 15 is
inconsistent with number of available DMA queue entries of 16'
After going through the available documents, I found that the problem is
that the suspended DMA is not restarted when the context is bound again.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds NUMA support to the spufs scheduler.
The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
simplified, in an attempt to reduce complexity while adding
support for NUMA scheduler domains. SPUs are allocated starting
from the calling thread's node, moving to others as supported by
current->cpus_allowed. Preemption is gone as it was buggy, but
should be re-enabled in another patch when stable.
The new arch/powerpc/platforms/cell/spu_base.c maintains idle
lists on a per-node basis and allows the caller to specify which
node(s) an SPU should be allocated from; passing -1 tells
spu_alloc() that any node is allowed.
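A rough sketch of that allocation policy, assuming the spu_alloc()
interface described above:

	struct spu *spu;
	int node = cpu_to_node(raw_smp_processor_id());

	spu = spu_alloc(node);		/* prefer the caller's node */
	if (!spu)
		spu = spu_alloc(-1);	/* fall back to any node */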
Since the patch removes the currently implemented preemptive
scheduling, it is technically a regression, but practically
all users have since migrated to this version, as it is
part of the IBM SDK and the yellowdog distribution, so there
is not much point holding it back while the new preemptive
scheduling patch gets delayed further.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new "psmap" file to spufs that allows mmap of all of
the problem state mapping of SPEs. It is compatible with 64k pages. In
addition, it removes mmap ability of individual files when using 64k
pages, with the exception of signal1 and signal2 which will both map the
entire 64k page holding both registers. It also removes
CONFIG_SPUFS_MMAP as there is no point in not building mmap support in
spufs.
It goes along a separate patch to libspe implementing usage of that new
file to access problem state registers.
Another patch will follow up to fix races opened up by accessing
the 'runcntl' register directly, which is made possible with this
patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (29 commits)
[POWERPC] Fix rheap alignment problem
[POWERPC] Use check_legacy_ioport() for ISAPnP
[POWERPC] Avoid NULL pointer in gpio1_interrupt
[POWERPC] Enable generic rtc hook for the MPC8349 mITX
[POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class
[POWERPC] Create a "wrapper" script and use it in arch/powerpc/boot
[POWERPC] fix spin lock nesting in hvc_iseries
[POWERPC] EEH failure to mark pci slot as frozen.
[POWERPC] update powerpc defconfig files after libata kconfig breakage
[POWERPC] enable sysrq in pmac32_defconfig
[POWERPC] UPIO_TSI cleanup
[POWERPC] rewrite mkprep and mkbugboot in sane C
[POWERPC] maple/pci iomem annotations
[POWERPC] powerpc oprofile __user annotations
[POWERPC] cell spufs iomem annotations
[POWERPC] NULL noise removal: spufs
[POWERPC] ppc math-emu needs -fno-builtin-fabs for math.c and fabs.c
[POWERPC] update mpc8349_itx_defconfig and remove some debug settings
[POWERPC] Always call cede in pseries dedicated idle loop
[POWERPC] Fix loop logic in irq_alloc_virt()
...
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
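For instance, a filesystem wanting a per-inode st_blksize could provide
a getattr along these lines (sketch; names hypothetical):

	static int foo_getattr(struct vfsmount *mnt, struct dentry *dentry,
			       struct kstat *stat)
	{
		generic_fillattr(dentry->d_inode, stat);
		stat->blksize = FOO_PREFERRED_IO_SIZE;	/* per-fs value */
		return 0;
	}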
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something where
the union will actually be used.
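The conversion for existing users is mechanical, e.g.:

	/* before */
	inode->u.generic_ip = private_data;
	/* after */
	inode->i_private = private_data;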
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits)
[POWERPC] Use little-endian bit from firmware ibm,pa-features property
[POWERPC] Make sure smp_processor_id works very early in boot
[POWERPC] U4 DART improvements
[POWERPC] todc: add support for Time-Of-Day-Clock
[POWERPC] Make lparcfg.c work when both iseries and pseries are selected
[POWERPC] Fix idr locking in init_new_context
[POWERPC] mpc7448hpc2 (taiga) board config file
[POWERPC] Add tsi108 pci and platform device data register function
[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform
[POWERPC] Correct the MAX_CONTEXT definition
powerpc: minor cleanups for mpc86xx
[POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set
[POWERPC] Simplify the code defining the 64-bit CPU features
[POWERPC] powerpc: kconfig warning fix
[POWERPC] Consolidate some of kernel/misc*.S
[POWERPC] Remove unused function call_with_mmu_off
[POWERPC] update asm-powerpc/time.h
[POWERPC] Clean up it_lp_queue.h
[POWERPC] Skip the "copy down" of the kernel if it is already at zero.
[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
...
In the context save/restore code, the SPU MFC command queue purge
code has a bug:
static inline void wait_purge_complete(struct spu_state *csa,
				       struct spu *spu)
{
	struct spu_priv2 __iomem *priv2 = spu->priv2;

	/* Save, Step 28:
	 *     Poll MFC_CNTL[Ps] until value '11' is read
	 *     (purge complete).
	 */
	POLL_WHILE_FALSE(in_be64(&priv2->mfc_control_RW)
			 & MFC_CNTL_PURGE_DMA_COMPLETE);
}
This will exit as soon as _one_ of the 2 bits that compose
MFC_CNTL_PURGE_DMA_COMPLETE is set, and one of them happens to be
"purge in progress"... which means that we'll happily continue
restoring the MFC while it's being purged at the same time.
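Presumably the fix is to compare the full two-bit status field against
the 'complete' value instead, along these lines (a sketch; the mask name
is an assumption):

	POLL_WHILE_FALSE((in_be64(&priv2->mfc_control_RW)
			  & MFC_CNTL_PURGE_DMA_STATUS_MASK)
			 == MFC_CNTL_PURGE_DMA_COMPLETE);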
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a bug where we don't properly map SPE MMIO space as guarded,
causing various test cases to fail, probably due to write combining and other
niceties caused by the lack of the G bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
locking init cleanups:
- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner
this patch was generated automatically.
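The conversion pattern, for illustration (hypothetical lock name):

	/* before */
	static spinlock_t foo_lock = SPIN_LOCK_UNLOCKED;
	/* after */
	static DEFINE_SPINLOCK(foo_lock);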
Motivation:
- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount; it will
always return 0, and so can be tail-called from get_sb() (see the
sketch after this list).
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
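As a sketch of the new calling convention, a filesystem using one of the
convenience helpers changes very little (names hypothetical):

	static int foo_get_sb(struct file_system_type *fs_type, int flags,
			      const char *dev_name, void *data,
			      struct vfsmount *mnt)
	{
		/* the helper now takes the vfsmount and returns an int */
		return get_sb_nodev(fs_type, flags, data, foo_fill_super, mnt);
	}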
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SPU context save/restore code is currently built
for a 4k page size and we provide a _shipped version
of it since most people don't have the spu toolchain
that is needed to rebuild that code.
This patch hardcodes the data structures to a 64k
page alignment, which also guarantees 4k alignment
but unfortunately wastes 60k of memory per SPU
context that is created in the running system.
We will follow up on this with another patch to
reduce that overhead or maybe redo the context
save/restore logic to do this part entirely different,
but for now it should make experimental systems
work with either page size.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At this time, all flags are invalid. Since we are
planning to actually add valid flags in the future,
we better check if any were passed by the user.
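The check itself is trivial, something like:

	if (flags)		/* no valid flags defined yet */
		return -EINVAL;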
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes 'stop_code', a discarded member of struct spu.
It is written at initialization and interrupt time, but never read
in the current implementation.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the hypervisor abstraction of setting cpu affinity to a
higher level to avoid platform dependent interrupt controller
routines. I replaced spu_priv1_ops:spu_int_route_set() with a
new routine spu_priv1_ops:spu_cpu_affinity_set().
As a by-product, this change eliminated what looked like an
existing bug in the set affinity code where spu_int_route_set()
mistakenly called int_stat_get().
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To support multi-platform binaries, the spu hypervisor accessor
routines must have runtime binding.
I removed the existing statically linked routines in spu.h
and spu_priv1_mmio.c and created new accessor routines in spu_priv1.h
that operate indirectly through an ops struct spu_priv1_ops.
spu_priv1_mmio.c contains the instance of the accessor routines
for running on raw hardware.
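Abbreviated, the indirection looks roughly like this (only a couple of
the accessors shown):

	struct spu_priv1_ops {
		void (*int_mask_set) (struct spu *spu, int class, u64 mask);
		u64  (*int_stat_get) (struct spu *spu, int class);
		/* ... one entry per priv1 accessor ... */
	};

	extern const struct spu_priv1_ops *spu_priv1_ops;

	static inline void
	spu_int_mask_set(struct spu *spu, int class, u64 mask)
	{
		spu_priv1_ops->int_mask_set(spu, class, mask);
	}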
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The save/restore sequence for SPE contexts currently attempts to save
and restore the channel count for SPE channel 1 (the SPU_WriteEventMask
channel). But the CBE architecture (section 9.11.2) clearly states
that this channel does not have an associated count. Hardware simply
ignores the attempt to write this count, but the simulator generates
a warning message.
WARNING: 279721590: SPE7: Attempt to write channel count for CH 1 with
no associated count is ignored.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The wbox channel count of an spu is now initialized
to four for the saved context. This makes it possible
to write to the mailbox right away without waiting
for the SPE to become scheduled first.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For performance analysis, it is often interesting to know
which physical SPE a thread is currently running on, and,
more importantly, if it is running at all.
This patch adds a simple attribute to each SPU directory
with that information.
The attribute is read-only and called 'phys-id'. It contains
an ascii string with the number of the physical SPU (e.g.
"0x5"), or alternatively the string "0xffffffff" (32 bit -1)
when it is not running at all at the time that the file
is read.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs currently knows only 4k pages and 16M hugetlb
pages. Make it use the regular methods for deciding on
the SLB bits.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs_rmdir tries to acquire the spufs root
i_mutex, which is already held by spufs_create_thread.
This was tracked as Bug #H9512.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A recent change to the way that the mfc file gets mapped made it
impossible to map the SPE Multi-Source Synchronization register
into user space, but that may be needed by some applications.
This restores the missing functionality.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The spu_base module is rather deeply intermixed with the
core kernel, so it makes sense to have that built-in.
This will let us extend the base in the future without
having to export more core symbols just for it.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use kzalloc when allocating a new spu context, rather than kmalloc +
zeroing.
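In effect (context allocation shown schematically):

	/* before */
	ctx = kmalloc(sizeof *ctx, GFP_KERNEL);
	if (ctx)
		memset(ctx, 0, sizeof *ctx);
	/* after */
	ctx = kzalloc(sizeof *ctx, GFP_KERNEL);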
Booted & tested on cell.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We found that when the 'decrementer' is saved, the PPE saves the current
time in 'csa->suspend_time'. When restoring the 'decrementer' (Step 34),
the decrementer is supposed to be adjusted by the number of cycles that a
spu thread has not been running.
That code is missing a subtraction ('-'): 'delta_time' is assigned the
unsubtracted value (see below).
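Presumably the intended computation subtracts the suspend time, along
the lines of this sketch:

	/* cycles the spu thread has not been running */
	delta_time = get_cycles() - csa->suspend_time;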
Acked-by: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Missing include for __NR_syscalls, and missing sys_splice() that
causes build-time failure due to compile-time bounds check on
spu_syscall_table.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
spufs_init and spufs_exit should be marked correctly so
they can be removed when not needed.
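That is, the usual annotations (sketch):

	static int __init spufs_init(void)
	{
		/* ... */
		return 0;
	}
	module_init(spufs_init);

	static void __exit spufs_exit(void)
	{
		/* ... */
	}
	module_exit(spufs_exit);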
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The mfc member of a new context was not initialized to zero,
which potentially leads to wild memory accesses.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.
This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.
The signal1 & signal2 files may be mmap()'d by regular user
processes. The cntl and mfc files, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which cannot be guaranteed with user mmaps.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.
The file is meant to be used from an abstraction in libspe
that is added by a different patch.
From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.
The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.
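A rough userspace sketch, assuming the mfc_dma_command layout from
<asm/spu.h> (fd and buffer names hypothetical, error handling omitted):

	struct mfc_dma_command cmd = {
		.lsa  = ls_offset,	/* local store offset */
		.ea   = (uint64_t)(unsigned long)buf, /* effective address */
		.size = 4096,
		.tag  = 1,
		.cmd  = MFC_PUT_CMD,	/* copy local store to memory */
	};
	uint32_t done_tags;

	write(mfc_fd, &cmd, sizeof(cmd));	/* queue the DMA */
	read(mfc_fd, &done_tags, sizeof(done_tags)); /* completed tag groups */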
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.
This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.
I'm still undecided whether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.
This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g. fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.
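A hypothetical shape of such a blacklist check (the actual dispatch may
well be table-based instead):

	static int spu_syscall_allowed(long nr)
	{
		switch (nr) {
		case __NR_fork:
		case __NR_clone:
		case __NR_vfork:
		case __NR_execve:
		case __NR_exit:
			return 0;	/* interferes with SPU execution */
		default:
			return 1;
		}
	}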
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These symbols are only used in the file that they are defined in,
so they should not be in the global namespace.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPE Book IV indicates that MFC DMA operations must be
suspended and restored on SPU context switch (in Step 8).
This patch adds that operation, which is missing from the
current spufs implementation.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway, your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So far, all SPU triggered interrupts always end up on
the first SMT thread, which is a bad solution.
This patch implements setting the affinity to the
CPU that was running last when entering execution on
an SPU. This should result in a significant reduction
in IPI calls and better cache locality for SPE thread
specific data.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
One local variable is missing an __iomem modifier,
in another place, we pass a completely unused argument
with a missing __user modifier.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In a hypervisor based setup, direct access to the first
privileged register space can typically not be allowed
to the kernel and has to be implemented through hypervisor
calls.
As suggested by Masato Noguchi, let's abstract the register
access through a number of function calls. Since there is
currently no public specification of actual hypervisor
calls to implement this, I only provide a place that
makes it easier to hook into.
Cc: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Cc: Geoff Levand <geoff.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The logic for sys_spu_run keeps growing and it does
not really belong in file.c any more since we
moved away from using regular file operations to our
own syscall.
No functional change in here.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Checking bits manually might not be synchronized with
the use of set_bit/clear_bit. Make sure we always use
the correct bitops by removing the unnecessary
identifiers.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If creating one entry failed in spufs_fill_dir,
we never cleaned up the freshly created entries.
Fix this by calling the cleanup function on error.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If get_unused_fd failed in sys_spu_create, we never cleaned
up the created directory. Fix that by restructuring the
error path.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When spu_activate fails in spu_acquire_runnable, the
state must still be SPU_STATE_SAVED, we were
incorrectly setting it to SPU_STATE_RUNNABLE.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
During an earlier cleanup, we lost the serialization
of multiple spu_run calls performed on the same
spu_context. In order to get this back, introduce a
mutex in the spu_context that is held inside of spu_run.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Only checking for SPUFS_MAGIC is not reliable, because
it might not be unique in theory. Worse than that,
we accidentally allow spu_run to be performed on
any file in spufs, not just those returned from
spu_create as intended.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spu_forget will do mmput on the DMA address space,
which can lead to lots of other stuff getting triggered.
We better not hold a semaphore here that we might
need in the process.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to check for validity of owner under down_write,
down_read is not enough.
Noticed by Al Viro.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Output from hexdump with "%08x" depends on the HOST platform's
endianness. When building Linux with a cross toolchain, that difference
causes errors.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoff.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
One of my last patches contained a broken line
from splitting out some other changes, this
restores a working version.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Handling mailbox interrupts was broken in multiple respects,
the combination of which was hiding the bugs most of the time.
- The ibox interrupt mask was open initially even though there
are no waiters on a newly created SPU.
- Acknowledging the mailbox interrupt did not work because
it is level triggered and the mailbox data is never retrieved
from inside the interrupt handler.
- The interrupt handler delivered interrupts with a disabled
mask if another interrupt is triggered for the same class
but a different mask.
- The poll function did not enable the interrupt if it had not
been enabled, so we might run into the poll timeout if none of
the other bugs saved us and no signal was delivered.
We probably still have a similar problem with blocking
read/write on mailbox files, but that will result in an extra
wakeup in the worst case, not in incorrect behaviour.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch reduces lock complexity of SPU scheduler, particularly
for involuntary preemptive switches. As a result the new code
does a better job of mapping the highest priority tasks to SPUs.
Lock complexity is reduced by using the system default workqueue
to perform involuntary saves. In this way we avoid nasty lock
ordering problems that the previous code had. A "minimum timeslice"
for SPU contexts is also introduced. The intent here is to avoid
thrashing.
While the new scheduler does a better job at prioritization it
still does nothing for fairness.
From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes it easier to preempt an SPU context by
having the scheduler hold ctx->state_sema for much shorter
periods of time.
As part of this restructuring, the control logic for the "run"
operation is moved from arch/ppc64/kernel/spu_base.c to
fs/spufs/file.c. Of course the base retains "bottom half"
handlers for class{0,1} irqs. The new run loop will re-acquire
an SPU if preempted.
From: Mark Nutter <mnutter@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
spufs is rather noisy when debugging is enabled, this
turns off the messages for production use.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With the new rules for reserved pages, the spufs now
needs working page reference counting.
I should probably look into converting to vm_insert_page,
but for now this patch makes spufs work again.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a scheduler for SPUs to make it possible to use
more logical SPUs than physical ones are present in the
system.
Currently, there is no support for preempting a running
SPU thread, they have to leave the SPU by either triggering
an event on the SPU that causes it to return to the
owning thread or by sending a signal to it.
This patch also adds operations that enable accessing an SPU
in either runnable or saved state. We use an RW semaphore
to protect the state of the SPU from changing underneath
us, while we are holding it readable. In order to change
the state, it is acquired writeable and a context save
or restore is executed before downgrading the semaphore
to read-only.
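Schematically, the state change described above (member and helper names
as assumptions):

	down_write(&ctx->state_sema);		/* exclusive: state may change */
	spu_save(&ctx->csa, ctx->spu);		/* save the context to memory */
	ctx->state = SPU_STATE_SAVED;
	downgrade_write(&ctx->state_sema);	/* let readers back in */
	/* ... access the saved state ... */
	up_read(&ctx->state_sema);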
From: Mark Nutter <mnutter@us.ibm.com>,
Uli Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the source code that is used to generate spu_save_dump.h and
spu_restore_dump.h. Since a full spu tool chain is needed to
generate these files, the default remains to use the shipped
versions in order to keep the number of tools for building the
kernel down.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the code needed to perform a context switch from
spufs, following the recommended 76-step sequence.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add some infrastructure for saving and restoring the context of an
SPE. This patch creates a new structure that can hold the whole
state of a physical SPE in memory. It also contains code that
avoids races during the context switch and the binary code that
is loaded to the SPU in order to access its registers.
The actual PPE- and SPE-side context switch code are two separate
patches.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is the current version of the spu file system, used
for driving SPEs on the Cell Broadband Engine.
This release is almost identical to the version for the
2.6.14 kernel posted earlier, which is available as part
of the Cell BE Linux distribution from
http://www.bsc.es/projects/deepcomputing/linuxoncell/.
The first patch provides all the interfaces for running
spu application, but does not have any support for
debugging SPU tasks or for scheduling. Both these
functionalities are added in the subsequent patches.
See Documentation/filesystems/spufs.txt on how to use
spufs.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>