Remove struct pt_regs * from all handlers.
Also remove the regs argument from get_irq() functions.
Compile tested with arch/powerpc/config/* and
arch/ppc/configs/prep_defconfig
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The "linux,tce-size" property is only 32 bits (see
prom_initialize_tce_table() in arch/powerpc/kernel/prom_init.c).
Treating it as an unsigned long in iommu_table_setparms() leads to
access beyond the end of the property's buffer, so we pass garbage to
the memset() in that function.
[boot]0020 XICS Init
i8259 legacy interrupt controller initialized
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
cpu 0x0: Vector: 300 (Data Access) at [c0000000fe783850]
pc: c000000000035e90: .memset+0x60/0xfc
lr: c000000000044fa4: .iommu_table_setparms+0xb0/0x158
sp: c0000000fe783ad0
msr: 9000000000009032
dar: c000000100000000
dsisr: 42010000
current = 0xc00000000450e810
paca = 0xc000000000411580
pid = 1, comm = swapper
enter ? for help
[link register ] c000000000044fa4 .iommu_table_setparms+0xb0/0x158
[c0000000fe783ad0] c000000000044f4c .iommu_table_setparms+0x58/0x158
(unreliable)
[c0000000fe783b70] c00000000004529c
.iommu_bus_setup_pSeries+0x1c4/0x254
[c0000000fe783c00] c00000000002b8ac .do_bus_setup+0x3c/0xe4
[c0000000fe783c80] c00000000002c924 .pcibios_fixup_bus+0x64/0xd8
[c0000000fe783d00] c0000000001a2d5c .pci_scan_child_bus+0x6c/0x10c
[c0000000fe783da0] c00000000002be28 .scan_phb+0x17c/0x1b4
[c0000000fe783e40] c0000000003cfa00 .pcibios_init+0x58/0x19c
[c0000000fe783ec0] c0000000000094b4 .init+0x1e8/0x3d8
[c0000000fe783f90] c000000000026e54 .kernel_thread+0x4c/0x68
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This should probably say "mpic" to save confusion.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (29 commits)
[POWERPC] Fix rheap alignment problem
[POWERPC] Use check_legacy_ioport() for ISAPnP
[POWERPC] Avoid NULL pointer in gpio1_interrupt
[POWERPC] Enable generic rtc hook for the MPC8349 mITX
[POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class
[POWERPC] Create a "wrapper" script and use it in arch/powerpc/boot
[POWERPC] fix spin lock nesting in hvc_iseries
[POWERPC] EEH failure to mark pci slot as frozen.
[POWERPC] update powerpc defconfig files after libata kconfig breakage
[POWERPC] enable sysrq in pmac32_defconfig
[POWERPC] UPIO_TSI cleanup
[POWERPC] rewrite mkprep and mkbugboot in sane C
[POWERPC] maple/pci iomem annotations
[POWERPC] powerpc oprofile __user annotations
[POWERPC] cell spufs iomem annotations
[POWERPC] NULL noise removal: spufs
[POWERPC] ppc math-emu needs -fno-builtin-fabs for math.c and fabs.c
[POWERPC] update mpc8349_itx_defconfig and remove some debug settings
[POWERPC] Always call cede in pseries dedicated idle loop
[POWERPC] Fix loop logic in irq_alloc_virt()
...
The last change for partport_pc did fix the common case for all PowerMacs,
but it broke the case for PCI multiport IO cards. In fact, the config
option CONFIG_PARPORT_PC_SUPERIO=y lead to a hard crash when cups probed
the parport driver. It enables the winbond and smsc probing.
Remove the PARPORT_BASE check again, parport_pc_find_nonpci_ports() will
take care of it. All powerpc configs should have
CONFIG_PARPORT_PC_SUPERIO=n, the code did not find anything on the chrp
boards we tested it on.
Tested on a G4/466 with a PCI card:
0001:10:13.0 Serial controller: Timedia Technology Co Ltd PCI2S550 (Dual 16550 UART) (rev 01) (prog-if 02 [16550])
Subsystem: Timedia Technology Co Ltd Unknown device 5079
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 53
Region 0: I/O ports at f2000800 [size=32]
Region 2: I/O ports at f2000870 [size=8]
Region 3: I/O ports at f2000860 [size=8]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use. This patch replaces those with a the init_utsname
helper.
Changes: Removed several uses of init_utsname(). Hope I picked all the
right ones in net/ipv4/ipconfig.c. These are now changed to
utsname() (the per-process namespace utsname) in the previous
patch (2/7)
[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ppc can boot one single binary on prep, chrp and pmac boards. ppc64 can
boot one single binary on pseries and G5 boards. pmac has no legacy io,
probing for PC style legacy hardware (or accessing the legacy io area
regulary) may lead to a hard crash:
* add check for parport_pc, exit on pmac. 32bit chrp has no
->check_legacy_ioport, the probe is always called. 64bit chrp has
check_legacy_ioport, check for a "parallel" node
* add check for isapnp, only PReP boards may have real ISA slots. 32bit
PReP will have no ->check_legacy_ioport, the probe is always called.
* update code in i8042_platform_init. Run ->check_legacy_ioport first,
always call request_region. No functional change. Remove whitespace
before i8042_reset init.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280). It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().
Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.
Eric's original description:
There are a lot of places in the kernel where we test for init
because we give it special properties. Most significantly init
must not die. This results in code all over the kernel test
->pid == 1.
Introduce is_init to capture this case.
With multiple pid spaces for all of the cases affected we are
looking for only the first process on the system, not some other
process that has pid == 1.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bug fix: when marking a slot as frozen, we forgot to mark
pci device itself as frozen. (we did manage to mark the
pci children, but forget the parent itself.)
This is needed so that some device drivers can check the
pci status in critical sections (e.g. in spin loops with
interrupts disabled).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The smt_snooze_delay logic changed a bit when the idle loops were
consolidated. A value of 0 used to mean we always polled, now it means
we always sleep. Instead of restoring the old behaviour, lets put a
reasonable default in smt_snooze_delay. This means we spin for a bit
(in case an external interrupt comes in) and then sleep.
Also the pseries dedicated idle loop currently does not cede both
threads in an SMT pair. The hypervisor wants us to call in so it can
power manage, so lets do that.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On detection of an EEH error, some Power4 systems seem to occasionally
want to be reset twice before they report themselves as fully recovered.
This patch re-arranges the code to attempt additional resets if the first
one doesn't take.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update to the PowerPC PCI error recovery code.
Add code to enable MMIO if a device driver reports that it is capable
of recovering on its own. One anticipated use of this having a device
driver enable MMIO so that it can take a register dump, which might
then be followed by the device driver requesting a full reset.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add wrapper around the rtas call to enable MMIO or DMA on a frozen pci
slot.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up subroutine documentation; mostly formatting changes, with
some new content.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This corrects a pci_dev get/put imbalance that can occur only in
highly unlikely situations (kmalloc failures, pci devices with
overlapping resource addresses). No actual failures seen, this was
spotted during code review.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add instrumentation for hypervisor calls on pseries. Call statistics
include number of calls, wall time and cpu cycles (if available) and
are made available via debugfs. Instrumentation code is behind the
HCALL_STATS config option and has no impact if not enabled.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As part of the new irq code pseries_kexec_cpu_down() was split into a
xics and mpic version. The vpa unregister logic is now only done in the
xics routine, and although that's ok in practice (we don't have SPLPAR
machines with mpic), I'd rather have the two concepts stay separate.
So move the vpa unregister into pseries_kexec_cpu_down(), which gets called
by both the xics and mpic routines. This also gives us an obvious place to
put any new kexec-down logic needed in future.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cleanup CPU inits a bit more, Geoff Levand already did some earlier.
* Move CPU state save to cpu_setup, since cpu_setup is only ever done
on cpu 0 on 64-bit and save is never done more than once.
* Rename __restore_cpu_setup to __restore_cpu_ppc970 and add
function pointers to the cputable to use instead. Powermac always
has 970 so no need to check there.
* Rename __970_cpu_preinit to __cpu_preinit_ppc970 and check PVR before
calling it instead of in it, it's too early to use cputable.
* Rename pSeries_secondary_smp_init to generic_secondary_smp_init since
everyone but powermac and iSeries use it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a shadow buffer for the SLBs and regsiters it with PHYP.
Only the bolted SLB entries (top 3) are shadowed.
The SLB shadow buffer tells the hypervisor what the kernel needs to
have in the SLB for the kernel to be able to function. The hypervisor
can use this information to speed up partition context switches.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Noticing the following might_sleep warning (dump_stack()) during kdump
testing when CONFIG_DEBUG_SPINLOCK_SLEEP is enabled. All secondary CPUs
will be calling rtas_set_indicator with interrupts disabled to remove
them from global interrupt queue.
BUG: sleeping function called from invalid context at
arch/powerpc/kernel/rtas.c:463
in_atomic():1, irqs_disabled():1
Call Trace:
[C00000000FFFB970] [C000000000010234] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFFBA10] [C000000000059354] .__might_sleep+0xd8/0xf4
[C00000000FFFBA90] [C00000000001D1BC] .rtas_busy_delay+0x20/0x5c
[C00000000FFFBB20] [C00000000001D8A8] .rtas_set_indicator+0x6c/0xcc
[C00000000FFFBBC0] [C000000000048BF4] .xics_teardown_cpu+0x118/0x134
[C00000000FFFBC40] [C00000000004539C]
.pseries_kexec_cpu_down_xics+0x74/0x8c
[C00000000FFFBCC0] [C00000000002DF08] .crash_ipi_callback+0x15c/0x188
[C00000000FFFBD50] [C0000000000296EC] .smp_message_recv+0x84/0xdc
[C00000000FFFBDC0] [C000000000048E08] .xics_ipi_dispatch+0xf0/0x130
[C00000000FFFBE50] [C00000000009EF10] .handle_IRQ_event+0x7c/0xf8
[C00000000FFFBF00] [C0000000000A0A14] .handle_percpu_irq+0x90/0x10c
[C00000000FFFBF90] [C00000000002659C] .call_handle_irq+0x1c/0x2c
[C00000000058B9C0] [C00000000000CA10] .do_IRQ+0xf4/0x1a4
[C00000000058BA50] [C0000000000044EC] hardware_interrupt_entry+0xc/0x10
--- Exception: 501 at .plpar_hcall_norets+0x14/0x1c
LR = .pseries_dedicated_idle_sleep+0x190/0x1d4
[C00000000058BD40] [C00000000058BDE0] 0xc00000000058bde0 (unreliable)
[C00000000058BDF0] [C00000000001270C] .cpu_idle+0x10c/0x1e0
[C00000000058BE70] [C000000000009274] .rest_init+0x44/0x5c
To fix this issue, rtas_set_indicator_fast() is added so that will not
wait for RTAS 'busy' delay and this new function is used for kdump (in
xics_teardown_cpu()) and for CPU hotplug ( xics_migrate_irqs_away() and
xics_setup_cpu()).
Note that the platform architecture spec says that set-indicator
on the indicator we're using here is not permitted to return the
busy or extended busy status codes.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We should not be calling power4_enable_pmcs() in
pseries_lpar_enable_pmcs(); just doing the hypercall is sufficient.
Prior to 2.6.15 we did not call power4_enable_pmcs() for an lpar.
power4_enable_pmcs() tries to read the hid0 register which is no
longer legal for an lpar in newer Power processors.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Our pseries hcall interfaces are out of control:
plpar_hcall_norets
plpar_hcall
plpar_hcall_8arg_2ret
plpar_hcall_4out
plpar_hcall_7arg_7ret
plpar_hcall_9arg_9ret
Create 3 interfaces to cover all cases:
plpar_hcall_norets: 7 arguments no returns
plpar_hcall: 6 arguments 4 returns
plpar_hcall9: 9 arguments 9 returns
There are only 2 cases in the kernel that need plpar_hcall9, hopefully
we can keep it that way.
Pass in a buffer to stash return parameters so we avoid the &dummy1,
&dummy2 madness.
Signed-off-by: Anton Blanchard <anton@samba.org>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that get_property() returns a void *, there's no need to cast its
return value. Also, treat the return value as const, so we can
constify get_property later.
pseries platform changes.
Built for pseries_defconfig
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On the JS21 systems, they have the SPLPAR hypertas set, but are not SMT
capable. So, they are not making the H_CEDE call. This is causing the
hypervisor to have to queue up work for the hdecr, taking an excessive
amount of time in maintenance code, and causing jitter on the box.
Making the H_CEDE call helps alleviate that problem.
Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch slightly reworks the new irq code to fix a small design error. I
removed the passing of the trigger to the map() calls entirely, it was not a
good idea to have one call do two different things. It also fixes a couple of
corner cases.
Mapping a linux virtual irq to a physical irq now does only that. Setting the
trigger is a different action which has a different call.
The main changes are:
- I no longer call host->ops->map() for an already mapped irq, I just return
the virtual number that was already mapped. It was called before to give an
opportunity to change the trigger, but that was causing issues as that could
happen while the interrupt was in use by a device, and because of the
trigger change, map would potentially muck around with things in a racy way.
That was causing much burden on a given's controller implementation of
map() to get it right. This is much simpler now. map() is only called on
the initial mapping of an irq, meaning that you know that this irq is _not_
being used. You can initialize the hardware if you want (though you don't
have to).
- Controllers that can handle different type of triggers (level/edge/etc...)
now implement the standard irq_chip->set_type() call as defined by the
generic code. That means that you can use the standard set_irq_type() to
configure an irq line manually if you wish or (though I don't like that
interface), pass explicit trigger flags to request_irq() as defined by the
generic kernel interfaces. Also, using those interfaces guarantees that
your controller set_type callback is called with the descriptor lock held,
thus providing locking against activity on the same interrupt (including
mask/unmask/etc...) automatically. A result is that, for example, MPIC's
own map() implementation calls irq_set_type(NONE) to configure the hardware
to the default triggers.
- To allow the above, the irq_map array entry for the new mapped interrupt
is now set before map() callback is called for the controller.
- The irq_create_of_mapping() (also used by irq_of_parse_and_map()) function
for mapping interrupts from the device-tree now also call the separate
set_irq_type(), and only does so if there is a change in the trigger type.
- While I was at it, I changed pci_read_irq_line() (which is the helper I
would expect most archs to use in their pcibios_fixup() to get the PCI
interrupt routing from the device tree) to also handle a fallback when the
DT mapping fails consisting of reading the PCI_INTERRUPT_PIN to know wether
the device has an interrupt at all, and the the PCI_INTERRUPT_LINE to get an
interrupt number from the device. That number is then mapped using the
default controller, and the trigger is set to level low. That default
behaviour works for several platforms that don't have a proper interrupt
tree like Pegasos. If it doesn't work for your platform, then either
provide a proper interrupt tree from the firmware so that fallback isn't
needed, or don't call pci_read_irq_line()
- Add back a bit that got dropped by my main rework patch for properly
clearing pending IPIs on pSeries when using a kexec
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the new irq remapper core and removes the old one. Because
there are some fundamental conflicts with the old code, like the value
of NO_IRQ which I'm now setting to 0 (as per discussions with Linus),
etc..., this commit also changes the relevant platform and driver code
over to use the new remapper (so as not to cause difficulties later
in bisecting).
This patch removes the old pre-parsing of the open firmware interrupt
tree along with all the bogus assumptions it made to try to renumber
interrupts according to the platform. This is all to be handled by the
new code now.
For the pSeries XICS interrupt controller, a single remapper host is
created for the whole machine regardless of how many interrupt
presentation and source controllers are found, and it's set to match
any device node that isn't a 8259. That works fine on pSeries and
avoids having to deal with some of the complexities of split source
controllers vs. presentation controllers in the pSeries device trees.
The powerpc i8259 PIC driver now always requests the legacy interrupt
range. It also has the feature of being able to match any device node
(including NULL) if passed no device node as an input. That will help
porting over platforms with broken device-trees like Pegasos who don't
have a proper interrupt tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adapts the generic powerpc interrupt handling code, and all of
the platforms except for the embedded 6xx machines, to use the new
genirq framework.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the new IRQF_ constants and remove the SA_INTERRUPT define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits)
[POWERPC] Use little-endian bit from firmware ibm,pa-features property
[POWERPC] Make sure smp_processor_id works very early in boot
[POWERPC] U4 DART improvements
[POWERPC] todc: add support for Time-Of-Day-Clock
[POWERPC] Make lparcfg.c work when both iseries and pseries are selected
[POWERPC] Fix idr locking in init_new_context
[POWERPC] mpc7448hpc2 (taiga) board config file
[POWERPC] Add tsi108 pci and platform device data register function
[POWERPC] Add general support for mpc7448hpc2 (Taiga) platform
[POWERPC] Correct the MAX_CONTEXT definition
powerpc: minor cleanups for mpc86xx
[POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set
[POWERPC] Simplify the code defining the 64-bit CPU features
[POWERPC] powerpc: kconfig warning fix
[POWERPC] Consolidate some of kernel/misc*.S
[POWERPC] Remove unused function call_with_mmu_off
[POWERPC] update asm-powerpc/time.h
[POWERPC] Clean up it_lp_queue.h
[POWERPC] Skip the "copy down" of the kernel if it is already at zero.
[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.
...
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.
While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.
The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.
This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.
As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.
The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.
We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.
This patch:
rename desc->handler to desc->chip.
Originally i did not want to do this, because it's a big patch. But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.
I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.
So lets get over with this quickly. The conversion was done automatically
via scripts and converts all the code in the kernel.
This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.
[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initialise the ppc_md htab callbacks earlier, in the probe routines. This
allows us to call htab_finish_init() from htab_initialize(), and makes it
private to hash_utils_64.c. Move htab_finish_init() and make_bl() above
htab_initialize() to avoid forward declarations.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
During kdump boot, noticed some machines checkstop on dma protection
fault for ongoing DMA left in the first kernel. Instead of initializing
TCE entries in iommu_init() for the kdump boot, this patch fixes this
issue by walking through the each TCE table and checks whether the
entries are in use by the first kernel. If so, reserve those entries by
setting the corresponding bit in tbl->it_map such that these entries
will not be available for the kdump boot.
However it could be possible that all TCE entries might be used up due
to the driver bug that does continuous mapping. My observation is around
1700 TCE entries are used on some systems (Ex: P4) at some point of
time during kdump boot and saving dump (either write into the disk or
sending to remote machine). Hence, this patch will make sure that
minimum of 2048 entries will be available such that kdump boot could be
successful in some cases.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
locking init cleanups:
- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner
this patch was generated automatically.
Motivation:
- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't dereference a device node that isn't there. A "shouldn't
happen" case, but someone ran into it with a possibly misconfigured
device tree.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Looking for class-code in PCI children breaks with direct slots. Lets
just count all children.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The PCI error recovery code will printk diagnostic info when
a PCI error event occurs. Change the messages to include the slot
location code, which is how most sysadmins will know the device.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Export both news RTAS delay functions, and change the scanlog module to
use the new delay functions.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Allocate IOMMU tables local to the relevant node.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Micro-optimisation - add no-minimal-toc to some more arch/powerpc Makefiles.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The IBM Cell blade firmware might confuse the kernel to think it's a
pSeries machine. This fixes it for now. With a bit of luck, the firmware
will be updated to avoid that in the future but currently that patch is
needed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the pseries iommu init code to use the new of_parse_dma_window()
to parse the ibm,dma-window and ibm,my-dma-window properties of pci and
virtual device nodes.
Also, clean up vio_build_iommu_table() a little.
Tested on pseries, with both vio and pci devices.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When a PCI device driver does not support PCI error recovery,
the powerpc/pseries code takes a walk through a branch of code
that resets the failure counter. Because of this, if a broken
PCI card is present, the kernel will attempt to reset it an
infinite number of times. (This is annoying but mostly harmless:
each reset takes about 10-20 seconds, and uses almost no CPU time).
This patch preserves the failure count across resets.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We are displaying the wrong thing on the operator panel (2x40
character LCD). This got broken in commit cebb21b5, when UTS_RELEASE
got changed to system_utsname.version.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc code is currently performing PCI setup before memory
initialization. PCI setup touches PCI config space registers. If the PCI
card is bad, this will evoke an error, which currrently can't be handled,
as the PCI error recovery code expects kmalloc() to be functional. This
patch will cause the system to punt instead of crashing with
cpu 0x0: Vector: 300 (Data Access) at [c0000000004434d0]
pc: c0000000000c06b4: .kmem_cache_alloc+0x8c/0xf4
lr: c00000000004ad6c: .eeh_send_failure_event+0x48/0xfc
This patch will also print name of the offending pci device.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It's been long overdue to kill the union tce_entry in the pSeries/iSeries
TCE code, especially since I asked the Summit guys to do it on the code
they copied from us.
Also, while I was at it, I cleaned up some whitespace.
Built and booted on pSeries, built on iSeries.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This requires the compatible properties having vaules that are empty
strings instead of just being empty properties.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As an added bonus, since every vio_dev now has a device_node
associated with it, hotplug now works.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current PCI error recovery system keeps track of the number of PCI card
resets, and refuses to bring a card back up if this number is too large.
The goal of doing this was to avoid an infinite loop of resets if a card is
obviously dead. However, if the failures are rare, but the machine has a
high uptime, this mechanism might still be triggered; this is too harsh.
This patch will avoids this problem by decrementing the fail count after an
hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it
will continue to be reset indefinitely. If it's failure rate is greater
than that, it will be taken off-line permanently.
This patch is larger than it might otherwise be because it changes
indentation by removing a pointless while-loop. The while loop is not
needed, as the handler is invoked once fo each event (by schedule_work());
the loop is leftover cruft from an earlier implementation.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Most users won't really know the difference between a started RTAS
daemon and a missing event-scan. Move it to debug levels.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This isn't really a dangerous thing any more; most systems lack
ISA interrupt controllers.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
No need to write out what idle loop is used on every boot.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In some crash scenarios, the kexec CPU is not responding to an IPI sent by
secondary CPU after init thread is forked, causing the system to drop into
xmon during kdump boot. This problem can be reproduced each time when the
debugger is enabled and soft-reset is used to invoke kdump boot. The first
CPU sends an IPI - setting the IPI priority for all secondary cpus
(xics_cause_ipi()). But some CPUs will enter into the xmon via soft-reset,
i.e, not executing xics_ipi_action(). Hence, IPI is not cleared. When
exited from the debugger, one of these CPUs could become the primary kexec
CPU. Since the IPI is not cleared, causing this issue in kdump boot. This
patch clears and EOI IPI for kexec CPU as well before the kdump boot
started.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Repeated calls to eeh_remove_device() can result in multiple
(and thus unbalanced) calls to pci_dev_put(). Make sure the
pci_device_put() is called only once (since there was only
one call to the matching pci_device_get()).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix __initcall return in proc_rtas_init and rtas_init.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes unnecessary exports, marks functions as static when
possible, and simplifies some list-related code.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The recent patch to print device names in EEH reset messages
was lacking ... this patch works better.
Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This extends the HCALL interface for InfiniBand usage. I've
made the patch against the linux-2.6 git tree and Segher's patch:
[PATCH] Change H_StudlyCaps to H_SHOUTING_CAPS
We moved this into the common powerpc code based on comments we
got after posting the first eHCA InfiniBand device driver patch.
Signed-off-by: Heiko j Schick <schickhj@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also cleans up some nearby whitespace problems.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current code prints an ambiguous message if the recovery
of a failed PCI device fails. Give this special case its own
unique message.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This forces the processing of EEH PCI events to be serialized,
using a very simple mutex lock. This serialization is required to
avoid races involving additional PCI device failures that may occur
during the recovery phase of a previous failure.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.
We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Non zero initcalls (except for -ENODEV) have started warning at boot.
Fix smt_setup and init_ras_IRQ.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These are some updates from both Ryan and Arnd for the hvc_console
driver:
The main point is to enable the inclusion of a console driver
for rtas, which is currrently needed for the cell platform.
Also shuffle around some data-type declarations and moves some
functions out of include/asm-ppc64/hvconsole.h and into a new
drivers/char/hvc_console.h file.
Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com>
Signed-off-by: Arnd Bergmann <abergman@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to export ppc64_firmware_features for modules. Before we do that
I think we should probably rename it to powerpc_firmware_features.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since pSeries only wants to do something different in the idle loop when
there is no work to do, we can simplify the code by implementing
ppc_md.power_save functions instead of complete idle loops. There are
two versions: one for shared-processor partitions and one for dedicated-
processor partitions.
With this we also do a cede_processor() call on dedicated processor
partitions if the poll_pending() call indicates that the hypervisor
has work it wants to do.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently have a hack to flip the boot cpu and its secondary thread
to logical cpuid 0 and 1. This means the logical - physical mapping will
differ depending on which cpu is boot cpu. This is most apparent on
kexec, where we might kexec on any cpu and therefore change the mapping
from boot to boot.
The patch below does a first pass early on to work out the logical cpuid
of the boot thread. We then fix up some paca structures to match.
Ive also removed the boot_cpuid_phys variable for ppc64, to be
consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the dynamic PCI probe function for pSeries to use
ppc_md.pci_probe_mode() when appropriate.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It has been decreed that platform numbers are evil, so as a step in that
direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit.
Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might
have to clean that up if we need to be more specific about what LPAR actually
means. But that's another patch ...
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When iommu_init_early_pSeries() was added, ages ago, we forgot to remove
the code that checks /chosen/linux,iommu-off in pSeries_init_early(). We
do it now in iommu_init_early_pSeries().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The lparcfg code needs several things which are pretty arcane internal
details and which we don't want to export, which means that lparcfg
doesn't work when built as a module. This makes it a bool instead of
a tristate in the Kconfig so that users can't try to build it as a
module.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some hotplug driver functions were migrated to the kernel for use by EEH
in commit 2bf6a8fa21.
Previously, the PCI Hotplug module had been changed to use the new
OFDT-based PCI probe when appropriate:
5fa80fcdca
When rpaphp_pci_config_slot() was moved from the rpaphp driver to the
new kernel function pcibios_add_pci_devices(), the OFDT-based probe
stuff was dropped. This patch restores it.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
HMT support is currently broken and needs to be reworked to play nicely
with the SMT scheduler. Remove the bit rotten bits for the time being.
I also updated an incorrect comment, we enter __secondary_hold with the
physical cpu id in r3.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the logical and physical cpu ids of a secondary thread don't match, we will
fail to spin the thread up on pSeries machines due to a bug in pseries/smp.c
We call the RTAS "start-cpu" method with the physical cpu id, the address of
pSeries_secondary_smp_init and the value to pass that function in r3. Currently
we pass "lcpu", the logical cpu id, but pSeries_secondary_smp_init expects
the physical cpu id in r3.
We should be passing pcpu instead.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree. I think this accomplises
everything wanted, though there might be a few references I missed.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we have some stuff in firmware.h and kernel/firmware.c that is
#ifdef CONFIG_PPC_PSERIES. Move it all into platforms/pseries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up fw_feature_init in platforms/pseries/setup.c. Clean up white space
and replace the while loop with a for loop - which seems clearer to me.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We call unregister_vpa but we don't check to see if the hypervisor
supports this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Anton Blanchard <anton@samba.org>
--
arch/powerpc/platforms/pseries/setup.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Build break: Building PCI hotplug on PowerPC results in
a build break, due to failure to export symbols.
Reported today by Dave Jones <davej@redhat.com>:
drivers/pci/hotplug/rpaphp.ko needs unknown symbol pcibios_add_pci_devices
This patch fixes the break in the arch/powerpc tree.
Next patch fixes same problem in drivers/pci tree
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure. This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.
This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need aligning gap between the Linux and
hypervisor portions of every PACA. On the other hand it means an
extra level of dereference in many accesses to the lppaca.
The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support to reconfigure the device tree through the existing
proc filesystem interface. Add "add_property", "remove_property",
and "update_property" commands to the existing interface.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These symbols are only used in the file that they are defined in,
so they should not be in the global namespace.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove warning in eeh code about mixed variables and code.
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a crash on null-pointer deref during dlpar slot addition.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 1c87c0f84943fbbc91826967ff4fea1b059a526f commit)
<asm/systemcfg.h> is gone now, and the PCI error recovery constants
in include/linux/pci.h changed their names in the process of getting
accepted.
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 5a2516156c591fc3d2059fbd93f97e15eb6010d6 commit)
242-eeh-no-percpu-counters.patch
Remove per-cpu counters from the EEH code. These statistics counters
are incremented at a very low frequency, and the performance gains of
per-cpu variables are negligable. By contrast, the counters weren't
safe against cpu off/online operations, and its not worth the effort
to make them so (other than to turn them into plain globals).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from be3b5d1be053ccb41e91fa5a6f43ef5db301357d commit)
241-eeh-save-bars-earlier.patch
Save the PCI device bars *before* any PCI probing is done.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
239-eeh-multifunction-consolidate.patch
New-style firmware will often place multiple different functions
under a non-EEH-aware parent. However, these devices might share
a common PE "partition endpoint" and config address, ad thus any
EEH events will affect all of the devices in common. This patch
makes the effort to find all of these common devices and handle
them together.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)
238-eeh-stop-if-reset_failed.patch
If the firmware is unable to reset the PCI slot for some reason, then
don't attempt any further recovery steps after that point. Instead,
mark the device as permanently failed.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
237-eeh-bridge-token.patch
Minor: the rtas-bridge token should be set up the same way that all
the other rtas tokens are set up.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 78379b6c5fc17b6666c40b05988e6708e98479c0 commit)
236-eeh-config-addr.patch
The PE configuration address wasn't being cnsistently used in all locations
where a config address is called for. This patch adds it to the places it
should have appeared in.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from c2bc904a28095aca0b04a37854b63b78622a032e commit)
235-eeh-set-pcidev-bugfix.patch
The pci device field of the pci_dn struct should be initialized to a
valid value.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from beb45c93d494a11c36e5b24f638e610db8428b54 commit)
234-eeh-find-pe.patch
The find_device_pe() routine is duplicated in two files. Remove one of
the two copies, declare the other extern.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
26-eeh-partition-endpoint.patch
New versions of firmware introduce a new method by which the
"partitionable endpoint" (the point at which the pci bus is cut)
should be located. This code adds the support for this (mandatory)
new feature.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
25-pci-address-cache.patch
The core EEH file is rather large. This patch splits out a self-contained
chunk of it into its own file. This is the chunk that performes the
caching and lookup of pci devices based on the i/o addresses of thier
resoures. This code is almos architecture-independent and could be
used by any system that wanted to find a pci device based only on
the i/o address used by the device.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
Various PCI bus errors can be signaled by newer PCI controllers. The
core error recovery routines are architecture dependent. This patch adds
a recovery infrastructure for the PPC64 pSeries systems.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
This patch enables support for pause(0) power management state
for the Cell Broadband Processor, which is import for power efficient
operation. The pervasive infrastructure will in the future enable
us to introduce more functionality specific to the Cell's
pervasive unit.
From: Maximino Aguilar <maguilar@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, we are not looking at all interrupt controller nodes in the
device tree even though the proper node was not found. This is causing
the system panic. The attached patch will scan all nodes until it finds
the proper interrupt controller type.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There is code in the RPAPHP directory that is identical to this routine;
I'll be removing that code in an upcoming patch, but this patch is needed
to expose the function to make it callable.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Implementing the machine_crash_shutdown which will be called by
crash_kexec (called in case of a panic, sysrq etc.). Disable the
interrupts, shootdown cpus using debugger IPI and collect regs
for all CPUs.
elfcorehdr= specifies the location of elf core header stored by
the crashed kernel. This command line option will be passed by
the kexec-tools to capture kernel.
savemaxmem= specifies the actual memory size that the first kernel
has and this value will be used for dumping in the capture kernel.
This command line option will be passed by the kexec-tools to
capture kernel.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The fwnmi vectors can be anywhere < 32 MB, so we need to use a trampoline
for them. The kdump kernel will register the trampoline addresses, which will
then jump up to the real code above 32 MB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Minor: use macro to perform void pointer deref; this may someday help
avoid pointer typecasting errors.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The udbg low level io layer has an issue with udbg_getc() returning a
char (unsigned on ppc) instead of an int, thus the -1 if you had no
available input device could end up turned into 0xff, filling your
display with bogus characters. This fixes it, along with adding a little
blob to xmon to do a delay before exiting when getting an EOF and fixing
the detection of ADB keyboards in udbg_adb.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
23-rpaphp-migrate.patch (parts)
This patch moves some pci device add & remove code from the PCI
hotplug directory to the arch/powerpc/kernel directory, and cleans
it up a tad. The primary reason for this is that the code performs
some fairly generic operations that are shared with the PCI error
recovery code (living in the arch/powerpc/kernel directory).
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
20-rpaphp-eeh-cleanup.patch
This patch move some code from the rpaphp directory, to the powerpc
directory, where it should have been all along (Among other things, I
need it in the powerpc directory for the PCI error recovery.)
Please note that patch affects TWO maintainers: Paul, after applying
the powerpc part, please ask that GregKH appli the PCI part. It is safe
to have the powerpc part go in first. It would be bad to have the
PCI part go in first.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch unifies udbg for both ppc32 and ppc64 when building the
merged achitecture. xmon now has a single "back end". The powermac udbg
stuff gets enriched with some ADB capabilities and btext output. In
addition, the early_init callback is now called on ppc32 as well,
approx. in the same order as ppc64 regarding device-tree manipulations.
The init sequences of ppc32 and ppc64 are getting closer, I'll unify
them in a later patch.
For now, you can force udbg to the scc using "sccdbg" or to btext using
"btextdbg" on powermacs. I'll implement a cleaner way of forcing udbg
output to something else than the autodetected OF output device in a
later patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the discovery of legacy serial ports to a separate file,
makes it common to ppc32 and ppc64, and reworks it to use the new OF
address translators to get to the ports early. This new version can also
detect some PCI serial cards using legacy chips and will probably match
those discovered port with the default console choice.
Only ppc64 gets udbg still yet, unifying udbg isn't finished yet.
It also adds some speed-probing code to udbg so that the default console
can come up at the same speed it was set to by the firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch merges, to some extent, the PPC32 and PPC64 kexec implementations.
We adopt the PPC32 approach of having ppc_md callbacks for the kexec functions.
The current PPC64 implementation becomes the "default" implementation for PPC64
which platforms can select if they need no special treatment.
I've added these default callbacks to pseries/maple/cell/powermac, this means
iSeries no longer supports kexec - but it never worked anyway.
I've renamed PPC32's machine_kexec_simple to default_machine_kexec, inline with
PPC64. Judging by the comments it might be better named machine_kexec_non_of,
or something, but at the moment it's the only implementation for PPC32 so it's
the "default".
Kexec requires machine_shutdown(), which is in machine_kexec.c on PPC32, but we
already have in setup-common.c on powerpc. All this does is call
ppc_md.nvram_sync, which only powermac implements, so instead make
machine_shutdown a ppc_md member and have it call core99_nvram_sync directly
on powermac.
I've also stuck relocate_kernel.S into misc_32.S for powerpc.
Built for ARCH=ppc, and 32 & 64 bit ARCH=powerpc, with KEXEC=y/n. Booted on
P5 LPAR and successfully kexec'ed.
Should apply on top of 493f25ef40.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It turns out that commit f9bd170a87
broke the cascade from XICS to i8259 on pSeries machines; specifically
we ended up not ever doing the EOI on the XICS for the cascade. The
result was that interrupts from the serial ports (and presumably any
other devices using ISA interrupts) didn't get through. This fixes
it and also simplifies the code, by doing the EOI on the XICS in the
xics_get_irq routine after reading and acking the interrupt on the
i8259.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some debug code wasn't properly removed from the initial 64k pages
patch, and while it's harmless, it's also slowing down significantly a
very hot code path, thus it should really be removed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The previous commit will use the page-at-a-time hypervisor call for
setting up IOMMU entries when we are using 64k pages and setting up
one 64k page, even though that means 16 calls to the hypervisor, since
the hypervisor still works on 4k pages. This optimizes this case by
using the multi-page IOMMU setup hypervisor call instead.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Must adjust tcenum and npages by TCE_PAGE_FACTOR to convert between
64KB pages and TCE (4K) pages. (This is done in other places, except
for this one location.)
Signed-off-by: Michal Ostrowski <mostrows at watson ibm com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Email address update, changing old work address to personal (permanent)
one.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
pseries_dedicated_idle() was using __get_tb which used to be defined
in asm/delay.h. Change it to use get_tb from asm/time.h, which is
in fact exactly the same thing.
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the kernel supports both G5 and pSeries, and CONFIG_EEH is enabled,
eeh_init() is (quite reasonably) never called when we boot on a G5. Yet
eeh_check_failure() still gets called. We should avoid doing that if
!eeh_subsystem_enabled.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also deletes files in arch/ppc64 that are no longer used now that
we don't compile with ARCH=ppc64 any more.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We currently have a ppc_md member called cpu_irq_down, which disables IRQs
for the cpu in question. The only caller of cpu_irq_down is the kexec code.
On pSeries we need to do more than just teardown IRQs at kexec time, so rename
the ppc_md member to kexec_cpu_down and expand it. The pSeries code needs to
know, and other platforms might too, whether we're doing a crash shutdown (ie.
panicking) or a regular kexec, so add a flag for that.
The pSeries implementation of kexec_cpu_down does an unregister VPA call, which
tells the Hypervisor to stop writing stuff into our pacas. Without this we can
get weird memory corruption bugs when we kexec, caused by the Hypervisor
writing into the first kernel's pacas which happens to be somewhere interesting
in the second kernel's memory.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We needed the VDSO symbols in the arch/ppc asm-offsets.c, and there
were a few usages of _systemcfg still left lying around.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have been printing the raw ppc64_firmware_features during boot. Since
we can work it out from the device tree, lets remove it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
17-eeh-slot-marking-bug.patch
A device that experiences a PCI outage may be just one deivce out
of many that was affected. In order to avoid repeated reports of
a failure, the entire tree of affected devices should be marked
as failed. This patch marks up the entire tree.
Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
phbs_remap_io(), which maps the PCI IO space into the kernel virtual space,
is called too early on powermac, and thus doesn't work.
This fixes it by removing the call from all platforms and putting it back
into the ppc64 common code where it belongs, after the actual probing of
the bus.
That means that before that call, only the ISA IO space (if any) is mapped,
any PIO access (from quirks for example) will fail. This happens not to be
a problem for now, but we'll have to rework that code if it becomes one in
the future.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch merges platform codes. systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 along as ppc32. Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
scanlog.c is only compiled on pSeries. Thus, this patch moves it to
platforms/pseries.
Built and booted on pSeries LPAR (ARCH=powerpc and ARCH=ppc64). Built
for iSeries (ARCH=powerpc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
14-eeh-device-bar-save.patch
After a PCI device has been resest, the device BAR's and other config
space info must be restored to the same state as they were in when
the firmware first handed us this device. This will allow the
PCI device driver, when restarted, to correctly recognize and set up
the device.
Tis patch saves the device config space as early as reasonable after
the firmware has handed over the device. Te state resore funcion
is inteded for use by the EEH recovery routines.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
13-eeh-recovery-support-routines.patch
EEH Recovery support routines
This patch adds routines required to help drive the recovery of
EEH-frozen slots. The main function is to drive the PCI #RST
signal line high for a qurter of a second, and then allow for
a second & a half of settle time.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
12-eeh-event-dispatcher.patch
ppc64: EEH Recovery dispatcher thread
This patch adds a mechanism to create recovery threads when an
EEH event is received. Since an EEH freeze state may be detected
within an interrupt context, we need to get out of the interrupt
context before starting recovery. This dispatcher does this in
two steps: first, it uses a workqueue to get out, and then
lanuches a kernel thread, so that the recovery routine can
sleep for exteded periods without upseting the keventd.
A kernel thread is created with each EEH event, rather than
having one long-running daemon started at boot time. This is
because it is anticipated that EEH events will be very rare
(very very rare, ideally) and so its pointless to cluter the
process tables with a daemon that will almost never run.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
11-eeh-move-to-powerpc.patch
Move arch/ppc64/kernel/eeh.c to arch//powerpc/platforms/pseries/eeh.c
No other changes (except for Makefile to build it)
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use __do_IRQ instead. The only difference is that every controller
is now assumed to have an end() routine (only xics_8259 did not).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
This is the arch/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in arch/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define ppc_md.set_dabr for both 32 + 64 bit. Cleanup the implementation for
pSeries also, it was needlessly complex. Now we just do two firmware tests at
setup time, and use one of two functions, rather than using one function and
testing on every call.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Mostly this involves adding #include <asm/smp.h>, since that defines
things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which
are SMP-related but still needed on UP. This incorporates fixes
posted by Olof Johansson and Heikki Lindholm.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
First, in the hash setup code, one of the bits allows the size of the
hash table to be reduced by a factor of 8 - which would be better
accomplished with a command line option for that purpose. The other
was a bunch of bus walking related messages in the iSeries code, which
would seem to be insufficient reason to keep the mechanism.
This patch removes the last traces of this mechanism.
Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
base page size to 64K. The resulting kernel still boots on any
hardware. On current machines with 4K pages support only, the kernel
will maintain 16 "subpages" for each 64K page transparently.
Note that while real 64K capable HW has been tested, the current patch
will not enable it yet as such hardware is not released yet, and I'm
still verifying with the firmware architects the proper to get the
information from the newer hypervisors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
register_vpa() doesn't actually do a VPA register call it just uses the flags
you pass it, so rename it to vpa_call() to be clearer.
We can then define register_vpa() and unregister_vpa() which are both simple
wrappers around vpa_call(). (we'll need unregister_vpa() for kexec soon)
We can then cleanup vpa_init(), and because vpa_init() is only called from
platforms/pseries we remove the definition in asm-ppc64/smp.h.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
The extraction of PCI stuff from struct device_node left some false
assumptions in notifier code. As a result, dynamic add crashes when
non-PCI nodes are added. This patch fixes these assumptions.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move plpar_wrappers.h into arch/powerpc/platforms/pseries, fixup white space,
and update callers.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Move pSeries specific code in set_dabr() into a ppc_md function, this will
allow us to keep plpar_wrappers.h private to platforms/pseries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
This moves rtas-proc.c and rtas_flash.c into arch/powerpc/kernel, since
cell wants them as well as pseries (and chrp can use rtas-proc.c too,
at least in principle). rtas_fw.c is gone, with its bits moved into
rtas_flash.c and rtas.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
A recent commit that removed rtas-fw.h and moved its contents to
include/asm-powerpc/rtas.h forgot to also remove the inclusion of
it in arch/powerpc/platforms/pseries/setup.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cell uses the same code as pSeries for flashing the firmware
through rtas, so the implementation should not be part of
platforms/pseries.
Put it into arch/powerpc/kernel instead.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch moves the XICS interrupt controller code into the
platforms/pseries directory, since it only appears on pSeries
machines. If it ever appears on some other machine we can move it to
sysdev, although xics.c itself will need a bunch of changes in that
case to remove pSeries specific assumptions.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits arch/ppc64/kernel/rtas.c into arch/powerpc/kernel/rtas.c,
which contains generic RTAS functions useful on any CHRP platform,
and arch/powerpc/platforms/pseries/rtas-fw.[ch], which contain
some pSeries-specific firmware flashing bits. The parts of rtas.c
that are to do with pSeries-specific error logging are protected
by a new CONFIG_RTAS_ERROR_LOGGING symbol. The inclusion of rtas.o
is controlled by the CONFIG_PPC_RTAS symbol, and the relevant
platforms select that.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the parameters for i8259_init so that it takes two
parameters: a physical address for generating an interrupt
acknowledge cycle, and an interrupt number offset. i8259_init
now sets the irq_desc[] for its interrupts; all the callers
were doing this, and that code is gone now. This also defines
a CONFIG_PPC_I8259 symbol to select i8259.o for inclusion, and
makes the platforms that need it select that symbol.
Signed-off-by: Paul Mackerras <paulus@samba.org>
ras.o is only built for CONFIG_PPC_PSERIES, so move it into
arch/powerpc/platforms/pseries. Update Makefiles to suit.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This way they get done in one place for all platforms, and it is
more consistent with what ppc32 does.
Signed-off-by: Paul Mackerras <paulus@samba.org>
I missed a few places where ppc code was still assuming that the
ppc_md.show_[per]cpuinfo functions returned int.
Signed-off-by: Paul Mackerras <paulus@samba.org>
A few things change for consistency between ppc32 and ppc64:
idle functions return void; *_get_boot_time functions return
unsigned long (i.e. time_t) rather than filling in a struct rtc_time
(since that's useful to the callers and easier for pmac to
generate); *_get_rtc_time and *_set_rtc_time functions take
a struct rtc_time; irq_canonicalize is gone; nvram_sync returns
void.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This creates the directory structure under arch/powerpc and a bunch
of Kconfig files. It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.
For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes
to files in those directories and files outside arch/powerpc.
The boot directory is still not merged. That's going to be interesting.
Signed-off-by: Paul Mackerras <paulus@samba.org>