kdump can be interrupted by watchdog timer when the timer is left
activated on the crash kernel. Changed the hpwdt driver to disable
watchdog timer at boot-time. This assures that watchdog timer is
disabled until /dev/watchdog is opened, and prevents watchdog timer
to be left running on the crash kernel.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Lisa Mitchell <lisa.mitchell@hp.com>
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable <stable@vger.kernel.org>
This patch is to unregister for NMI events upon exit. Also we are now
making the default setting for allow_kdump enabled.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This patch converts the PCI watchdog drivers so that they use the
module_pci_driver() macro. This makes the code smaller and simpler.
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: Thomas Mingarelli <thomas.mingarelli@hp.com>
Cc: Marc Vertes <marc.vertes@sigfox.com>
Pull core locking updates from Ingo Molnar:
"This update:
- extends and simplifies x86 NMI callback handling code to enhance
and fix the HP hw-watchdog driver
- simplifies the x86 NMI callback handling code to fix a kmemcheck
bug.
- enhances the hung-task debugger"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Fix the type of the nmiaction.flags field
x86/nmi: Fix page faults by nmiaction if kmemcheck is enabled
x86/nmi: Add new NMI queues to deal with IO_CHK and SERR
watchdog, hpwdt: Remove priority option for NMI callback
hung task debugging: Inject NMI when hung and going to panic
This patch is to correct the use of the iLO port 0x72 usage.
The port 0x72 is a byte size write/read and hpwdt is currently
writing a WORD.
Signed-off by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
In discussions with Thomas Mingarelli about hpwdt, he explained
to me some issues they were some when using their virtual NMI
button to test the hpwdt driver.
It turns out the virtual NMI button used on HP's machines do no
send unknown NMIs but instead send IO_CHK NMIs. The way the
kernel code is written, the hpwdt driver can not register itself
against that type of NMI and therefore can not successfully
capture system information before panic'ing.
To solve this I created two new NMI queues to allow driver to
register against the IO_CHK and SERR NMIs. Or in the hpwdt all
three (if you include unknown NMIs too).
The change is straightforward and just mimics what the unknown
NMI does.
Reported-and-tested-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1333051877-15755-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The NMI_UNKNOWN bucket only allows for one function to register
to it. The reason for that is because only functions which can
not determine if the NMI belongs to them or not should register
and would like to assume/swallow any NMI they see.
As a result it doesn't make sense to let more than one function
like this register. In fact, letting a second function fail
allows us to know that more than one function is going to
swallow NMIs on the current system. This is better than silently
being ignored.
Therefore hpwdt's priority mechanism doesn't make sense any
more. They will be always first on the NMI_UNKNOWN queue, if
they register.
Removing this parameter cleans up the code and simplifies things
for the next patch which changes how nmis are registered.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Thomas Mingarelli <thomas.mingarelli@hp.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1333051877-15755-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the current logging styles.
Make sure all output has a prefix.
Add missing newlines.
Remove now unnecessary PFX, NAME, and miscellaneous other #defines.
Coalesce formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
1. address has to be page aligned.
2. set_memory_x uses page size argument, not size.
Bug causes with following commit:
commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date: Mon Nov 7 10:59:00 2011 +0100
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
commit e67d668e14 upstream.
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable <stable@vger.kernel.org>
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.
This is needed for SLES11 SP2 and the latest upstream kernel as it appears
the NX Execute Disable has grown in its control.
Signed-off by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable@kernel.org
nmi.c needs an #include <linux/mca.h>:
arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’:
arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function)
arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in
Another one is the hpwdt driver:
drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion. A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.
[Thanks to Ying for finding out the history behind that mce call
https://lkml.org/lkml/2010/5/27/114
And Boris responding that he would like to remove that call because of it
https://lkml.org/lkml/2011/9/21/163]
The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On platforms with no iCRU support don't print two, (possibly conflicting),
"NMI occurred" messages when the firmware is unable to source the NMI.
Please note that one of the enhancements to the v1.3.0 hpwdt driver is to panic and allow
KDUMP to succeed even on NMIs that are unknown to the platform firmware.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This patch is required to enable hpwdt to work on next generation HP servers
with iLO.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
hpwdt_init_nmi_decoding() is called in hpwdt_init_one error handling,
thus remove the __devexit annotation of hpwdt_exit_nmi_decoding().
This patch fixes below warning:
WARNING: drivers/watchdog/hpwdt.o(.devinit.text+0x36f): Section mismatch in reference from the function hpwdt_init_one() to the function .devexit.text:hpwdt_exit_nmi_decoding()
The function __devinit hpwdt_init_one() references
a function __devexit hpwdt_exit_nmi_decoding().
This is often seen when error handling in the init function
uses functionality in the exit path.
The fix is often to remove the __devexit annotation of
hpwdt_exit_nmi_decoding() so it may be used outside an exit section.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
They are a handful of places in the code that register a die_notifier
as a catch all in case no claims the NMI. Unfortunately, they trigger
on events like DIE_NMI and DIE_NMI_IPI, which depending on when they
registered may collide with other handlers that have the ability to
determine if the NMI is theirs or not.
The function unknown_nmi_error() makes one last effort to walk the
die_chain when no one else has claimed the NMI before spitting out
messages that the NMI is unknown.
This is a better spot for these devices to execute any code without
colliding with the other handlers.
The two drivers modified are only compiled on x86 arches I believe, so
they shouldn't be affected by other arches that may not have
DIE_NMIUNKNOWN defined.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: dann frazier <dannf@hp.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c. This shift has caused a whole bunch of
compile problems under different config options. I attempt to
simplify things with the patch below.
In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing,
the former on a local level and the latter on a global level.
With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more. x86 will now use the global
implementation.
The changes below do a few things. First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here). Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.
Second, I removed the x86 implementation of
touch_nmi_watchdog(). It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.
Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86. And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.
Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can
only have one nmi_watchdog.
Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)
Hopefully, after this patch I won't get any more compile broken
emails. :-)
v3:
changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the bulk of the old nmi_watchdog is gone, remove all
the stub variables and hooks associated with it.
This touches lots of files mainly because of how the io_apic
nmi_watchdog was implemented. Now that the io_apic nmi_watchdog
is forever gone, remove all its fingers.
Most of this code was not being exercised by virtue of
nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
risky here.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hpwdt is quite functional without the NMI decoding feature.
This change lets users disable the NMI portion at compile-time
via the new HPWDT_NMI_DECODING config option.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Move NMI-decoding initialisation and exit code to seperate functions so that
we can ifdef-out parts of it in the future.
Also, this is for a device, so let's use dev_info instead of printk.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The term "decoding" more clearly explains what hpwdt is doing. It isn't
just finding the source of the interrupt, but rather aids in decoding what
the interrupt means.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Reorganize this function to remove excess indentation and highlight
the single return code. (No functional change).
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Let applications check the amount of time left before the watchdog will fire.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The hpwdt timer is a 16 bit value with 128ms resolution.
Let applications use this entire range.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Define a macro to convert from seconds to timer ticks.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The 32-bit assembly is guarded by an #ifndef CONFIG_X86_64. Kconfig prevents
us from building this driver on !X86, so that happens to suffice - but we
should really lock it down to #ifdef CONFIG_X86_32.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This driver supports both iLO2 and iLO3, but our user-visible strings
currently only reference iLO2. Let's just call it "iLO2+" to avoid having
to update strings for each iLO generation. This driver doesn't support
iLO ASICs prior to iLO2, but that is sufficiently explained in Kconfig.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* Group together includes specific to NMI sourcing
* Group defines only used by NMI sourcing together
* Group declarations specific to NMI sourcing together
This gives a clean seperation of watchdog specific items and
NMI sourcing specific items (which is needed for making it
possible to build hpwdt without the NMI functionality).
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Reorganization only.
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* remove unnecessary includes
* We use a spinlock, but lacked the include
* We need bitops.h for test_and_set_bit/clear_bit
Signed-off-by: dann frazier <dannf@hp.com>
Acked-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] hpwdt - fix lower timeout limit
[WATCHDOG] iTCO_wdt: TCO Watchdog patch for additional Intel Cougar Point DeviceIDs
[WATCHDOG] doc: Fix use of WDIOC_SETOPTIONS ioctl.
[WATCHDOG] doc: watchdog simple example: don't fail on fsync()
[WATCHDOG] set max63xx driver as ARM only
[WATCHDOG] powerpc: pika_wdt ident cannot be const
[Novell Bug 581103] HP Watchdog driver has arbitrary (wrong) timeout limits.
Fix the lower timeout limit to a more appropriate value.
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable <stable@kernel.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Add a priority option so that the user can choose if we do the NMI
first or last.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Add NMI sourcing functionality (Can only be active if nmi_watchdog is
inactive).
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
At the moment, dmi_walk() lacks flexibility, users can't pass data to
the callback function. Add a pointer for private data to make this
function more flexible.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Roland Dreier <rolandd@cisco.com>
Add the PCI-ID for the upcoming new BMC controller for HP hardware.
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
When the "hpwdt" module is loaded (even if the /dev/watchdog device is not
opened), then kdump does not work. The panic kernel either does not start at
all or crash in various places.
The problem is that hpwdt_pretimeout is registered with register_die_notifier()
with the highest possible priority. Because it returns NOTIFY_STOP, the
crash_nmi_callback which is also registered with register_die_notifier()
is never executed. This causes the shutdown of other CPUs to fail.
Reverting the order is no option: The crash_nmi_callback executes HLT
and so never returns normally. Because of that, it must be executed as
last notifier, which currently is done.
So, that patch returns NOTIFY_OK to keep the crash_nmi_callback executed.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Merge branch 'alan' of ../linux-2.6-watchdog-mm
Fixed Conflicts in the following files:
drivers/watchdog/booke_wdt.c
drivers/watchdog/mpc5200_wdt.c
drivers/watchdog/sc1200wdt.c
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>