OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Greg Kroah-Hartman	e032d80774	mce: fix warning messages about static struct mce_device When suspending, there was a large list of warnings going something like: Device 'machinecheck1' does not have a release() function, it is broken and must be fixed This patch turns the static mce_devices into dynamically allocated, and properly frees them when they are removed from the system. It solves the warning messages on my laptop here. Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Djalal Harouni <tixxdz@opendz.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-16 17:08:42 -08:00
Linus Torvalds	7affca3537	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h	2012-01-07 12:03:30 -08:00
Greg Kroah-Hartman	ff4b8a57f0	Merge branch 'driver-core-next' into Linux 3.2 This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 11:42:52 -08:00
Kay Sievers	8a25a2fd12	cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-12-21 14:29:42 -08:00
Ingo Molnar	a228b5892b	Merge branch 'mce-inject' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce	2011-12-18 09:18:45 +01:00
Chen Gong	2c29d9dd57	x86: add IRQ context simulation in module mce-inject mce-inject provides a mechanism to simulate errors so that test scripts can check for correct operation of the kernel without requiring any specialized hardware to create rare events. The existing code can simulate events in normal process context and also in NMI context - but not in IRQ context. This patch fills that gap. Link: https://lkml.org/lkml/2011/12/7/537 Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-12-16 11:20:02 -08:00
Borislav Petkov	3653ada5d3	x86, mce: Add wrappers for registering on the decode chain No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:50:12 +01:00
Luck, Tony	66f5ddf30a	x86/mce: Make mce_chrdev_ops 'static const' Arjan would like to make struct file_operations const, but mce-inject directly writes to the mce_chrdev_ops to install its write handler. In an ideal world mce-inject would have its own character device, but we have a sizable legacy of test scripts that hardwire "/dev/mcelog", so it would be painful to switch to a separate device now. Instead, this patch switches to a stub function in the mce code, with a registration helper that mce-inject can call when it is loaded. Note that this would also allow for a sane process to allow mce-inject to be unloaded again (with an unregister function, and appropriate module_{get,put}() calls), but that is left for potential future patches. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-08 16:17:11 +01:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Hidetoshi Seto	c7cece89f1	x86, mce: Use mce_sysdev_ prefix to group functions There are many functions named mce_* so use a new prefix for the subset of functions related to sysfs support. And since `f3c6ea1b06` introduces syscore_ops, use the prefix mce_syscore for some functions related to power management which were in sysdev_class before. Before: After: mce_device mce_sysdev mce_sysclass mce_sysdev_class mce_attrs mce_sysdev_attrs mce_dev_initialized mce_sysdev_initialized mce_create_device mce_sysdev_create mce_remove_device mce_sysdev_remove mce_suspend mce_syscore_suspend mce_shutdown mce_syscore_shutdown mce_resume mce_syscore_resume Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED81B.8020506@jp.fujitsu.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-06-16 12:10:16 +02:00
Hidetoshi Seto	2b90e77eae	x86, mce: Replace MCM_ with MCI_MISC_ Follow other MCi register defines. Plus define MCI_MISC_ADDR_LSB() and MCI_MISC_ADDR_MODE(). Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED6E8.9090509@jp.fujitsu.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-06-16 12:10:10 +02:00
Borislav Petkov	dffa4b2f62	x86, mce: Drop the default decoding notifier The default notifier doesn't make a lot of sense to call in the correctable errors case. Drop it and emit the mcelog decoding hint only in the uncorrectable errors case and when no notifier is registered. Also, limit issuing the "mcelog --ascii" message in the rare case when we dump unreported CEs before panicking. While at it, remove unused old x86_mce_decode_callback from the header. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com> Cc: Russ Anderson <rja@sgi.com> Link: http://lkml.kernel.org/r/20110420102349.GB1361@aftab Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:35:10 +02:00
R, Durgadoss	9e76a97efd	x86, hwmon: Add core threshold notification to therm_throt.c This patch adds code to therm_throt.c to notify core thermal threshold events. These thresholds are supported by the IA32_THERM_INTERRUPT register. The status/log for the same is monitored using the IA32_THERM_STATUS register. The necessary #defines are in msr-index.h. A call back is added to mce.h, to further notify the thermal stack, about the threshold events. Signed-off-by: Durgadoss R <durgadoss.r@intel.com> LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-03 08:30:30 -08:00
Huang Ying	3c41758860	x86, mce: Fix MSR_IA32_MCI_CTL2 CMCI threshold setup It is reported that CMCI is not raised when number of corrected error reaches preset threshold. After inspection, it is found that MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch fixed it. Value of MCI_CTL2_CMCI_THRESHOLD_MASK is fixed according to x86_64 Software Developer's Manual too. Reported-by: Shaohui Zheng <shaohui.zheng@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1275977350.3444.660.camel@yhuang-dev.sh.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 21:27:36 -07:00
Huang Ying	1f9a0bd498	x86, mce: Rename MSR_IA32_MCx_CTL2 value Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent. Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1275977348.3444.659.camel@yhuang-dev.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-06-10 21:27:26 -07:00
Huang Ying	d334a49113	ACPI, APEI, Generic Hardware Error Source memory error support Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset). It works in so called "Firmware First" mode, that is, hardware errors are reported to firmware firstly, then reported to Linux by firmware. This way, some non-standard hardware error registers or non-standard hardware link can be checked by firmware to produce more valuable hardware error information for Linux. Now, only SCI notification type and memory errors are supported. More notification type and hardware error type will be added later. These memory errors are reported to user space through /dev/mcelog via faking a corrected Machine Check, so that the error memory page can be offlined by /sbin/mcelog if the error count for one page is beyond the threshold. On some machines, Machine Check can not report physical address for some corrected memory errors, but GHES can do that. So this simplified GHES is implemented firstly. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-19 22:41:16 -04:00
Alan Cox	df39a2e48f	x86: mce.h: Fix warning in header checks Someone isn't reading their build output: Move the definition out of the exported header. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: linux-kernel@vger.kernelorg Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-01-13 10:41:22 +01:00
Yong Wang	a2202aa292	x86: Under BIOS control, restore AP's APIC_LVTTHMR to the BSP value On platforms where the BIOS handles the thermal monitor interrupt, APIC_LVTTHMR on each logical CPU is programmed to generate a SMI and OS must not touch it. Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all the LVT entries except the mask bit. Essentially this results in all LVT entries including the thermal monitoring interrupt set to masked (clearing the bios programmed value for APIC_LVTTHMR). And this leads to kernel take over the thermal monitoring interrupt on AP's but not on BSP (leaving the bios programmed value only on BSP). As a result of this, we have seen system hangs when the thermal monitoring interrupt is generated. Fix this by reading the initial value of thermal LVT entry on BSP and if bios has taken over the control, then program the same value on all AP's and leave the thermal monitoring interrupt control on all the logical cpu's to the bios. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org	2009-11-10 05:57:55 +01:00
Borislav Petkov	5e09954a9a	x86, mce: Fix up MCE naming nomenclature Prefix global/setup routines with "mcheck_" thus differentiating from the internal facilities prefixed with "mce_". Also, prefix the per cpu calls with mcheck_cpu and rename them to reflect the MCE setup hierarchy of calls better. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1255689093-26921-1-git-send-email-borislav.petkov@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-16 14:46:49 +02:00
Borislav Petkov	fb2531953f	mce, edac: Use an atomic notifier for MCEs decoding Add an atomic notifier which ensures proper locking when conveying MCE info to EDAC for decoding. The actual notifier call overrides a default, negative priority notifier. Note: make sure we register the default decoder only once since mcheck_init() runs on each CPU. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091003065752.GA8935@liondog.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 12:24:45 +02:00
Ingo Molnar	f436f8bb73	x86: EDAC: MCE: Fix MCE decoding callback logic Make decoding of MCEs happen only on AMD hardware by registering a non-default callback only on CPU families which support it. While looking at the interaction of decode_mce() with the other MCE code i also noticed a few other things and made the following cleanups/fixes: - Fixed the mce_decode() weak alias - a weak alias is really not good here, it should be a proper callback. A weak alias will be overriden if a piece of code is built into the kernel - not good, obviously. - The patch initializes the callback on AMD family 10h and 11h. - Added the more correct fallback printk of: No support for human readable MCE decoding on this CPU type. Transcribe the message and run it through 'mcelog --ascii' to decode. On CPUs that dont have a decoder. - Made the surrounding code more readable. Note that the callback allows us to have a default fallback - without having to check the CPU versions during the printout itself. When an EDAC module registers itself, it can install the decode-print function. (there's no unregister needed as this is core code.) version -v2 by Borislav Petkov: - add K8 to the set of supported CPUs - always build in edac_mce_amd since we use an early_initcall now - fix checkpatch warnings Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20091001141432.GA11410@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-02 15:42:18 +02:00
Huang Ying	0dcc66851f	x86, mce: Support specifying raise mode for software MCE injection Raise mode include raising as exception or raising as poll, it is specified via the mce.inject_flags field. This can be used to specify raise mode of UCNA, which is UC error but raised not as exception. And this can be used to test the filter code of poll handler or exception handler too. For example, enforce a poll raise mode for a fatal MCE. ChangeLog: v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:58:41 -07:00
Huang Ying	5b7e88edc6	x86, mce: Support specifying context for software mce injection The cpu context is specified via the new mce.inject_flags fields. This allows more realistic machine check testing in different situations. "RANDOM" context is implemented via NMI broadcasting to add randomization to testing. AK: Fix NMI broadcasting check. Fix 32-bit building. Some race fixes. Move to module. Various changes ChangeLog: v3: - Re-based on latest x86-tip.git/mce4 - Fix 32-bit building v2: - Re-base on latest x86-tip.git/mce3 Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-10 13:58:27 -07:00
Andi Kleen	3ccdccfadb	x86: mce: Lower maximum number of banks to architecture limit The Intel x86 architecture right now only supports 32 machine check banks, more would bump into other MSRs. So lower the max define to 32. This only affects a few bitmaps, most data structures are dynamically sized anyways. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:47 -07:00
Andi Kleen	5bb38adcb5	x86: mce: Remove old i386 machine check code As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE This patch only removes code. The ancient machine check code for very old systems that are not supported by CONFIG_X86_NEW_MCE is still kept. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:46 -07:00
Borislav Petkov	e487683990	x86, mce: fix typo in comment in asm/mce.h Fix comment to match the actual declaration. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-20 23:27:16 -07:00
Hidetoshi Seto	58995d2d58	x86, mce: mce.h cleanup Reorder definitions. - static inline dummy mcheck_init() for !CONFIG_X86_MCE - gather defs for exception, threshold handler Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:10 -07:00
Hidetoshi Seto	8363fc82d3	x86, mce: remove intel_set_thermal_handler() and make intel_thermal_interrupt() static. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	e8ce2c5ee8	x86, mce: unify smp_thermal_interrupt, prepare Let them in same shape. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	c697836985	x86, mce: make mce_disabled boolean The mce_disabled on 32bit is a tristate variable [1,0,-1], while 64bit version is boolean [0,1]. This patch makes mce_disabled always boolean, and use mce_p5_enabled to indicate the third state instead. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:07 -07:00
Hidetoshi Seto	9e55e44e39	x86, mce: unify mce.h There are 2 headers: arch/x86/include/asm/mce.h arch/x86/kernel/cpu/mcheck/mce.h and in the latter small header: #include <asm/mce.h> This patch move all contents in the latter header into the former, and fix all files using the latter to include the former instead. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:07 -07:00
Hidetoshi Seto	62fdac5913	x86, mce: Add boot options for corrected errors This patch introduces three boot options (no_cmci, dont_log_ce and ignore_ce) to control handling for corrected errors. The "mce=no_cmci" boot option disables the CMCI feature. Since CMCI is a new feature so having boot controls to disable it will be a help if the hardware is misbehaving. The "mce=dont_log_ce" boot option disables logging for corrected errors. All reported corrected errors will be cleared silently. This option will be useful if you never care about corrected errors. The "mce=ignore_ce" boot option disables features for corrected errors, i.e. polling timer and cmci. All corrected events are not cleared and kept in bank MSRs. Usually this disablement is not recommended, however it will be a help if there are some conflict with the BIOS or hardware monitoring applications etc., that clears corrected events in banks instead of OS. [ And trivial cleanup (space -> tab) for doc is included. ] Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 11:42:18 +02:00
Andi Kleen	9b1beaf2b5	x86, mce: support action-optional machine checks Newer Intel CPUs support a new class of machine checks called recoverable action optional. Action Optional means that the CPU detected some form of corruption in the background and tells the OS about using a machine check exception. The OS can then take appropiate action, like killing the process with the corrupted data or logging the event properly to disk. This is done by the new generic high level memory failure handler added in a earlier patch. The high level handler takes the address with the failed memory and does the appropiate action, like killing the process. In this version of the patch the high level handler is stubbed out with a weak function to not create a direct dependency on the hwpoison branch. The high level handler cannot be directly called from the machine check exception though, because it has to run in a defined process context to be able to sleep when taking VM locks (it is not expected to sleep for a long time, just do so in some exceptional cases like lock contention) Thus the MCE handler has to queue a work item for process context, trigger process context and then call the high level handler from there. This patch adds two path to process context: through a per thread kernel exit notify_user() callback or through a high priority work item. The first runs when the process exits back to user space, the other when it goes to sleep and there is no higher priority process. The machine check handler will schedule both, and whoever runs first will grab the event. This is done because quick reaction to this event is critical to avoid a potential more fatal machine check when the corruption is consumed. There is a simple lock less ring buffer to queue the corrupted addresses between the exception handler and the process context handler. Then in process context it just calls the high level VM code with the corrupted PFNs. The code adds the required code to extract the failed address from the CPU's machine check registers. It doesn't try to handle all possible cases -- the specification has 6 different ways to specify memory address -- but only the linear address. Most of the required checking has been already done earlier in the mce_severity rule checking engine. Following the Intel recommendations Action Optional errors are only enabled for known situations (encoded in MCACODs). The errors are ignored otherwise, because they are action optional. v2: Improve comment, disable preemption while processing ring buffer (reported by Ying Huang) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:59 -07:00
Andi Kleen	9ff36ee966	x86, mce: rename mce_notify_user to mce_notify_irq Rename the mce_notify_user function to mce_notify_irq. The next patch will split the wakeup handling of interrupt context and of process context and it's better to give it a clearer name for this. Contains a fix from Ying Huang [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:04 -07:00
Andi Kleen	ed7290d0ee	x86, mce: implement new status bits The x86 architecture recently added some new machine check status bits: S(ignalled) and AR (Action-Required). Signalled allows to check if a specific event caused an exception or was just logged through CMCI. AR allows the kernel to decide if an event needs immediate action or can be delayed or ignored. Implement support for these new status bits. mce_severity() uses the new bits to grade the machine check correctly and decide what to do. The exception handler uses AR to decide to kill or not. The S bit is used to separate events between the poll/CMCI handler and the exception handler. Classical UC always leads to panic. That was true before anyways because the existing CPUs always passed a PCC with it. Also corrects the rules whether to kill in user or kernel context and how to handle missing RIPV. The machine check handler largely uses the mce-severity grading engine now instead of making its own decisions. This means the logic is centralized in one place. This is useful because it has to be evaluated multiple times. v2: Some rule fixes; Add AO events Fix RIPV, RIPV\|EIPV order (Ying Huang) Fix UCNA with AR=1 message (Ying Huang) Add comment about panicing in m_c_p. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	8ee08347c1	x86, mce: extend struct mce user interface with more information. Experience has shown that struct mce which is used to pass an machine check to the user space daemon currently a few limitations. Also some data which is useful to print at panic level is also missing. This patch addresses most of them. The same information is also printed out together with mce panic. struct mce can be painlessly extended in a compatible way, the mcelog user space code just ignores additional fields with a warning. - It doesn't provide a wall time timestamp. There have been a few complaints about that. Fix that by adding a 64bit time_t - It doesn't provide the exact CPU identification. This makes it awkward for mcelog to decode the event correctly, especially when there are variations in the supported MCE codes on different CPU models or when mcelog is running on a different host after a panic. Previously the administrator had to specify the correct CPU when mcelog ran on a different host, but with the more variation in machine checks now it's better to auto detect that. It's also useful for more detailed analysis of CPU events. Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead. - Socket ID and initial APIC ID are useful to report because they allow to identify the failing CPU in some (not all) cases. This is also especially useful for the panic situation. This addresses one of the complaints from Thomas Gleixner earlier. - The MCG capabilities MSR needs to be reported for some advanced error processing in mcelog Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	d620c67fb9	x86, mce: support more than 256 CPUs in struct mce The old struct mce had a limitation to 256 CPUs. But x86 Linux supports more than that now with x2apic. Add a new field extcpu to report the extended number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	f6fb0ac086	x86, mce: store record length into memory struct mce anchor This makes it easier for tools who want to extract the mcelog out of crash images or memory dumps to adapt to changing struct mce size. The length field replaces padding, so it's fully compatible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	ca84f69697	x86, mce: add MCE poll count to /proc/interrupts Keep a count of the machine check polls (or CMCI events) in /proc/interrupts. Andi needs this for debugging, but it's also useful in general to see what's going in by the kernel. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	01ca79f141	x86, mce: add machine check exception count in /proc/interrupts Useful for debugging, but it's also good general policy to have a counter for all special interrupts there. This makes it easier to diagnose where a CPU is spending its time. [ Impact: feature, debugging tool ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
H. Peter Anvin	38736072d4	x86, mce: drop "extern" from function prototypes in asm/mce.h Function prototypes don't need to be prefixed by "extern". [ Impact: cleanup ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 10:05:33 -07:00
Andi Kleen	ea149b36c7	x86, mce: add basic error injection infrastructure Allow user programs to write mce records into /dev/mcelog. When they do that a fake machine check is triggered to test the machine check code. This uses the MCE MSR wrappers added earlier. The implementation is straight forward. There is a struct mce record per CPU and the MCE MSR accesses get data from there if there is valid data injected there. This allows to test the machine check code relatively realistically because only the lowest layer of hardware access is intercepted. The test suite and injector are available at git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	04b2b1a4df	x86, mce: rename 64bit mce_dont_init to mce_disabled Give it the same name as on 32bit. This makes further merging easier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Thomas Gleixner	01c6680a54	x86, mce: Cleanup MCG definitions Decode more magic constants and turn them into symbols. [ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Ingo Molnar	cb491fca55	x86, mce: Rename sysfs variables Shorten variable names. This also compacts the code a bit. device_mce => mce_dev mce_device_initialized => mce_dev_initialized mce_attribute => mce_attrs [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	06b851d982	x86, mce: unify, prepare 64bit in mce.h Prepare mce.h for unification, so that it will build on 32-bit x86 kernels too. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Andi Kleen	5679af4c16	x86, mce: fix boot logging logic The earlier patch to change the poller to a separate function subtly broke the boot logging logic. This could lead to machine checks getting logged at boot even when disabled or defaulting to off on some systems. Fix that. [ Impact: bug fix - avoid spurious MCE in log ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2009-04-22 13:56:25 -07:00
Andi Kleen	88ccbedd9c	x86, mce, cmci: add CMCI support Impact: Major new feature Intel CMCI (Corrected Machine Check Interrupt) is a new feature on Nehalem CPUs. It allows the CPU to trigger interrupts on corrected events, which allows faster reaction to them instead of with the traditional polling timer. Also use CMCI to discover shared banks. Machine check banks can be shared by CPU threads or even cores. Using the CMCI enable bit it is possible to detect the fact that another CPU already saw a specific bank. Use this to assign shared banks only to one CPU to avoid reporting duplicated events. On CPU hot unplug bank sharing is re discovered. This is done using a thread that cycles through all the CPUs. To avoid races between the poller and CMCI we only poll for banks that are not CMCI capable and only check CMCI owned banks on a interrupt. The shared banks ownership information is currently only used for CMCI interrupts, not polled banks. The sharing discovery code follows the algorithm recommended in the IA32 SDM Vol3a 14.5.2.1 The CMCI interrupt handler just calls the machine check poller to pick up the machine check event that caused the interrupt. I decided not to implement a separate threshold event like the AMD version has, because the threshold is always one currently and adding another event didn't seem to add any value. Some code inspired by Yunhong Jiang's Xen implementation, which was in term inspired by a earlier CMCI implementation by me. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-02-24 13:41:00 -08:00
Andi Kleen	03195c6b40	x86, mce, cmci: define MSR names and fields for new CMCI registers Impact: New register definitions only CMCI means support for raising an interrupt on a corrected machine check event instead of having to poll for it. It's a new feature in Intel Nehalem CPUs available on some machine check banks. For details see the IA32 SDM Vol3a 14.5 Define the registers for it as a preparation for further patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-02-24 13:41:00 -08:00
Andi Kleen	ee031c31d6	x86, mce, cmci: use polled banks bitmap in machine check poller Define a per cpu bitmap that contains the banks polled by the machine check poller. This is needed for the CMCI code in the next patches to be able to disable polling on specific banks. The bank by default contains all banks, so there is no behaviour change. Only future code will remove some banks from the polling set. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-02-24 13:26:05 -08:00

1 2

58 Commits