OpenCloudOS-Kernel/drivers/edac/ghes_edac.c

655 lines
16 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/*
* GHES/EDAC Linux driver
*
* Copyright (c) 2013 by Mauro Carvalho Chehab
*
* Red Hat Inc. http://www.redhat.com
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <acpi/ghes.h>
#include <linux/edac.h>
#include <linux/dmi.h>
#include "edac_module.h"
ghes_edac: Fix RAS tracing With the current version of CPER, there's no way to associate an error with the memory error. So, the error location in EDAC layers is unused. As CPER has its own idea about memory architectural layers, just output whatever is there inside the driver's detail at the RAS tracepoint. The EDAC location keeps untouched, in the case that, in some future, we could actually map the error into the dimm labels. Now, the error message: [ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 72.396627] {1}[Hardware Error]: APEI generic hardware error status [ 72.396628] {1}[Hardware Error]: severity: 2, corrected [ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected [ 72.396632] {1}[Hardware Error]: flags: 0x01 [ 72.396634] {1}[Hardware Error]: primary [ 72.396635] {1}[Hardware Error]: section_type: memory error [ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400 [ 72.396638] {1}[Hardware Error]: node: 3 [ 72.396639] {1}[Hardware Error]: card: 0 [ 72.396640] {1}[Hardware Error]: module: 0 [ 72.396641] {1}[Hardware Error]: device: 0 [ 72.396643] {1}[Hardware Error]: error_type: 18, unknown [ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Is properly represented on the trace event: kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 sockets E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 08:35:41 +08:00
#include <ras/ras_event.h>
struct ghes_pvt {
struct mem_ctl_info *mci;
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
/* Buffers for the error handling routine */
char other_detail[400];
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
char msg[80];
};
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
static refcount_t ghes_refcount = REFCOUNT_INIT(0);
/*
* Access to ghes_pvt must be protected by ghes_lock. The spinlock
* also provides the necessary (implicit) memory barrier for the SMP
* case to make the pointer visible on another CPU.
*/
static struct ghes_pvt *ghes_pvt;
/*
* This driver's representation of the system hardware, as collected
* from DMI.
*/
struct ghes_hw_desc {
int num_dimms;
struct dimm_info *dimms;
} ghes_hw;
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
/* GHES registration mutex */
static DEFINE_MUTEX(ghes_reg_mutex);
/*
* Sync with other, potentially concurrent callers of
* ghes_edac_report_mem_error(). We don't know what the
* "inventive" firmware would do.
*/
static DEFINE_SPINLOCK(ghes_lock);
/* "ghes_edac.force_load=1" skips the platform check */
static bool __read_mostly force_load;
module_param(force_load, bool, 0);
/* Memory Device - Type 17 of SMBIOS spec */
struct memdev_dmi_entry {
u8 type;
u8 length;
u16 handle;
u16 phys_mem_array_handle;
u16 mem_err_info_handle;
u16 total_width;
u16 data_width;
u16 size;
u8 form_factor;
u8 device_set;
u8 device_locator;
u8 bank_locator;
u8 memory_type;
u16 type_detail;
u16 speed;
u8 manufacturer;
u8 serial_number;
u8 asset_tag;
u8 part_number;
u8 attributes;
u32 extended_size;
u16 conf_mem_clk_speed;
} __attribute__((__packed__));
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
static struct dimm_info *find_dimm_by_handle(struct mem_ctl_info *mci, u16 handle)
{
struct dimm_info *dimm;
mci_for_each_dimm(mci, dimm) {
if (dimm->smbios_handle == handle)
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
return dimm;
}
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
return NULL;
}
static void dimm_setup_label(struct dimm_info *dimm, u16 handle)
{
const char *bank = NULL, *device = NULL;
dmi_memdev_name(handle, &bank, &device);
/* both strings must be non-zero */
if (bank && *bank && device && *device)
snprintf(dimm->label, sizeof(dimm->label), "%s %s", bank, device);
}
static void assign_dmi_dimm_info(struct dimm_info *dimm, struct memdev_dmi_entry *entry)
{
u16 rdr_mask = BIT(7) | BIT(13);
if (entry->size == 0xffff) {
pr_info("Can't get DIMM%i size\n", dimm->idx);
dimm->nr_pages = MiB_TO_PAGES(32);/* Unknown */
} else if (entry->size == 0x7fff) {
dimm->nr_pages = MiB_TO_PAGES(entry->extended_size);
} else {
if (entry->size & BIT(15))
dimm->nr_pages = MiB_TO_PAGES((entry->size & 0x7fff) << 10);
else
dimm->nr_pages = MiB_TO_PAGES(entry->size);
}
switch (entry->memory_type) {
case 0x12:
if (entry->type_detail & BIT(13))
dimm->mtype = MEM_RDDR;
else
dimm->mtype = MEM_DDR;
break;
case 0x13:
if (entry->type_detail & BIT(13))
dimm->mtype = MEM_RDDR2;
else
dimm->mtype = MEM_DDR2;
break;
case 0x14:
dimm->mtype = MEM_FB_DDR2;
break;
case 0x18:
if (entry->type_detail & BIT(12))
dimm->mtype = MEM_NVDIMM;
else if (entry->type_detail & BIT(13))
dimm->mtype = MEM_RDDR3;
else
dimm->mtype = MEM_DDR3;
break;
case 0x1a:
if (entry->type_detail & BIT(12))
dimm->mtype = MEM_NVDIMM;
else if (entry->type_detail & BIT(13))
dimm->mtype = MEM_RDDR4;
else
dimm->mtype = MEM_DDR4;
break;
default:
if (entry->type_detail & BIT(6))
dimm->mtype = MEM_RMBS;
else if ((entry->type_detail & rdr_mask) == rdr_mask)
dimm->mtype = MEM_RDR;
else if (entry->type_detail & BIT(7))
dimm->mtype = MEM_SDR;
else if (entry->type_detail & BIT(9))
dimm->mtype = MEM_EDO;
else
dimm->mtype = MEM_UNKNOWN;
}
/*
* Actually, we can only detect if the memory has bits for
* checksum or not
*/
if (entry->total_width == entry->data_width)
dimm->edac_mode = EDAC_NONE;
else
dimm->edac_mode = EDAC_SECDED;
dimm->dtype = DEV_UNKNOWN;
dimm->grain = 128; /* Likely, worse case */
dimm_setup_label(dimm, entry->handle);
if (dimm->nr_pages) {
edac_dbg(1, "DIMM%i: %s size = %d MB%s\n",
dimm->idx, edac_mem_types[dimm->mtype],
PAGES_TO_MiB(dimm->nr_pages),
(dimm->edac_mode != EDAC_NONE) ? "(ECC)" : "");
edac_dbg(2, "\ttype %d, detail 0x%02x, width %d(total %d)\n",
entry->memory_type, entry->type_detail,
entry->total_width, entry->data_width);
}
dimm->smbios_handle = entry->handle;
}
static void enumerate_dimms(const struct dmi_header *dh, void *arg)
{
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
struct ghes_hw_desc *hw = (struct ghes_hw_desc *)arg;
struct dimm_info *d;
if (dh->type != DMI_ENTRY_MEM_DEVICE)
return;
/* Enlarge the array with additional 16 */
if (!hw->num_dimms || !(hw->num_dimms % 16)) {
struct dimm_info *new;
new = krealloc(hw->dimms, (hw->num_dimms + 16) * sizeof(struct dimm_info),
GFP_KERNEL);
if (!new) {
WARN_ON_ONCE(1);
return;
}
hw->dimms = new;
}
d = &hw->dimms[hw->num_dimms];
d->idx = hw->num_dimms;
assign_dmi_dimm_info(d, entry);
hw->num_dimms++;
}
static void ghes_scan_system(void)
{
static bool scanned;
if (scanned)
return;
dmi_walk(enumerate_dimms, &ghes_hw);
scanned = true;
}
void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
{
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
struct ghes_pvt *pvt;
unsigned long flags;
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
char *p;
/*
* We can do the locking below because GHES defers error processing
* from NMI to IRQ context. Whenever that changes, we'd at least
* know.
*/
if (WARN_ON_ONCE(in_nmi()))
return;
spin_lock_irqsave(&ghes_lock, flags);
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
pvt = ghes_pvt;
if (!pvt)
goto unlock;
mci = pvt->mci;
e = &mci->error_desc;
/* Cleans the error report buffer */
memset(e, 0, sizeof (*e));
e->error_count = 1;
EDAC/ghes: Fix grain calculation The current code to convert a physical address mask to a grain (defined as granularity in bytes) is: e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); This is broken in several ways: 1) It calculates to wrong grain values. E.g., a physical address mask of ~0xfff should give a grain of 0x1000. Without considering PAGE_MASK, there is an off-by-one. Things are worse when also filtering it with ~PAGE_MASK. This will calculate to a grain with the upper bits set. In the example it even calculates to ~0. 2) The grain does not depend on and is unrelated to the kernel's page-size. The page-size only matters when unmapping memory in memory_failure(). Smaller grains are wrongly rounded up to the page-size, on architectures with a configurable page-size (e.g. arm64) this could round up to the even bigger page-size of the hypervisor. Fix this with: e->grain = ~mem_err->physical_addr_mask + 1; The grain_bits are defined as: grain = 1 << grain_bits; Change also the grain_bits calculation accordingly, it is the same formula as in edac_mc.c now and the code can be unified. The value in ->physical_addr_mask coming from firmware is assumed to be contiguous, but this is not sanity-checked. However, in case the mask is non-contiguous, a conversion to grain_bits effectively converts the grain bit mask to a power of 2 by rounding it up. Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
2019-11-06 17:33:23 +08:00
e->grain = 1;
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
e->msg = pvt->msg;
e->other_detail = pvt->other_detail;
e->top_layer = -1;
e->mid_layer = -1;
e->low_layer = -1;
*pvt->other_detail = '\0';
*pvt->msg = '\0';
switch (sev) {
case GHES_SEV_CORRECTED:
e->type = HW_EVENT_ERR_CORRECTED;
break;
case GHES_SEV_RECOVERABLE:
e->type = HW_EVENT_ERR_UNCORRECTED;
break;
case GHES_SEV_PANIC:
e->type = HW_EVENT_ERR_FATAL;
break;
default:
case GHES_SEV_NO:
e->type = HW_EVENT_ERR_INFO;
}
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
edac_dbg(1, "error validation_bits: 0x%08llx\n",
(long long)mem_err->validation_bits);
/* Error type, mapped on e->msg */
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
p = pvt->msg;
switch (mem_err->error_type) {
case 0:
p += sprintf(p, "Unknown");
break;
case 1:
p += sprintf(p, "No error");
break;
case 2:
p += sprintf(p, "Single-bit ECC");
break;
case 3:
p += sprintf(p, "Multi-bit ECC");
break;
case 4:
p += sprintf(p, "Single-symbol ChipKill ECC");
break;
case 5:
p += sprintf(p, "Multi-symbol ChipKill ECC");
break;
case 6:
p += sprintf(p, "Master abort");
break;
case 7:
p += sprintf(p, "Target abort");
break;
case 8:
p += sprintf(p, "Parity Error");
break;
case 9:
p += sprintf(p, "Watchdog timeout");
break;
case 10:
p += sprintf(p, "Invalid address");
break;
case 11:
p += sprintf(p, "Mirror Broken");
break;
case 12:
p += sprintf(p, "Memory Sparing");
break;
case 13:
p += sprintf(p, "Scrub corrected error");
break;
case 14:
p += sprintf(p, "Scrub uncorrected error");
break;
case 15:
p += sprintf(p, "Physical Memory Map-out event");
break;
default:
p += sprintf(p, "reserved error (%d)",
mem_err->error_type);
}
} else {
strcpy(pvt->msg, "unknown error");
}
/* Error address */
if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
e->offset_in_page = offset_in_page(mem_err->physical_addr);
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
}
/* Error grain */
if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
EDAC/ghes: Fix grain calculation The current code to convert a physical address mask to a grain (defined as granularity in bytes) is: e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK); This is broken in several ways: 1) It calculates to wrong grain values. E.g., a physical address mask of ~0xfff should give a grain of 0x1000. Without considering PAGE_MASK, there is an off-by-one. Things are worse when also filtering it with ~PAGE_MASK. This will calculate to a grain with the upper bits set. In the example it even calculates to ~0. 2) The grain does not depend on and is unrelated to the kernel's page-size. The page-size only matters when unmapping memory in memory_failure(). Smaller grains are wrongly rounded up to the page-size, on architectures with a configurable page-size (e.g. arm64) this could round up to the even bigger page-size of the hypervisor. Fix this with: e->grain = ~mem_err->physical_addr_mask + 1; The grain_bits are defined as: grain = 1 << grain_bits; Change also the grain_bits calculation accordingly, it is the same formula as in edac_mc.c now and the code can be unified. The value in ->physical_addr_mask coming from firmware is assumed to be contiguous, but this is not sanity-checked. However, in case the mask is non-contiguous, a conversion to grain_bits effectively converts the grain bit mask to a power of 2 by rounding it up. Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
2019-11-06 17:33:23 +08:00
e->grain = ~mem_err->physical_addr_mask + 1;
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
/* Memory error location, mapped on e->location */
p = e->location;
if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
p += sprintf(p, "node:%d ", mem_err->node);
if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
p += sprintf(p, "card:%d ", mem_err->card);
if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
p += sprintf(p, "module:%d ", mem_err->module);
if (mem_err->validation_bits & CPER_MEM_VALID_RANK_NUMBER)
p += sprintf(p, "rank:%d ", mem_err->rank);
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
if (mem_err->validation_bits & CPER_MEM_VALID_BANK)
p += sprintf(p, "bank:%d ", mem_err->bank);
if (mem_err->validation_bits & CPER_MEM_VALID_ROW)
p += sprintf(p, "row:%d ", mem_err->row);
if (mem_err->validation_bits & CPER_MEM_VALID_COLUMN)
p += sprintf(p, "col:%d ", mem_err->column);
if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
const char *bank = NULL, *device = NULL;
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
struct dimm_info *dimm;
dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
if (bank != NULL && device != NULL)
p += sprintf(p, "DIMM location:%s %s ", bank, device);
else
p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
mem_err->mem_dev_handle);
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
dimm = find_dimm_by_handle(mci, mem_err->mem_dev_handle);
if (dimm) {
e->top_layer = dimm->idx;
strcpy(e->label, dimm->label);
}
}
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
if (p > e->location)
*(p - 1) = '\0';
EDAC/ghes: Setup DIMM label from DMI and use it in error reports The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.: EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) Fix this by using struct dimm_info's label string in error reports: EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also read from sysfs. E.g. a ThunderX2 system will show the following: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0 Since dimm_labels can be rewritten, that label will be used in a later error report: # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label # # some error injection here # dmesg | grep foobar [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory) [ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
2020-05-28 18:13:06 +08:00
if (!*e->label)
strcpy(e->label, "unknown memory");
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
/* All other fields are mapped on e->other_detail */
p = pvt->other_detail;
p += snprintf(p, sizeof(pvt->other_detail),
"APEI location: %s ", e->location);
ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20 06:24:12 +08:00
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
u64 status = mem_err->error_status;
p += sprintf(p, "status(0x%016llx): ", (long long)status);
switch ((status >> 8) & 0xff) {
case 1:
p += sprintf(p, "Error detected internal to the component ");
break;
case 16:
p += sprintf(p, "Error detected in the bus ");
break;
case 4:
p += sprintf(p, "Storage error in DRAM memory ");
break;
case 5:
p += sprintf(p, "Storage error in TLB ");
break;
case 6:
p += sprintf(p, "Storage error in cache ");
break;
case 7:
p += sprintf(p, "Error in one or more functional units ");
break;
case 8:
p += sprintf(p, "component failed self test ");
break;
case 9:
p += sprintf(p, "Overflow or undervalue of internal queue ");
break;
case 17:
p += sprintf(p, "Virtual address not found on IO-TLB or IO-PDIR ");
break;
case 18:
p += sprintf(p, "Improper access error ");
break;
case 19:
p += sprintf(p, "Access to a memory address which is not mapped to any component ");
break;
case 20:
p += sprintf(p, "Loss of Lockstep ");
break;
case 21:
p += sprintf(p, "Response not associated with a request ");
break;
case 22:
p += sprintf(p, "Bus parity error - must also set the A, C, or D Bits ");
break;
case 23:
p += sprintf(p, "Detection of a PATH_ERROR ");
break;
case 25:
p += sprintf(p, "Bus operation timeout ");
break;
case 26:
p += sprintf(p, "A read was issued to data that has been poisoned ");
break;
default:
p += sprintf(p, "reserved ");
break;
}
}
if (mem_err->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
p += sprintf(p, "requestorID: 0x%016llx ",
(long long)mem_err->requestor_id);
if (mem_err->validation_bits & CPER_MEM_VALID_RESPONDER_ID)
p += sprintf(p, "responderID: 0x%016llx ",
(long long)mem_err->responder_id);
if (mem_err->validation_bits & CPER_MEM_VALID_TARGET_ID)
p += sprintf(p, "targetID: 0x%016llx ",
(long long)mem_err->responder_id);
if (p > pvt->other_detail)
*(p - 1) = '\0';
edac_raw_mc_handle_error(e);
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
unlock:
spin_unlock_irqrestore(&ghes_lock, flags);
}
/*
* Known systems that are safe to enable this module.
*/
static struct acpi_platform_list plat_list[] = {
{"HPE ", "Server ", 0, ACPI_SIG_FADT, all_versions},
{ } /* End */
};
int ghes_edac_register(struct ghes *ghes, struct device *dev)
{
bool fake = false;
struct mem_ctl_info *mci;
struct ghes_pvt *pvt;
struct edac_mc_layer layers[1];
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
unsigned long flags;
int idx = -1;
int rc = 0;
if (IS_ENABLED(CONFIG_X86)) {
/* Check if safe to enable on this system */
idx = acpi_match_platform_list(plat_list);
if (!force_load && idx < 0)
return -ENODEV;
} else {
idx = 0;
}
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
/* finish another registration/unregistration instance first */
mutex_lock(&ghes_reg_mutex);
/*
* We have only one logical memory controller to which all DIMMs belong.
*/
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
if (refcount_inc_not_zero(&ghes_refcount))
goto unlock;
ghes_scan_system();
/* Check if we've got a bogus BIOS */
if (!ghes_hw.num_dimms) {
fake = true;
ghes_hw.num_dimms = 1;
}
layers[0].type = EDAC_MC_LAYER_ALL_MEM;
layers[0].size = ghes_hw.num_dimms;
layers[0].is_virt_csrow = true;
mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_pvt));
if (!mci) {
pr_info("Can't allocate memory for EDAC data\n");
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
rc = -ENOMEM;
goto unlock;
}
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
pvt = mci->pvt_info;
pvt->mci = mci;
mci->pdev = dev;
mci->mtype_cap = MEM_FLAG_EMPTY;
mci->edac_ctl_cap = EDAC_FLAG_NONE;
mci->edac_cap = EDAC_FLAG_NONE;
mci->mod_name = "ghes_edac.c";
mci->ctl_name = "ghes_edac";
mci->dev_name = "ghes";
if (fake) {
pr_info("This system has a very crappy BIOS: It doesn't even list the DIMMS.\n");
pr_info("Its SMBIOS info is wrong. It is doubtful that the error report would\n");
pr_info("work on such system. Use this driver with caution\n");
} else if (idx < 0) {
pr_info("This EDAC driver relies on BIOS to enumerate memory and get error reports.\n");
pr_info("Unfortunately, not all BIOSes reflect the memory layout correctly.\n");
pr_info("So, the end result of using this driver varies from vendor to vendor.\n");
pr_info("If you find incorrect reports, please contact your hardware vendor\n");
pr_info("to correct its BIOS.\n");
pr_info("This system has %d DIMM sockets.\n", ghes_hw.num_dimms);
}
if (!fake) {
struct dimm_info *src, *dst;
int i = 0;
mci_for_each_dimm(mci, dst) {
src = &ghes_hw.dimms[i];
dst->idx = src->idx;
dst->smbios_handle = src->smbios_handle;
dst->nr_pages = src->nr_pages;
dst->mtype = src->mtype;
dst->edac_mode = src->edac_mode;
dst->dtype = src->dtype;
dst->grain = src->grain;
/*
* If no src->label, preserve default label assigned
* from EDAC core.
*/
if (strlen(src->label))
memcpy(dst->label, src->label, sizeof(src->label));
i++;
}
} else {
struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
dimm->nr_pages = 1;
dimm->grain = 128;
dimm->mtype = MEM_UNKNOWN;
dimm->dtype = DEV_UNKNOWN;
dimm->edac_mode = EDAC_SECDED;
}
rc = edac_mc_add_mc(mci);
if (rc < 0) {
pr_info("Can't register with the EDAC core\n");
edac_mc_free(mci);
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
rc = -ENODEV;
goto unlock;
}
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
spin_lock_irqsave(&ghes_lock, flags);
ghes_pvt = pvt;
spin_unlock_irqrestore(&ghes_lock, flags);
EDAC/ghes: Do not warn when incrementing refcount on 0 The following warning from the refcount framework is seen during ghes initialization: EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT) ------------[ cut here ]------------ refcount_t: increment on 0; use-after-free. WARNING: CPU: 36 PID: 1 at lib/refcount.c:156 refcount_inc_checked [...] Call trace: refcount_inc_checked ghes_edac_register ghes_probe ... It warns if the refcount is incremented from zero. This warning is reasonable as a kernel object is typically created with a refcount of one and freed once the refcount is zero. Afterwards the object would be "used-after-free". For GHES, the refcount is initialized with zero, and that is why this message is seen when initializing the first instance. However, whenever the refcount is zero, the device will be allocated and registered. Since the ghes_reg_mutex protects the refcount and serializes allocation and freeing of ghes devices, a use-after-free cannot happen here. Instead of using refcount_inc() for the first instance, use refcount_set(). This can be used here because the refcount is zero at this point and can not change due to its protection by the mutex. Fixes: 23f61b9fc5cc ("EDAC/ghes: Fix locking and memory barrier issues") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <huangming23@huawei.com> Cc: James Morse <james.morse@arm.com> Cc: <linuxarm@huawei.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: <tanxiaofei@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <wanghuiqiang@huawei.com> Link: https://lkml.kernel.org/r/20191121213628.21244-1-rrichter@marvell.com
2019-11-22 05:36:57 +08:00
/* only set on success */
refcount_set(&ghes_refcount, 1);
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
unlock:
/* Not needed anymore */
kfree(ghes_hw.dimms);
ghes_hw.dimms = NULL;
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
mutex_unlock(&ghes_reg_mutex);
return rc;
}
void ghes_edac_unregister(struct ghes *ghes)
{
struct mem_ctl_info *mci;
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
unsigned long flags;
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
mutex_lock(&ghes_reg_mutex);
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
if (!refcount_dec_and_test(&ghes_refcount))
goto unlock;
EDAC/ghes: Fix Use after free in ghes_edac remove path ghes_edac models a single logical memory controller, and uses a global ghes_init variable to ensure only the first ghes_edac_register() will do anything. ghes_edac is registered the first time a GHES entry in the HEST is probed. There may be multiple entries, so subsequent attempts to register ghes_edac are silently ignored as the work has already been done. When a GHES entry is unregistered, it calls ghes_edac_unregister(), which free()s the memory behind the global variables in ghes_edac. But there may be multiple GHES entries, the next call to ghes_edac_unregister() will dereference the free()d memory, and attempt to free it a second time. This may also be triggered on a platform with one GHES entry, if the driver is unbound/re-bound and unbound. The re-bind step will do nothing because of ghes_init, the second unbind will then do the same work as the first. Doing the unregister work on the first call is unsafe, as another CPU may be processing a notification in ghes_edac_report_mem_error(), using the memory we are about to free. ghes_init is already half of the reference counting. We only need to do the register work for the first call, and the unregister work for the last. Add the unregister check. This means we no longer free ghes_edac's memory while there are GHES entries that may receive a notification. This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE. [ bp: merge into a single patch. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
2019-10-15 01:19:18 +08:00
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
/*
* Wait for the irq handler being finished.
*/
spin_lock_irqsave(&ghes_lock, flags);
mci = ghes_pvt ? ghes_pvt->mci : NULL;
EDAC/ghes: Fix Use after free in ghes_edac remove path ghes_edac models a single logical memory controller, and uses a global ghes_init variable to ensure only the first ghes_edac_register() will do anything. ghes_edac is registered the first time a GHES entry in the HEST is probed. There may be multiple entries, so subsequent attempts to register ghes_edac are silently ignored as the work has already been done. When a GHES entry is unregistered, it calls ghes_edac_unregister(), which free()s the memory behind the global variables in ghes_edac. But there may be multiple GHES entries, the next call to ghes_edac_unregister() will dereference the free()d memory, and attempt to free it a second time. This may also be triggered on a platform with one GHES entry, if the driver is unbound/re-bound and unbound. The re-bind step will do nothing because of ghes_init, the second unbind will then do the same work as the first. Doing the unregister work on the first call is unsafe, as another CPU may be processing a notification in ghes_edac_report_mem_error(), using the memory we are about to free. ghes_init is already half of the reference counting. We only need to do the register work for the first call, and the unregister work for the last. Add the unregister check. This means we no longer free ghes_edac's memory while there are GHES entries that may receive a notification. This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE. [ bp: merge into a single patch. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rrichter@marvell.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
2019-10-15 01:19:18 +08:00
ghes_pvt = NULL;
EDAC/ghes: Fix locking and memory barrier issues The ghes registration and refcount is broken in several ways: * ghes_edac_register() returns with success for a 2nd instance even if a first instance's registration is still running. This is not correct as the first instance may fail later. A subsequent registration may not finish before the first. Parallel registrations must be avoided. * The refcount was increased even if a registration failed. This leads to stale counters preventing the device from being released. * The ghes refcount may not be decremented properly on unregistration. Always decrement the refcount once ghes_edac_unregister() is called to keep the refcount sane. * The ghes_pvt pointer is handed to the irq handler before registration finished. * The mci structure could be freed while the irq handler is running. Fix this by adding a mutex to ghes_edac_register(). This mutex serializes instances to register and unregister. The refcount is only increased if the registration succeeded. This makes sure the refcount is in a consistent state after registering or unregistering a device. Note: A spinlock cannot be used here as the code section may sleep. The ghes_pvt is protected by ghes_lock now. This ensures the pointer is not updated before registration was finished or while the irq handler is running. It is unset before unregistering the device including necessary (implicit) memory barriers making the changes visible to other CPUs. Thus, the device can not be used anymore by an interrupt. Also, rename ghes_init to ghes_refcount for better readability and switch to refcount API. A refcount is needed because there can be multiple GHES structures being defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error Source, "Some platforms may describe multiple Generic Hardware Error Source structures with different notification types, ..."). Another approach to use the mci's device refcount (get_device()) and have a release function does not work here. A release function will be called only for device_release() with the last put_device() call. The device must be deleted *before* that with device_del(). This is only possible by maintaining an own refcount. [ bp: touchups. ] Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller") Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path") Co-developed-by: James Morse <james.morse@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
2019-11-06 04:07:51 +08:00
spin_unlock_irqrestore(&ghes_lock, flags);
if (!mci)
goto unlock;
mci = edac_mc_del_mc(mci->pdev);
if (mci)
edac_mc_free(mci);
unlock:
mutex_unlock(&ghes_reg_mutex);
}