Commit Graph

1342 Commits

Author SHA1 Message Date
Chris Packham 0b3df44eeb EDAC, mv64x60: Fix pdata->name
Change this from mpc85xx_pci_err to mv64x60_pci_err. The former is
likely a hangover from when this driver was created.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170518083135.28048-3-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-26 22:54:04 +02:00
Qiuxu Zhuo d14e3a201f EDAC, sb_edac: Bump driver version and do some cleanups
Collapse 'case:' in *_mci_bind_devs() and update driver version from
1.1.1 to 1.1.2.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 15:00:36 +02:00
Qiuxu Zhuo 4d475dde79 EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present
This is based on previous work by Patrick Geary, see Link.

Additional cleanups ontop:

 - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro,
 now that TA0 and TA1 are unified.

 - Remove get_pdev_same_bus(), since in get_dimm_config() the
 variable "pvt->pci_ta" for KNL is also ready, we can simply use
 pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read
 MCMTR.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com
Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com
[ Make __populate_dimms() return int. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:57:52 +02:00
Qiuxu Zhuo 3286d3eb90 EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4
We don't need this quirk anymore now that the EDAC memory controller
representation matches the hardware.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com
[ Commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:40:40 +02:00
Borislav Petkov 6696522957 EDAC, sb_edac: Carve out dimm-populating loop
... to slim down get_dimm_config().

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:34 +02:00
Borislav Petkov 199389acd9 EDAC, sb_edac: Fix mod_name
It is called "sb_edac.c" now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:33 +02:00
Qiuxu Zhuo e2f747b1f4 EDAC, sb_edac: Assign EDAC memory controller per h/w controller
Tony pointed out: "currently the driver pretends there is one big
8-channel memory controller per socket instead of 2 4-channel
controllers. This is fine with all memory controller populated with
symmetrical DIMM configurations, but runs into difficulties on
asymmetrical setups".

Restructure the driver to assign an EDAC memory controller to each real
h/w memory controller to resolve the issue.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com
[ Break some lines at convenient points. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:21 +02:00
Tony Luck 7fd562b75d EDAC, sb_edac: Don't use "Socket#" in the memory controller name
EDAC assigns logical memory controller numbers in the order that we find
memory controllers, which depends on which PCI bus they are on. Some
systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on
socket3.

All this is made more confusing for users because we use the string
"Socket" while generating names for memory controllers, but the number
that we attach there is the memory controller number. E.g.

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT)

Change the names to say "SrcID#%d" (where the number we use is read from
the h/w associated with the memory controller instead of some logical
number internal to the EDAC driver). New message:

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT)

Reported-by: Andrey Korolyov <andrey@xdel.ru>
Reported-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:47:11 +02:00
Qiuxu Zhuo 00cf50d90a EDAC, sb_edac: Classify PCI-IDs by topology
Each of the PCI device IDs belongs to a CPU socket, or to one of the
integrated memory controllers. Provide an enum to specify the domain of
each, and distinguish the resource number in each domain: the number
of the PCI device IDs per integrated memory controller/socket, and the
number of integrated memory controllers per socket.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com
[ Realign pci_dev_descr_knl members. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:19:25 +02:00
Tobias Klauser 18caec20bf EDAC, altera: Constify irq_domain_ops
struct irq_domain_ops is not modified, so it can be made const.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.ch
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-24 15:46:25 +02:00
Yazen Ghannam eb77e6b80f EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h
The wrong index into the csbases/csmasks arrays was being passed to
the function to compute the chip select sizes, which resulted in the
wrong size being computed. Address that so that the correct values are
computed and printed.

Also, redo how we calculate the number of pages in a CS row.

Reported-by: Benjamin Bennett <benbennett@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@vger.kernel.org> # 4.10.x
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded integer math comment, minor cleanups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-03 16:27:36 +02:00
Borislav Petkov f8d5549df2 EDAC, ghes: Do not enable it by default
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-27 14:15:38 +02:00
Borislav Petkov bffc7dece9 EDAC: Rename report status accessors
Change them to have the edac_ prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:15:02 +02:00
Borislav Petkov fee27d7d97 EDAC: Delete edac_stub.c
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:48 +02:00
Borislav Petkov a06b85ff07 EDAC: Update Kconfig help text
Remove the old URLs.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:44 +02:00
Borislav Petkov e3c4ff6d8c EDAC: Remove EDAC_MM_EDAC
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.

While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-edac@vger.kernel.org
2017-04-10 17:14:41 +02:00
Borislav Petkov be1d162948 EDAC: Issue tracepoint only when it is defined
... and this happens only when CONFIG_RAS is enabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:38 +02:00
Borislav Petkov 8c22b4fece EDAC: Move edac_op_state to edac_mc.c
... as part of moving stuff away from edac_stub.c

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:29 +02:00
Borislav Petkov d3116a0837 EDAC: Remove edac_err_assert
... and the glue around it. It is not needed anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:21 +02:00
Borislav Petkov 97bb6c17ad EDAC: Get rid of edac_handlers
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:17 +02:00
Borislav Petkov db47d5f856 x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See

  c0d1217202 ("drivers/edac: add new nmi rescan").

From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.

Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."

So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:

  inline int edac_handler_set(void)
  {
         if (edac_op_state == EDAC_OPSTATE_POLL)
                 return 0;

         return atomic_read(&edac_handlers);
  }

So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
2017-04-10 17:13:48 +02:00
Borislav Petkov 76f6a26ce9 EDAC, highbank: Align Makefile directives
... like the rest of the file.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:10:43 +02:00
Sergey Temerkhanov 5195c206fd EDAC, thunderx: Remove unused code
Remove unused code reserved for upcoming CPUs.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:49:32 +02:00
Sergey Temerkhanov 3d2d8c0f84 EDAC, thunderx: Change LMC index calculation
Shift the node number by 3 bits instead of 8 allowing proper functioning
with default EDAC_MAX_MCS.

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:47:44 +02:00
Thor Thayer 25b223ddfe EDAC, altera: Fix peripheral warnings for Cyclone5
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA.
The Cyclone5 initialization generates EDAC warnings when the peripherals
aren't found in the device tree. Fix by checking for Arria10 in the init
functions.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-06 11:42:38 +02:00
Jan Glauber 621c4fe3cc EDAC, thunderx: Fix L2C MCI interrupt disable
Fix a typo that disabled the MCI interrupts using the wrong bitmask.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-05 14:46:05 +02:00
Sergey Temerkhanov 41003396f9 EDAC, thunderx: Add Cavium ThunderX EDAC driver
Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely
the DRAM controller (LMC), cache coherent processor interconnect (CCPI)
and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC)

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David.Daney@cavium.com
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-27 11:43:56 +02:00
Qiuxu Zhuo 819f60fb7d EDAC, pnd2_edac: Fix reported DIMM number
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-26 09:36:28 +02:00
Borislav Petkov cd1be315ac EDAC, pnd2_edac: Fix !EDAC_DEBUG build
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we
don't fail the build:

  drivers/edac/pnd2_edac.c: In function ‘pnd2_init’:
  drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration]
    setup_pnd2_debug();
    ^
  drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’:
  drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration]
    teardown_pnd2_debug();
    ^

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:23 +01:00
Borislav Petkov 1c5bf78114 EDAC: Select DEBUG_FS
The debugfs.c functionality relies on DEBUG_FS so select it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:09 +01:00
Tony Luck 5c71ad17f9 EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms
Initial target for this driver is the Intel Apollo Lake platform and
Denverton micro-server, they use the same internal memory controller IP
called Pondicherry2.

Memory controller registers are not in PCI config space like earlier
Intel memory controllers. For Apollo Lake platform they are accessed via
a "side-band" interface, for Denverton micro-server they are access via
PCI config space and memory map I/O. This driver is for Apollo Lake and
Denverton, but only the Denverton is fully enabled while we wait for the
sideband driver.

Apollo lake driver and initial cut at Denverton driver by Tony Luck.
Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-16 12:40:52 +01:00
Jérémy Lefaure e61555c29c EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used
as if it returned a boolean true if the width if 8. Fix the tests where
MTR_DRAM_WIDTH is misused.

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-09 09:25:29 +01:00
Colin Ian King 4bd035eae2 EDAC, xgene: Fix wrongly spelled "procesing"
Fix spelling mistake in dev_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-06 17:08:19 +01:00
Linus Torvalds 60c906bab1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

  - Assign notifier chain priorities for all RAS related handlers to
    make the ordering explicit (Borislav Petkov)

  - Improve the AMD MCA banks sysfs output (Yazen Ghannam)

  - Various cleanups and restructuring of the x86 RAS code (Borislav
    Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
  x86/ras: Get rid of mce_process_work()
  EDAC/mce/amd: Dump TSC value
  EDAC/mce/amd: Unexport amd_decode_mce()
  x86/ras/amd/inj: Change dependency
  x86/ras: Flip the TSC-adding logic
  x86/ras/amd: Make sysfs names of banks more user-friendly
  x86/ras/therm_throt: Do not log a fake MCE for thermal events
  x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
2017-02-20 12:47:44 -08:00
Yazen Ghannam 75bf2f6478 EDAC, mce_amd: Print IPID and Syndrome on a separate line
Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.

For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.

In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.

Console:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Dmesg:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:

  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-16 15:39:32 +01:00
Borislav Petkov e62d2ca9d0 EDAC, amd64: Bump driver version
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2017-02-14 11:58:05 +01:00
Wei Yongjun 279fa58035 EDAC, fsl_ddr: Make locally used symbols static
Fix the following sparse warnings:

  drivers/edac/fsl_ddr_edac.c:148:1: warning:
   symbol 'dev_attr_inject_data_hi' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:150:1: warning:
   symbol 'dev_attr_inject_data_lo' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:152:1: warning:
   symbol 'dev_attr_inject_ctrl' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-09 17:40:54 +01:00
Chris Packham 321d17c19b EDAC, mpc85xx: Add T2080 l2-cache support
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-03 10:36:35 +01:00
Yazen Ghannam 1bd9900b83 EDAC, amd64: Add x86cpuid sanity check during init
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:43:06 +01:00
Yazen Ghannam 4688c9b42d EDAC, amd64: Don't treat ECC disabled as failure
Having ECC disabled on a node doesn't necessarily mean that it's
disabled for the entire system. So let's return a non-failing code when
ECC is disabled on a node. This way we can skip initialization for the
node but still continue with the remaining nodes.

After probing all instances, make sure we have at least one MC device
allocated.

This issue is seen and fix tested on Fam15h and Fam17h MCM systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:38:49 +01:00
Yazen Ghannam d7fc9d77ac EDAC: Add routine to check if MC devices list is empty
We need to know if any MC devices have been allocated.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:36:47 +01:00
Yazen Ghannam df64636fa4 EDAC, amd64: Remove unused printing macros
amd64_{debug,notice} don't have any users, so remove them.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:20:06 +01:00
Yazen Ghannam 11ab1cae58 EDAC, amd64: Rework messages in ecc_enabled()
Print the node number when informing that DRAM ECC is disabled so
that we can show which nodes have DRAM ECC disabled. Also, print more
detailed system information as edac_dbg(), so as to not bother general
users.

Switch amd64_notice to amd64_info to match the message above it.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:18:33 +01:00
Yazen Ghannam 234365f56e EDAC, amd64: Move global code out of instance functions
We have a few functions that register/unregister an ECC error decoding
routine. These functions are called when we init/remove instances.
However, they are global and so don't need to be registered/unregistered
multiple times.

So move them out of the init/remove instance functions and into the
module init/exit routines.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:08:10 +01:00
Yazen Ghannam 2b9b2c4659 EDAC, amd64: Free unused memory when init_one_instance() fails
Jump to memory freeing routines when init_one_instance() fails.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:40 +01:00
Yazen Ghannam 67d7fd306e EDAC, mce_amd: Give more context to deferred error message
Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:29 +01:00
Borislav Petkov 58fb24cb95 EDAC, i7300: Test for the second channel properly
REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the
CS[3:2] as the simbols in error. And thus the second channel.

The macro computing it was wrong so get rid of it (it was used at one
place only) and get rid of the conditional too. Generates better code
this way anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-01-26 11:35:23 +01:00
Borislav Petkov 9026cc82b6 x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:57 +01:00
Borislav Petkov 0bceab677d EDAC/mce/amd: Dump TSC value
Dump the TSC value of the time when the MCE got logged.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:56 +01:00
Borislav Petkov 1fbcd90903 EDAC/mce/amd: Unexport amd_decode_mce()
It is not used outside of the driver anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:55 +01:00