Commit Graph

1388 Commits

Author SHA1 Message Date
Borislav Petkov 6696522957 EDAC, sb_edac: Carve out dimm-populating loop
... to slim down get_dimm_config().

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:34 +02:00
Borislav Petkov 199389acd9 EDAC, sb_edac: Fix mod_name
It is called "sb_edac.c" now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:33 +02:00
Qiuxu Zhuo e2f747b1f4 EDAC, sb_edac: Assign EDAC memory controller per h/w controller
Tony pointed out: "currently the driver pretends there is one big
8-channel memory controller per socket instead of 2 4-channel
controllers. This is fine with all memory controller populated with
symmetrical DIMM configurations, but runs into difficulties on
asymmetrical setups".

Restructure the driver to assign an EDAC memory controller to each real
h/w memory controller to resolve the issue.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com
[ Break some lines at convenient points. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 14:37:21 +02:00
Tony Luck 7fd562b75d EDAC, sb_edac: Don't use "Socket#" in the memory controller name
EDAC assigns logical memory controller numbers in the order that we find
memory controllers, which depends on which PCI bus they are on. Some
systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on
socket3.

All this is made more confusing for users because we use the string
"Socket" while generating names for memory controllers, but the number
that we attach there is the memory controller number. E.g.

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT)

Change the names to say "SrcID#%d" (where the number we use is read from
the h/w associated with the memory controller instead of some logical
number internal to the EDAC driver). New message:

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT)

Reported-by: Andrey Korolyov <andrey@xdel.ru>
Reported-by: Patrick Geary <patrickg@supermicro.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:47:11 +02:00
Qiuxu Zhuo 00cf50d90a EDAC, sb_edac: Classify PCI-IDs by topology
Each of the PCI device IDs belongs to a CPU socket, or to one of the
integrated memory controllers. Provide an enum to specify the domain of
each, and distinguish the resource number in each domain: the number
of the PCI device IDs per integrated memory controller/socket, and the
number of integrated memory controllers per socket.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com
[ Realign pci_dev_descr_knl members. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-25 11:19:25 +02:00
Tobias Klauser 18caec20bf EDAC, altera: Constify irq_domain_ops
struct irq_domain_ops is not modified, so it can be made const.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.ch
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-24 15:46:25 +02:00
Yazen Ghannam eb77e6b80f EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h
The wrong index into the csbases/csmasks arrays was being passed to
the function to compute the chip select sizes, which resulted in the
wrong size being computed. Address that so that the correct values are
computed and printed.

Also, redo how we calculate the number of pages in a CS row.

Reported-by: Benjamin Bennett <benbennett@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: <stable@vger.kernel.org> # 4.10.x
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded integer math comment, minor cleanups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-05-03 16:27:36 +02:00
Borislav Petkov f8d5549df2 EDAC, ghes: Do not enable it by default
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-27 14:15:38 +02:00
Borislav Petkov bffc7dece9 EDAC: Rename report status accessors
Change them to have the edac_ prefix.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:15:02 +02:00
Borislav Petkov fee27d7d97 EDAC: Delete edac_stub.c
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to
a module parameter.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:48 +02:00
Borislav Petkov a06b85ff07 EDAC: Update Kconfig help text
Remove the old URLs.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:44 +02:00
Borislav Petkov e3c4ff6d8c EDAC: Remove EDAC_MM_EDAC
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of
that indirection. Update defconfigs which had it.

While at it, fix dependencies such that EDAC depends on RAS for the
tracepoints.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-edac@vger.kernel.org
2017-04-10 17:14:41 +02:00
Borislav Petkov be1d162948 EDAC: Issue tracepoint only when it is defined
... and this happens only when CONFIG_RAS is enabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:38 +02:00
Borislav Petkov 8c22b4fece EDAC: Move edac_op_state to edac_mc.c
... as part of moving stuff away from edac_stub.c

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:29 +02:00
Borislav Petkov d3116a0837 EDAC: Remove edac_err_assert
... and the glue around it. It is not needed anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:21 +02:00
Borislav Petkov 97bb6c17ad EDAC: Get rid of edac_handlers
Use mc_devices list instead to check whether we have EDAC driver
instances successfully registered with EDAC core.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:14:17 +02:00
Borislav Petkov db47d5f856 x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI
Apparently, some machines used to report DRAM errors through a PCI SERR
NMI. This is why we have a call into EDAC in the NMI handler. See

  c0d1217202 ("drivers/edac: add new nmi rescan").

From looking at the patch above, that's two drivers: e752x_edac.c and
e7xxx_edac.c. Now, I wanna say those are old machines which are probably
decommissioned already.

Tony says that "[t]the newest CPU supported by either of those drivers
is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some
folks are still using these ... but people that hold onto h/w for 7
years generally cling to old s/w too ... so I'd guess it unlikely that
we will get complaints for breaking these in upstream."

So even if there is a small number still in use, we did load EDAC with
edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact)
which means a default EDAC setup without any parameters supplied on the
command line or otherwise would never even log the error in the NMI
handler because we're polling by default:

  inline int edac_handler_set(void)
  {
         if (edac_op_state == EDAC_OPSTATE_POLL)
                 return 0;

         return atomic_read(&edac_handlers);
  }

So, long story short, I'd like to get rid of that nastiness called
edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If
we ever have to do stuff like that again, it should be notifiers we're
using and not some insanity like this one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
2017-04-10 17:13:48 +02:00
Borislav Petkov 76f6a26ce9 EDAC, highbank: Align Makefile directives
... like the rest of the file.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10 17:10:43 +02:00
Sergey Temerkhanov 5195c206fd EDAC, thunderx: Remove unused code
Remove unused code reserved for upcoming CPUs.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:49:32 +02:00
Sergey Temerkhanov 3d2d8c0f84 EDAC, thunderx: Change LMC index calculation
Shift the node number by 3 bits instead of 8 allowing proper functioning
with default EDAC_MAX_MCS.

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-07 11:47:44 +02:00
Thor Thayer 25b223ddfe EDAC, altera: Fix peripheral warnings for Cyclone5
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA.
The Cyclone5 initialization generates EDAC warnings when the peripherals
aren't found in the device tree. Fix by checking for Arria10 in the init
functions.

Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-06 11:42:38 +02:00
Jan Glauber 621c4fe3cc EDAC, thunderx: Fix L2C MCI interrupt disable
Fix a typo that disabled the MCI interrupts using the wrong bitmask.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-05 14:46:05 +02:00
Sergey Temerkhanov 41003396f9 EDAC, thunderx: Add Cavium ThunderX EDAC driver
Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely
the DRAM controller (LMC), cache coherent processor interconnect (CCPI)
and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC)

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Cc: David.Daney@cavium.com
Cc: Jan.Glauber@cavium.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-27 11:43:56 +02:00
Qiuxu Zhuo 819f60fb7d EDAC, pnd2_edac: Fix reported DIMM number
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded
to zero. Pass in the correct daddr->dimm value.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-26 09:36:28 +02:00
Borislav Petkov cd1be315ac EDAC, pnd2_edac: Fix !EDAC_DEBUG build
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we
don't fail the build:

  drivers/edac/pnd2_edac.c: In function ‘pnd2_init’:
  drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration]
    setup_pnd2_debug();
    ^
  drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’:
  drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration]
    teardown_pnd2_debug();
    ^

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:23 +01:00
Borislav Petkov 1c5bf78114 EDAC: Select DEBUG_FS
The debugfs.c functionality relies on DEBUG_FS so select it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-23 12:56:09 +01:00
Tony Luck 5c71ad17f9 EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms
Initial target for this driver is the Intel Apollo Lake platform and
Denverton micro-server, they use the same internal memory controller IP
called Pondicherry2.

Memory controller registers are not in PCI config space like earlier
Intel memory controllers. For Apollo Lake platform they are accessed via
a "side-band" interface, for Denverton micro-server they are access via
PCI config space and memory map I/O. This driver is for Apollo Lake and
Denverton, but only the Denverton is fully enabled while we wait for the
sideband driver.

Apollo lake driver and initial cut at Denverton driver by Tony Luck.
Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-16 12:40:52 +01:00
Jérémy Lefaure e61555c29c EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used
as if it returned a boolean true if the width if 8. Fix the tests where
MTR_DRAM_WIDTH is misused.

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-09 09:25:29 +01:00
Colin Ian King 4bd035eae2 EDAC, xgene: Fix wrongly spelled "procesing"
Fix spelling mistake in dev_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-03-06 17:08:19 +01:00
Linus Torvalds 60c906bab1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

  - Assign notifier chain priorities for all RAS related handlers to
    make the ordering explicit (Borislav Petkov)

  - Improve the AMD MCA banks sysfs output (Yazen Ghannam)

  - Various cleanups and restructuring of the x86 RAS code (Borislav
    Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
  x86/ras: Get rid of mce_process_work()
  EDAC/mce/amd: Dump TSC value
  EDAC/mce/amd: Unexport amd_decode_mce()
  x86/ras/amd/inj: Change dependency
  x86/ras: Flip the TSC-adding logic
  x86/ras/amd: Make sysfs names of banks more user-friendly
  x86/ras/therm_throt: Do not log a fake MCE for thermal events
  x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
2017-02-20 12:47:44 -08:00
Yazen Ghannam 75bf2f6478 EDAC, mce_amd: Print IPID and Syndrome on a separate line
Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.

For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.

In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.

Console:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Dmesg:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:

  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-16 15:39:32 +01:00
Borislav Petkov e62d2ca9d0 EDAC, amd64: Bump driver version
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen
so it is only natural ... :-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2017-02-14 11:58:05 +01:00
Wei Yongjun 279fa58035 EDAC, fsl_ddr: Make locally used symbols static
Fix the following sparse warnings:

  drivers/edac/fsl_ddr_edac.c:148:1: warning:
   symbol 'dev_attr_inject_data_hi' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:150:1: warning:
   symbol 'dev_attr_inject_data_lo' was not declared. Should it be static?
  drivers/edac/fsl_ddr_edac.c:152:1: warning:
   symbol 'dev_attr_inject_ctrl' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-09 17:40:54 +01:00
Chris Packham 321d17c19b EDAC, mpc85xx: Add T2080 l2-cache support
The L2 cache controller on the T2080 SoC has similar capabilities to the
others already supported by the mpc85xx_edac driver. Add it to the list
of compatible devices.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: devicetree@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-02-03 10:36:35 +01:00
Yazen Ghannam 1bd9900b83 EDAC, amd64: Add x86cpuid sanity check during init
Match one of the devices in amd64_cpuids[] before loading the module.
This is an additional sanity check against users trying to load
amd64_edac_mod on unsupported systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com
[ Get rid of err_ret label, make it a bit more readable this way. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:43:06 +01:00
Yazen Ghannam 4688c9b42d EDAC, amd64: Don't treat ECC disabled as failure
Having ECC disabled on a node doesn't necessarily mean that it's
disabled for the entire system. So let's return a non-failing code when
ECC is disabled on a node. This way we can skip initialization for the
node but still continue with the remaining nodes.

After probing all instances, make sure we have at least one MC device
allocated.

This issue is seen and fix tested on Fam15h and Fam17h MCM systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:38:49 +01:00
Yazen Ghannam d7fc9d77ac EDAC: Add routine to check if MC devices list is empty
We need to know if any MC devices have been allocated.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com
[ Prettify text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 14:36:47 +01:00
Yazen Ghannam df64636fa4 EDAC, amd64: Remove unused printing macros
amd64_{debug,notice} don't have any users, so remove them.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:20:06 +01:00
Yazen Ghannam 11ab1cae58 EDAC, amd64: Rework messages in ecc_enabled()
Print the node number when informing that DRAM ECC is disabled so
that we can show which nodes have DRAM ECC disabled. Also, print more
detailed system information as edac_dbg(), so as to not bother general
users.

Switch amd64_notice to amd64_info to match the message above it.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:18:33 +01:00
Yazen Ghannam 234365f56e EDAC, amd64: Move global code out of instance functions
We have a few functions that register/unregister an ECC error decoding
routine. These functions are called when we init/remove instances.
However, they are global and so don't need to be registered/unregistered
multiple times.

So move them out of the init/remove instance functions and into the
module init/exit routines.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:08:10 +01:00
Yazen Ghannam 2b9b2c4659 EDAC, amd64: Free unused memory when init_one_instance() fails
Jump to memory freeing routines when init_one_instance() fails.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:40 +01:00
Yazen Ghannam 67d7fd306e EDAC, mce_amd: Give more context to deferred error message
Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28 13:03:29 +01:00
Borislav Petkov 58fb24cb95 EDAC, i7300: Test for the second channel properly
REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the
CS[3:2] as the simbols in error. And thus the second channel.

The macro computing it was wrong so get rid of it (it was used at one
place only) and get rid of the conditional too. Generates better code
this way anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-01-26 11:35:23 +01:00
Borislav Petkov 9026cc82b6 x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:57 +01:00
Borislav Petkov 0bceab677d EDAC/mce/amd: Dump TSC value
Dump the TSC value of the time when the MCE got logged.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:56 +01:00
Borislav Petkov 1fbcd90903 EDAC/mce/amd: Unexport amd_decode_mce()
It is not used outside of the driver anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:55 +01:00
Nicolas Iooss 127c1225bf EDAC, sb_edac: Get rid of ->show_interleave_mode()
Function sbridge_register_mci() sets pvt->info.show_interleave_mode
to knl_show_interleave_mode() on Knight's Landing and
show_interleave_mode() anywhere else.

Merge show_interleave_mode() and knl_show_interleave_mode() in a single
implementation and use it without an indirect function pointer.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org
[ Call it get_intlv_mode_str(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-23 11:39:48 +01:00
Aaron Miller 4fb6fde74d EDAC: Expose per-DIMM error counts in sysfs
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.

EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-19 10:29:40 +01:00
Yazen Ghannam 2287c63643 EDAC, amd64: Save and return err code from probe_one_instance()
We should save the return code from probe_one_instance() so that it can
be returned from the module init function. Otherwise, we'll be returning
the -ENOMEM from above.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-16 11:53:39 +01:00
Arvind Yadav f5c61277f6 EDAC, i82975x: Add ioremap_nocache() error handling
If ioremap_nocache() fails, it will return NULL. Which will then cause a
NULL-pointer dereference. Handle the returned value properly.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484549092-11349-1-git-send-email-arvind.yadav.cs@gmail.com
[ Boris: massage commit message and improve error message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-16 11:09:22 +01:00
Ben Dooks 628ea92f09 EDAC: Make dev_attr_sdram_scrub_rate static
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:

  drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
  'dev_attr_sdram_scrub_rate' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-06 15:54:16 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Mauro Carvalho Chehab 66c222a02f edac: fix kernel-doc tags at the drivers/edac_*.h
Some kernel-doc tags don't provide good descriptions or use
a different style. Adjust them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:58:10 -02:00
Mauro Carvalho Chehab 6634fbb6b6 driver-api: create an edac.rst file with EDAC documentation
Currently, there's no device driver documentation for the EDAC
subsystem at the driver-api book. Fill in the blanks for the
structures and functions that misses documentation, uniform
the word on the existing ones, and add a new edac.rst file at
driver-api, in order to document the EDAC subsystem.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab e01aa14cf2 edac: move documentation from edac_mc.c to edac_core.h
Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab fdaf0b3505 edac: move documentation from edac_pci*.c to edac_pci.h
Several functions are documented at edac_pci.c and edac_pci_sysfs.c.

As we'll be including edac_pci.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 5336f75499 edac: move documentation from edac_device to edac_core.h
Several functions are documented at edac_device.c.

As we'll be including edac_core.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 78d88e8a3d edac: rename edac_core.h to edac_mc.h
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 6d8ef24724 edac: move EDAC device definitions to drivers/edac/edac_device.h
The edac_core.h header contain data structures and function
definitions for both EDAC MC and EDAC device.

Let's move the devices ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 0b892c7177 edac: move EDAC PCI definitions to drivers/edac/edac_pci.h
The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.

Let's move the PCI ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:50 -02:00
Mauro Carvalho Chehab 7230617537 edac: edac_core.h: remove prototype for edac_pci_reset_delay_period()
This function doesn't exist. So, remove its prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Mauro Carvalho Chehab a2c223b5ed edac: edac_core.h: get rid of unused kobj_complete
This element of struct edac_pci_ctl_info is never used. So,
get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Pan Bian 0de2788447 EDAC, amd64: Fix improper return value
When the call to zalloc_cpumask_var() fails, returning "false" seems
improper. The real value of macro "false" is 0, and 0 means no error.
Return -ENOMEM instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071

Signed-off-by: Pan Bian <bianpan2016@163.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-04 10:51:42 +01:00
Borislav Petkov 5246c54007 EDAC, amd64: Improve amd64-specific printing macros
Prefix the warn and error macros with the respective string so that
callers don't have to say "Error" or "Warning". We save us string length
this way in the actual calls.

While at it, shorten the calls in reserve_mc_sibling_devs().

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2016-12-01 11:35:07 +01:00
Yazen Ghannam 95d3af6bd1 EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems
Add Fam17h to the list of families to autoload amd64_edac_mod.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:49 +01:00
Yazen Ghannam 713ad54675 EDAC, amd64: Define and register UMC error decode function
How we need to decode UMC errors is different from how we decode bus
errors, so let's define a new function for this. We also need a way to
determine the UMC channel since we're not guaranteed that there is a
fixed relation between channel and MCA bank.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
[ Fold in decode_synd_reg(), simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:48 +01:00
Yazen Ghannam d27f3a348e EDAC, amd64: Determine EDAC capabilities on Fam17h systems
We need to determine the EDAC capabilities from all UMCs on the node. We
should only check UMCs that are enabled and make sure they all agree.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:47 +01:00
Yazen Ghannam 2d09d8f301 EDAC, amd64: Determine EDAC MC capabilities on Fam17h
The UMCs on Fam17h are independent memory controllers so we need to
read the capabilities from all UMCs and make sure they agree. Once
we determine what capabilities are available we should save them for
convenience.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com
[ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:04:54 +01:00
Yazen Ghannam 07ed82ef93 EDAC, amd64: Add Fam17h debug output
Read a few more UMC registers and provide debug output in order to be as
similar as possible to older AMD systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded K8 check and comments, fixup others. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 17:16:09 +01:00
Yazen Ghannam 8051c0af3c EDAC, amd64: Add Fam17h scrubber support
Fam17h has new register offsets and fields for setting up the DRAM
scrubber so add support for this.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam a6c14dce85 EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4
MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank
since Fam15h except for Fam15h, bank 4 in which case it's defined as
part of the McaStatSubCache bitfield.

Filter out that case.

Reported-by: Dean Liberty <Dean.Liberty@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com
[ Split an almost unparseable ternary conditional, add a comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam b64ce7cd7f EDAC, amd64: Read MC registers on AMD Fam17h
Fam17h has a different set of registers and bitfields. Most of these
registers are read through SMN (System Management Network) rather
than PCI config space. Also, the derivation of various values is now
different.

Update amd64_edac to read the appropriate registers and extract the
correct values for Fam17h.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
[ Save us the indentation level in read_mc_regs(), add defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:11 +01:00
Yazen Ghannam 936fc3afaa EDAC, amd64: Reserve correct PCI devices on AMD Fam17h
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
systems. Update struct amd64_pvt to hold the new functions and reserve
them if on Fam17h.

Also, allocate an array of UMC structs within our newly allocated PVT
struct.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
[ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:49:40 +01:00
Yazen Ghannam f1cbbec9fc EDAC, amd64: Add AMD Fam17h family type and ops
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:24:09 +01:00
Yazen Ghannam 196b79fcc8 EDAC, amd64: Extend ecc_enabled() to Fam17h
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:07:03 +01:00
Borislav Petkov 627bc29ed9 Merge tip:ras/core to pick up dependent changes
tip:ras/core contains the respective Fam17h x86 RAS bits which
amd64_edac is going to use. So merge it into the EDAC branch.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 21:13:40 +01:00
Yazen Ghannam 044e7a414b EDAC, amd64: Don't force-enable ECC checking on newer systems
It's not recommended for the OS to try and force-enable ECC checking.
This is considered a firmware task since it includes memory training,
etc, so don't change ECC settings on Fam17h or newer systems and inform
the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com
[ Put the "forcing" message in an else branch. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 19:04:11 +01:00
Yazen Ghannam d12a969ebb EDAC, amd64: Add Deferred Error type
Currently, deferred errors are classified as correctable in EDAC. Add a
new error type for deferred errors so that they are correctly reported
to the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:57:19 +01:00
Yazen Ghannam e70984d9eb EDAC, amd64: Rename __log_bus_error() to be more specific
We only use __log_bus_error() to log DRAM ECC errors, so let's change
the name to reflect this. We'll also use this function for DRAM ECC
errors on Fam17h, but we'll call it from a different function than
decode_bus_error().

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:42:55 +01:00
Yazen Ghannam e7934b70d7 EDAC, amd64: Change target of pci_name from F2 to F3
AMD Fam17h will not be using PCI function 2 for EDAC, but will continue
to use function 3. So let's get the name of F3 instead of F2 to support
Fam17h and previous families.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:24:20 +01:00
Yazen Ghannam 5c332202f8 EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder
nb_bus_decoder() is only used for DRAM ECC errors so rename it so that
the name is more generic and descriptive.

Also, call it for DRAM ECC errors on SMCA systems.

[ Boris: rename it to real function name with a verb in it. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 09:43:15 +01:00
Yanjiang Jin 27bda205ba EDAC, mpc85xx: Implement remove method for the platform driver
If we execute the below steps without this patch:

  modprobe mpc85xx_edac [The first insmod, everything is well.]
  modprobe -r mpc85xx_edac
  modprobe mpc85xx_edac [insmod again, error happens.]

We would get the error messages as below:

  BUG: recent printk recursion!
  Oops: Kernel access of bad area, sig: 11 [#48]
  Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac]
  CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2
   .vsnprintf
   .vscnprintf
   .vprintk_emit
   .printk
   .edac_pci_add_device
   .mpc85xx_pci_err_probe
   .platform_drv_probe
   .driver_probe_device
   .__driver_attach
   .bus_for_each_dev
   .driver_attach
   .bus_add_driver
   .driver_register
   .__platform_register_drivers
   .mpc85xx_mc_init
   .do_one_initcall
   .do_init_module
   .load_module
   .SyS_finit_module
   system_call

Address this by cleaning up properly when removing the platform driver.

Tested on a T4240QDS board.

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: york.sun@nxp.com
Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-17 11:19:50 +01:00
Colin Ian King 8176170e03 EDAC, xgene: Fix spelling mistake in error messages
Trivial fix to spelling mistake "Mutilple" to "Multiple" in error
messages.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161114231104.5585-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-15 11:10:02 +01:00
Borislav Petkov c73e8833be EDAC, mc: Fix locking around mc_devices list
When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.

Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.

Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-14 13:26:11 +01:00
Borislav Petkov c09a8c40e0 x86/RAS: Hide SMCA bank names
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:15 +01:00
Borislav Petkov a9a1c0ee04 x86/RAS: Rename smca_bank_names to smca_names
Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov 1ce9cd7f9f x86/RAS: Simplify SMCA HWID descriptor struct
Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Thor Thayer 90e493d7d5 EDAC, altera: Disable IRQs while injecting SDRAM errors
Disable IRQs while injecting SDRAM errors. The RT patches exposed
a spinlock deadlock where the spinlock taken for the regmap write
deadlocked with the IRQ clear regmap write.

Error injection is not normally enabled for ECC but only for testing.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1476906827-9412-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:14:03 +02:00
Wei Yongjun 240ea9214a EDAC, skx_edac: Fix non static symbol warnings
Fix the following sparse warnings:

  drivers/edac/skx_edac.c:266:25: warning:
   symbol 'skx_cpuids' was not declared. Should it be static?
  drivers/edac/skx_edac.c:1040:12: warning:
   symbol 'skx_init' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477147098-2842-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:10:38 +02:00
Piotr Luc 9a9260ca92 EDAC, sb_edac: Add Knights Mill support
Add Knights Mill (KNM) to the list of CPU models supported by sb_edac.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:37:44 +02:00
Dave Hansen 20f4d69243 EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them
We now have symbolic names for a bunch of Intel CPU models via
asm/intel-family.h. The original conversion missed the EDAC drivers.
Convert them.

Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com
[ Remove comment, macro name is descriptive enough. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:32:40 +02:00
Linus Torvalds 19fe416532 * Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
buffers (Thor Thayer)
 
 * Split the memory controller part out of mpc85xx and share it with a
 * new Freescale ARM Layerscape driver (York Sun)
 
 * amd64_edac fixes (Yazen Ghannam)
 
 * Misc cleanups, refactoring and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30
 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb
 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U
 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea
 t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1
 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a
 iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm
 Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq
 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus
 WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq
 qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6
 dmDAtQbKUUwiltB8QzQd
 =pcI8
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "A lot of movement in the EDAC tree this time around, coarse summary
  below:

   - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
     buffers (Thor Thayer)

   - split the memory controller part out of mpc85xx and share it with a
     new Freescale ARM Layerscape driver (York Sun)

   - amd64_edac fixes (Yazen Ghannam)

   - misc cleanups, refactoring and fixes all over the place"

* tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits)
  EDAC, altera: Add IRQ Flags to disable IRQ while handling
  EDAC, altera: Correct EDAC IRQ error message
  EDAC, amd64: Autoload module using x86_cpu_id
  EDAC, sb_edac: Remove NULL pointer check on array pci_tad
  EDAC: Remove NO_IRQ from powerpc-only drivers
  EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
  EDAC, fsl_ddr: Add entry to MAINTAINERS
  EDAC: Move Doug Thompson to CREDITS
  EDAC, I3000: Orphan driver
  EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
  EDAC, layerscape: Add Layerscape EDAC support
  EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
  EDAC, fsl_ddr: Add support for little endian
  EDAC, fsl_ddr: Add missing DDR DRAM types
  EDAC, fsl_ddr: Rename macros and names
  EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
  EDAC, mpc85xx: Replace printk() with pr_* format
  EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
  EDAC, altera: Rename MC trigger to common name
  EDAC, altera: Rename device trigger to common name
  ...
2016-10-04 12:06:26 -07:00
Thor Thayer a29d64a45e EDAC, altera: Add IRQ Flags to disable IRQ while handling
Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ
while executing the IRQ handler. Remove the IRQF_SHARED because these
are not shared IRQs in the domain. Exposed when flooding IRQs.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 12:03:34 +02:00
Thor Thayer 3763569f4c EDAC, altera: Correct EDAC IRQ error message
Correct the error message sent out in the case of a single bit error IRQ
allocation.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 11:52:39 +02:00
Yazen Ghannam d6efab74f6 EDAC, amd64: Autoload module using x86_cpu_id
Reinstate driver autoloading now that PCI dependency is gone.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-21 12:48:15 +02:00
Yazen Ghannam a884675b87 x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly
Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software is not something unheard
of so warn about bank 4 errors. They shouldn't be coming from bank 4
naturally but users can still use mce_amd_inj to simulate errors from it
for testing purposed.

Also, avoid special handling in the injector mce_amd_inj like it is
being done on the older families.

[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Aravind Gopalakrishnan  <aravindksg.lkml@gmail.com>
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:14 +02:00
Yazen Ghannam 4b711f92c9 x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems
The MCA_SYND and MCA_IPID registers contain valuable information and
should be included in MCE output. The MCA_SYND register contains
syndrome and other error information, and the MCA_IPID register will
uniquely identify the MCA bank's type without having to rely on system
software.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:13 +02:00
Yazen Ghannam 5896820e0a x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types
Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.

Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.

[ bp: Align comments, simplify a bit. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:10 +02:00
Yazen Ghannam 856095b179 EDAC/mce_amd: Use SMCA prefix for error descriptions arrays
The error descriptions defined for Fam17h can be reused for other SMCA
systems, so their names should reflect this.

Change f17h prefix to smca for error descriptions.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00
Yazen Ghannam c019b951e1 EDAC/mce_amd: Add missing SMCA error descriptions
Add missing SMCA error descriptions to the error descriptions arrays.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00