OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Qiuxu Zhuo	d14e3a201f	EDAC, sb_edac: Bump driver version and do some cleanups Collapse 'case:' in *_mci_bind_devs() and update driver version from 1.1.1 to 1.1.2. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 15:00:36 +02:00
Qiuxu Zhuo	4d475dde79	EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present This is based on previous work by Patrick Geary, see Link. Additional cleanups ontop: - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro, now that TA0 and TA1 are unified. - Remove get_pdev_same_bus(), since in get_dimm_config() the variable "pvt->pci_ta" for KNL is also ready, we can simply use pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read MCMTR. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com [ Make __populate_dimms() return int. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:57:52 +02:00
Qiuxu Zhuo	3286d3eb90	EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4 We don't need this quirk anymore now that the EDAC memory controller representation matches the hardware. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com [ Commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:40:40 +02:00
Borislav Petkov	6696522957	EDAC, sb_edac: Carve out dimm-populating loop ... to slim down get_dimm_config(). No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:34 +02:00
Borislav Petkov	199389acd9	EDAC, sb_edac: Fix mod_name It is called "sb_edac.c" now. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:33 +02:00
Qiuxu Zhuo	e2f747b1f4	EDAC, sb_edac: Assign EDAC memory controller per h/w controller Tony pointed out: "currently the driver pretends there is one big 8-channel memory controller per socket instead of 2 4-channel controllers. This is fine with all memory controller populated with symmetrical DIMM configurations, but runs into difficulties on asymmetrical setups". Restructure the driver to assign an EDAC memory controller to each real h/w memory controller to resolve the issue. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com [ Break some lines at convenient points. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:21 +02:00
Tony Luck	7fd562b75d	EDAC, sb_edac: Don't use "Socket#" in the memory controller name EDAC assigns logical memory controller numbers in the order that we find memory controllers, which depends on which PCI bus they are on. Some systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on socket3. All this is made more confusing for users because we use the string "Socket" while generating names for memory controllers, but the number that we attach there is the memory controller number. E.g. EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT) Change the names to say "SrcID#%d" (where the number we use is read from the h/w associated with the memory controller instead of some logical number internal to the EDAC driver). New message: EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT) Reported-by: Andrey Korolyov <andrey@xdel.ru> Reported-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:47:11 +02:00
Qiuxu Zhuo	00cf50d90a	EDAC, sb_edac: Classify PCI-IDs by topology Each of the PCI device IDs belongs to a CPU socket, or to one of the integrated memory controllers. Provide an enum to specify the domain of each, and distinguish the resource number in each domain: the number of the PCI device IDs per integrated memory controller/socket, and the number of integrated memory controllers per socket. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com [ Realign pci_dev_descr_knl members. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:19:25 +02:00
Tobias Klauser	18caec20bf	EDAC, altera: Constify irq_domain_ops struct irq_domain_ops is not modified, so it can be made const. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.ch Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-24 15:46:25 +02:00
Yazen Ghannam	eb77e6b80f	EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h The wrong index into the csbases/csmasks arrays was being passed to the function to compute the chip select sizes, which resulted in the wrong size being computed. Address that so that the correct values are computed and printed. Also, redo how we calculate the number of pages in a CS row. Reported-by: Benjamin Bennett <benbennett@gmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@vger.kernel.org> # 4.10.x Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded integer math comment, minor cleanups. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-03 16:27:36 +02:00
Borislav Petkov	f8d5549df2	EDAC, ghes: Do not enable it by default Leave it to the user to decide whether to enable this or not. Otherwise, platform-specific drivers won't initialize (currently, EDAC supports only a single platform driver loaded). Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-27 14:15:38 +02:00
Borislav Petkov	bffc7dece9	EDAC: Rename report status accessors Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:15:02 +02:00
Borislav Petkov	fee27d7d97	EDAC: Delete edac_stub.c Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:48 +02:00
Borislav Petkov	a06b85ff07	EDAC: Update Kconfig help text Remove the old URLs. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:44 +02:00
Borislav Petkov	e3c4ff6d8c	EDAC: Remove EDAC_MM_EDAC Move all the EDAC core functionality behind CONFIG_EDAC and get rid of that indirection. Update defconfigs which had it. While at it, fix dependencies such that EDAC depends on RAS for the tracepoints. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-edac@vger.kernel.org	2017-04-10 17:14:41 +02:00
Borislav Petkov	be1d162948	EDAC: Issue tracepoint only when it is defined ... and this happens only when CONFIG_RAS is enabled. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:38 +02:00
Borislav Petkov	8c22b4fece	EDAC: Move edac_op_state to edac_mc.c ... as part of moving stuff away from edac_stub.c Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:29 +02:00
Borislav Petkov	d3116a0837	EDAC: Remove edac_err_assert ... and the glue around it. It is not needed anymore. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:21 +02:00
Borislav Petkov	97bb6c17ad	EDAC: Get rid of edac_handlers Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:14:17 +02:00
Borislav Petkov	db47d5f856	x86/nmi, EDAC: Get rid of DRAM error reporting thru PCI SERR NMI Apparently, some machines used to report DRAM errors through a PCI SERR NMI. This is why we have a call into EDAC in the NMI handler. See `c0d1217202` ("drivers/edac: add new nmi rescan"). From looking at the patch above, that's two drivers: e752x_edac.c and e7xxx_edac.c. Now, I wanna say those are old machines which are probably decommissioned already. Tony says that "[t]the newest CPU supported by either of those drivers is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some folks are still using these ... but people that hold onto h/w for 7 years generally cling to old s/w too ... so I'd guess it unlikely that we will get complaints for breaking these in upstream." So even if there is a small number still in use, we did load EDAC with edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact) which means a default EDAC setup without any parameters supplied on the command line or otherwise would never even log the error in the NMI handler because we're polling by default: inline int edac_handler_set(void) { if (edac_op_state == EDAC_OPSTATE_POLL) return 0; return atomic_read(&edac_handlers); } So, long story short, I'd like to get rid of that nastiness called edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If we ever have to do stuff like that again, it should be notifiers we're using and not some insanity like this one. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com>	2017-04-10 17:13:48 +02:00
Borislav Petkov	76f6a26ce9	EDAC, highbank: Align Makefile directives ... like the rest of the file. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:10:43 +02:00
Sergey Temerkhanov	5195c206fd	EDAC, thunderx: Remove unused code Remove unused code reserved for upcoming CPUs. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-07 11:49:32 +02:00
Sergey Temerkhanov	3d2d8c0f84	EDAC, thunderx: Change LMC index calculation Shift the node number by 3 bits instead of 8 allowing proper functioning with default EDAC_MAX_MCS. Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-07 11:47:44 +02:00
Thor Thayer	25b223ddfe	EDAC, altera: Fix peripheral warnings for Cyclone5 The peripherals' RAS functionality only exist on the Arria10 SoCFPGA. The Cyclone5 initialization generates EDAC warnings when the peripherals aren't found in the device tree. Fix by checking for Arria10 in the init functions. Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-06 11:42:38 +02:00
Jan Glauber	621c4fe3cc	EDAC, thunderx: Fix L2C MCI interrupt disable Fix a typo that disabled the MCI interrupts using the wrong bitmask. Signed-off-by: Jan Glauber <jglauber@cavium.com> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Sergey Temerkhanov <s.temerkhanov@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-05 14:46:05 +02:00
Sergey Temerkhanov	41003396f9	EDAC, thunderx: Add Cavium ThunderX EDAC driver Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely the DRAM controller (LMC), cache coherent processor interconnect (CCPI) and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC) Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David.Daney@cavium.com Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-27 11:43:56 +02:00
Qiuxu Zhuo	819f60fb7d	EDAC, pnd2_edac: Fix reported DIMM number DIMM number passed to edac_mc_handle_error() was accidentally hardcoded to zero. Pass in the correct daddr->dimm value. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-26 09:36:28 +02:00
Borislav Petkov	cd1be315ac	EDAC, pnd2_edac: Fix !EDAC_DEBUG build Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we don't fail the build: drivers/edac/pnd2_edac.c: In function ‘pnd2_init’: drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration] setup_pnd2_debug(); ^ drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’: drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration] teardown_pnd2_debug(); ^ Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-23 12:56:23 +01:00
Borislav Petkov	1c5bf78114	EDAC: Select DEBUG_FS The debugfs.c functionality relies on DEBUG_FS so select it. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-23 12:56:09 +01:00
Tony Luck	5c71ad17f9	EDAC, pnd2_edac: Add new EDAC driver for Intel SoC platforms Initial target for this driver is the Intel Apollo Lake platform and Denverton micro-server, they use the same internal memory controller IP called Pondicherry2. Memory controller registers are not in PCI config space like earlier Intel memory controllers. For Apollo Lake platform they are accessed via a "side-band" interface, for Denverton micro-server they are access via PCI config space and memory map I/O. This driver is for Apollo Lake and Denverton, but only the Denverton is fully enabled while we wait for the sideband driver. Apollo lake driver and initial cut at Denverton driver by Tony Luck. Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-16 12:40:52 +01:00
Jérémy Lefaure	e61555c29c	EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used as if it returned a boolean true if the width if 8. Fix the tests where MTR_DRAM_WIDTH is misused. Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.fr Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-09 09:25:29 +01:00
Colin Ian King	4bd035eae2	EDAC, xgene: Fix wrongly spelled "procesing" Fix spelling mistake in dev_err message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Loc Ho <lho@apm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-03-06 17:08:19 +01:00
Linus Torvalds	60c906bab1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y	2017-02-20 12:47:44 -08:00
Yazen Ghannam	75bf2f6478	EDAC, mce_amd: Print IPID and Syndrome on a separate line Currently, the IPID and Syndrome are printed on the same line as the Address. There are cases when we can have a valid Syndrome but not a valid Address. For example, the MCA_SYND register can be used to hold more detailed error info that the hardware folks can use. It's not just DRAM ECC syndromes. There are some error types that aren't related to memory that may have valid syndromes, like some errors related to links in the Data Fabric, etc. In these cases, the IPID and Syndrome are not printed at the same log level as the rest of the stanza, so users won't see them on the console. Console: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over\|CE\|MiscV\|-\|-\|-\|-\|SyndV\|-]: 0xd82000000002080b [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Dmesg: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over\|CE\|MiscV\|-\|-\|-\|-\|SyndV\|-]: 0xd82000000002080b , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Print the IPID first and on a new line. The IPID should always be printed on SMCA systems. The Syndrome will then be printed with the IPID and at the same log level when valid: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over\|CE\|MiscV\|-\|-\|-\|-\|SyndV\|-]: 0xd82000000002080b [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-02-16 15:39:32 +01:00
Borislav Petkov	e62d2ca9d0	EDAC, amd64: Bump driver version Last time we did that was when we enabled Bulldozer. Now, we enabled Zen so it is only natural ... :-) Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>	2017-02-14 11:58:05 +01:00
Wei Yongjun	279fa58035	EDAC, fsl_ddr: Make locally used symbols static Fix the following sparse warnings: drivers/edac/fsl_ddr_edac.c:148:1: warning: symbol 'dev_attr_inject_data_hi' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:150:1: warning: symbol 'dev_attr_inject_data_lo' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:152:1: warning: symbol 'dev_attr_inject_ctrl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-02-09 17:40:54 +01:00
Chris Packham	321d17c19b	EDAC, mpc85xx: Add T2080 l2-cache support The L2 cache controller on the T2080 SoC has similar capabilities to the others already supported by the mpc85xx_edac driver. Add it to the list of compatible devices. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Johannes Thumshirn <jth@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Borislav Petkov <bp@suse.de>	2017-02-03 10:36:35 +01:00
Yazen Ghannam	1bd9900b83	EDAC, amd64: Add x86cpuid sanity check during init Match one of the devices in amd64_cpuids[] before loading the module. This is an additional sanity check against users trying to load amd64_edac_mod on unsupported systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com [ Get rid of err_ret label, make it a bit more readable this way. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 14:43:06 +01:00
Yazen Ghannam	4688c9b42d	EDAC, amd64: Don't treat ECC disabled as failure Having ECC disabled on a node doesn't necessarily mean that it's disabled for the entire system. So let's return a non-failing code when ECC is disabled on a node. This way we can skip initialization for the node but still continue with the remaining nodes. After probing all instances, make sure we have at least one MC device allocated. This issue is seen and fix tested on Fam15h and Fam17h MCM systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 14:38:49 +01:00
Yazen Ghannam	d7fc9d77ac	EDAC: Add routine to check if MC devices list is empty We need to know if any MC devices have been allocated. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 14:36:47 +01:00
Yazen Ghannam	df64636fa4	EDAC, amd64: Remove unused printing macros amd64_{debug,notice} don't have any users, so remove them. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 13:20:06 +01:00
Yazen Ghannam	11ab1cae58	EDAC, amd64: Rework messages in ecc_enabled() Print the node number when informing that DRAM ECC is disabled so that we can show which nodes have DRAM ECC disabled. Also, print more detailed system information as edac_dbg(), so as to not bother general users. Switch amd64_notice to amd64_info to match the message above it. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 13:18:33 +01:00
Yazen Ghannam	234365f56e	EDAC, amd64: Move global code out of instance functions We have a few functions that register/unregister an ECC error decoding routine. These functions are called when we init/remove instances. However, they are global and so don't need to be registered/unregistered multiple times. So move them out of the init/remove instance functions and into the module init/exit routines. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 13:08:10 +01:00
Yazen Ghannam	2b9b2c4659	EDAC, amd64: Free unused memory when init_one_instance() fails Jump to memory freeing routines when init_one_instance() fails. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 13:03:40 +01:00
Yazen Ghannam	67d7fd306e	EDAC, mce_amd: Give more context to deferred error message Users may not be familiar with the concept of deferred errors. There is no action for users to take on this type of error, so give more context in the error message to make this more clear. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-28 13:03:29 +01:00
Borislav Petkov	58fb24cb95	EDAC, i7300: Test for the second channel properly REDMEMB[17] is the ECC_Locator bit, which, when set, identifies the CS[3:2] as the simbols in error. And thus the second channel. The macro computing it was wrong so get rid of it (it was used at one place only) and get rid of the conditional too. Generates better code this way anyway. Signed-off-by: Borislav Petkov <bp@suse.de> Reported-by: David Binderman <dcb314@hotmail.com> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2017-01-26 11:35:23 +01:00
Borislav Petkov	9026cc82b6	x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:57 +01:00
Borislav Petkov	0bceab677d	EDAC/mce/amd: Dump TSC value Dump the TSC value of the time when the MCE got logged. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:56 +01:00
Borislav Petkov	1fbcd90903	EDAC/mce/amd: Unexport amd_decode_mce() It is not used outside of the driver anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:55 +01:00
Nicolas Iooss	127c1225bf	EDAC, sb_edac: Get rid of ->show_interleave_mode() Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-23 11:39:48 +01:00
Aaron Miller	4fb6fde74d	EDAC: Expose per-DIMM error counts in sysfs The old csrowX sysfs directories have per-csrow error counters, but the new dimmX directories do not currently expose error counts. EDAC already keeps these counts, add them to sysfs so per-DIMM counts are still available when CONFIG_EDAC_LEGACY_SYSFS=n. Signed-off-by: Aaron Miller <aaronmiller@fb.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-19 10:29:40 +01:00
Yazen Ghannam	2287c63643	EDAC, amd64: Save and return err code from probe_one_instance() We should save the return code from probe_one_instance() so that it can be returned from the module init function. Otherwise, we'll be returning the -ENOMEM from above. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-16 11:53:39 +01:00
Arvind Yadav	f5c61277f6	EDAC, i82975x: Add ioremap_nocache() error handling If ioremap_nocache() fails, it will return NULL. Which will then cause a NULL-pointer dereference. Handle the returned value properly. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484549092-11349-1-git-send-email-arvind.yadav.cs@gmail.com [ Boris: massage commit message and improve error message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-16 11:09:22 +01:00
Ben Dooks	628ea92f09	EDAC: Make dev_attr_sdram_scrub_rate static The dev_attr_sdram_scrub_rate is not declared in a header or used anywhere else, so make it static to fix the following warning: drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol 'dev_attr_sdram_scrub_rate' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-06 15:54:16 +01:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Mauro Carvalho Chehab	66c222a02f	edac: fix kernel-doc tags at the drivers/edac_*.h Some kernel-doc tags don't provide good descriptions or use a different style. Adjust them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:58:10 -02:00
Mauro Carvalho Chehab	6634fbb6b6	driver-api: create an edac.rst file with EDAC documentation Currently, there's no device driver documentation for the EDAC subsystem at the driver-api book. Fill in the blanks for the structures and functions that misses documentation, uniform the word on the existing ones, and add a new edac.rst file at driver-api, in order to document the EDAC subsystem. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab	e01aa14cf2	edac: move documentation from edac_mc.c to edac_core.h Several functions are documented at edac_mc.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab	fdaf0b3505	edac: move documentation from edac_pci*.c to edac_pci.h Several functions are documented at edac_pci.c and edac_pci_sysfs.c. As we'll be including edac_pci.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. As several of those kernel-doc macros are not in the right format, fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab	5336f75499	edac: move documentation from edac_device to edac_core.h Several functions are documented at edac_device.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. As several of those kernel-doc macros are not in the right format, fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab	78d88e8a3d	edac: rename edac_core.h to edac_mc.h Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab	6d8ef24724	edac: move EDAC device definitions to drivers/edac/edac_device.h The edac_core.h header contain data structures and function definitions for both EDAC MC and EDAC device. Let's move the devices ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab	0b892c7177	edac: move EDAC PCI definitions to drivers/edac/edac_pci.h The edac_core.h header contain data structures and function definitions for the 3 parts of EDAC: MC, PCI and device. Let's move the PCI ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:50 -02:00
Mauro Carvalho Chehab	7230617537	edac: edac_core.h: remove prototype for edac_pci_reset_delay_period() This function doesn't exist. So, remove its prototype. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:49 -02:00
Mauro Carvalho Chehab	a2c223b5ed	edac: edac_core.h: get rid of unused kobj_complete This element of struct edac_pci_ctl_info is never used. So, get rid of it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:49 -02:00
Pan Bian	0de2788447	EDAC, amd64: Fix improper return value When the call to zalloc_cpumask_var() fails, returning "false" seems improper. The real value of macro "false" is 0, and 0 means no error. Return -ENOMEM instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071 Signed-off-by: Pan Bian <bianpan2016@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-12-04 10:51:42 +01:00
Borislav Petkov	5246c54007	EDAC, amd64: Improve amd64-specific printing macros Prefix the warn and error macros with the respective string so that callers don't have to say "Error" or "Warning". We save us string length this way in the actual calls. While at it, shorten the calls in reserve_mc_sibling_devs(). Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>	2016-12-01 11:35:07 +01:00
Yazen Ghannam	95d3af6bd1	EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems Add Fam17h to the list of families to autoload amd64_edac_mod. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-29 18:05:49 +01:00
Yazen Ghannam	713ad54675	EDAC, amd64: Define and register UMC error decode function How we need to decode UMC errors is different from how we decode bus errors, so let's define a new function for this. We also need a way to determine the UMC channel since we're not guaranteed that there is a fixed relation between channel and MCA bank. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com [ Fold in decode_synd_reg(), simplify. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-29 18:05:48 +01:00
Yazen Ghannam	d27f3a348e	EDAC, amd64: Determine EDAC capabilities on Fam17h systems We need to determine the EDAC capabilities from all UMCs on the node. We should only check UMCs that are enabled and make sure they all agree. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-29 18:05:47 +01:00
Yazen Ghannam	2d09d8f301	EDAC, amd64: Determine EDAC MC capabilities on Fam17h The UMCs on Fam17h are independent memory controllers so we need to read the capabilities from all UMCs and make sure they agree. Once we determine what capabilities are available we should save them for convenience. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com [ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-29 18:04:54 +01:00
Yazen Ghannam	07ed82ef93	EDAC, amd64: Add Fam17h debug output Read a few more UMC registers and provide debug output in order to be as similar as possible to older AMD systems. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded K8 check and comments, fixup others. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-29 17:16:09 +01:00
Yazen Ghannam	8051c0af3c	EDAC, amd64: Add Fam17h scrubber support Fam17h has new register offsets and fields for setting up the DRAM scrubber so add support for this. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-28 17:50:12 +01:00
Yazen Ghannam	a6c14dce85	EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4 MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank since Fam15h except for Fam15h, bank 4 in which case it's defined as part of the McaStatSubCache bitfield. Filter out that case. Reported-by: Dean Liberty <Dean.Liberty@amd.com> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com [ Split an almost unparseable ternary conditional, add a comment. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-28 17:50:12 +01:00
Yazen Ghannam	b64ce7cd7f	EDAC, amd64: Read MC registers on AMD Fam17h Fam17h has a different set of registers and bitfields. Most of these registers are read through SMN (System Management Network) rather than PCI config space. Also, the derivation of various values is now different. Update amd64_edac to read the appropriate registers and extract the correct values for Fam17h. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com [ Save us the indentation level in read_mc_regs(), add defines ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-28 17:50:11 +01:00
Yazen Ghannam	936fc3afaa	EDAC, amd64: Reserve correct PCI devices on AMD Fam17h Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older systems. Update struct amd64_pvt to hold the new functions and reserve them if on Fam17h. Also, allocate an array of UMC structs within our newly allocated PVT struct. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com [ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-28 17:49:40 +01:00
Yazen Ghannam	f1cbbec9fc	EDAC, amd64: Add AMD Fam17h family type and ops Add a family type and associated ops for Fam17h. Define a struct to hold all the UMC registers that we need. Make this a part of struct amd64_pvt in order to maximize code reuse in the rest of the driver. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-24 21:24:09 +01:00
Yazen Ghannam	196b79fcc8	EDAC, amd64: Extend ecc_enabled() to Fam17h Update the ecc_enabled() function to work on Fam17h. This entails reading a different set of registers and using the SMN (System Management Network) rather than PCI devices. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com [ Fixup ecc_en assignment and get_umc_base(). ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-24 21:07:03 +01:00
Borislav Petkov	627bc29ed9	Merge tip:ras/core to pick up dependent changes tip:ras/core contains the respective Fam17h x86 RAS bits which amd64_edac is going to use. So merge it into the EDAC branch. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-23 21:13:40 +01:00
Yazen Ghannam	044e7a414b	EDAC, amd64: Don't force-enable ECC checking on newer systems It's not recommended for the OS to try and force-enable ECC checking. This is considered a firmware task since it includes memory training, etc, so don't change ECC settings on Fam17h or newer systems and inform the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com [ Put the "forcing" message in an else branch. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-23 19:04:11 +01:00
Yazen Ghannam	d12a969ebb	EDAC, amd64: Add Deferred Error type Currently, deferred errors are classified as correctable in EDAC. Add a new error type for deferred errors so that they are correctly reported to the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-21 10:57:19 +01:00
Yazen Ghannam	e70984d9eb	EDAC, amd64: Rename __log_bus_error() to be more specific We only use __log_bus_error() to log DRAM ECC errors, so let's change the name to reflect this. We'll also use this function for DRAM ECC errors on Fam17h, but we'll call it from a different function than decode_bus_error(). Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-21 10:42:55 +01:00
Yazen Ghannam	e7934b70d7	EDAC, amd64: Change target of pci_name from F2 to F3 AMD Fam17h will not be using PCI function 2 for EDAC, but will continue to use function 3. So let's get the name of F3 instead of F2 to support Fam17h and previous families. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-21 10:24:20 +01:00
Yazen Ghannam	5c332202f8	EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder nb_bus_decoder() is only used for DRAM ECC errors so rename it so that the name is more generic and descriptive. Also, call it for DRAM ECC errors on SMCA systems. [ Boris: rename it to real function name with a verb in it. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-21 09:43:15 +01:00
Yanjiang Jin	27bda205ba	EDAC, mpc85xx: Implement remove method for the platform driver If we execute the below steps without this patch: modprobe mpc85xx_edac [The first insmod, everything is well.] modprobe -r mpc85xx_edac modprobe mpc85xx_edac [insmod again, error happens.] We would get the error messages as below: BUG: recent printk recursion! Oops: Kernel access of bad area, sig: 11 [#48] Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac] CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2 .vsnprintf .vscnprintf .vprintk_emit .printk .edac_pci_add_device .mpc85xx_pci_err_probe .platform_drv_probe .driver_probe_device .__driver_attach .bus_for_each_dev .driver_attach .bus_add_driver .driver_register .__platform_register_drivers .mpc85xx_mc_init .do_one_initcall .do_init_module .load_module .SyS_finit_module system_call Address this by cleaning up properly when removing the platform driver. Tested on a T4240QDS board. Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: york.sun@nxp.com Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-17 11:19:50 +01:00
Colin Ian King	8176170e03	EDAC, xgene: Fix spelling mistake in error messages Trivial fix to spelling mistake "Mutilple" to "Multiple" in error messages. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Loc Ho <lho@apm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161114231104.5585-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-15 11:10:02 +01:00
Borislav Petkov	c73e8833be	EDAC, mc: Fix locking around mc_devices list When accessing the mc_devices list of memory controller descriptors, we need to hold mem_ctls_mutex. This was not always the case, fix that. Make all external callers call a version which grabs the mutex since the last is local to edac_mc.c. Reported-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-11-14 13:26:11 +01:00
Borislav Petkov	c09a8c40e0	x86/RAS: Hide SMCA bank names Add accessor functions and hide the smca_names array. Also, add a sanity-check to bank HWID assignment in get_smca_bank_info(). Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:15 +01:00
Borislav Petkov	a9a1c0ee04	x86/RAS: Rename smca_bank_names to smca_names Make it differ more from struct smca_bank_name for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:14 +01:00
Borislav Petkov	1ce9cd7f9f	x86/RAS: Simplify SMCA HWID descriptor struct Call it simply smca_hwid and call local variables "hwid". More readable. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-11-08 17:10:14 +01:00
Thor Thayer	90e493d7d5	EDAC, altera: Disable IRQs while injecting SDRAM errors Disable IRQs while injecting SDRAM errors. The RT patches exposed a spinlock deadlock where the spinlock taken for the regmap write deadlocked with the IRQ clear regmap write. Error injection is not normally enabled for ECC but only for testing. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1476906827-9412-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-22 20:14:03 +02:00
Wei Yongjun	240ea9214a	EDAC, skx_edac: Fix non static symbol warnings Fix the following sparse warnings: drivers/edac/skx_edac.c:266:25: warning: symbol 'skx_cpuids' was not declared. Should it be static? drivers/edac/skx_edac.c:1040:12: warning: symbol 'skx_init' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1477147098-2842-1-git-send-email-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-22 20:10:38 +02:00
Piotr Luc	9a9260ca92	EDAC, sb_edac: Add Knights Mill support Add Knights Mill (KNM) to the list of CPU models supported by sb_edac. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:37:44 +02:00
Dave Hansen	20f4d69243	EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them We now have symbolic names for a bunch of Intel CPU models via asm/intel-family.h. The original conversion missed the EDAC drivers. Convert them. Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com [ Remove comment, macro name is descriptive enough. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:32:40 +02:00
Linus Torvalds	19fe416532	* Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) * Split the memory controller part out of mpc85xx and share it with a * new Freescale ARM Layerscape driver (York Sun) * amd64_edac fixes (Yazen Ghannam) * Misc cleanups, refactoring and fixes all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6 dmDAtQbKUUwiltB8QzQd =pcI8 -----END PGP SIGNATURE----- Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "A lot of movement in the EDAC tree this time around, coarse summary below: - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) - split the memory controller part out of mpc85xx and share it with a new Freescale ARM Layerscape driver (York Sun) - amd64_edac fixes (Yazen Ghannam) - misc cleanups, refactoring and fixes all over the place" * tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits) EDAC, altera: Add IRQ Flags to disable IRQ while handling EDAC, altera: Correct EDAC IRQ error message EDAC, amd64: Autoload module using x86_cpu_id EDAC, sb_edac: Remove NULL pointer check on array pci_tad EDAC: Remove NO_IRQ from powerpc-only drivers EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe() EDAC, fsl_ddr: Add entry to MAINTAINERS EDAC: Move Doug Thompson to CREDITS EDAC, I3000: Orphan driver EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul() EDAC, layerscape: Add Layerscape EDAC support EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed EDAC, fsl_ddr: Add support for little endian EDAC, fsl_ddr: Add missing DDR DRAM types EDAC, fsl_ddr: Rename macros and names EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx EDAC, mpc85xx: Replace printk() with pr_* format EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1 EDAC, altera: Rename MC trigger to common name EDAC, altera: Rename device trigger to common name ...	2016-10-04 12:06:26 -07:00
Thor Thayer	a29d64a45e	EDAC, altera: Add IRQ Flags to disable IRQ while handling Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ while executing the IRQ handler. Remove the IRQF_SHARED because these are not shared IRQs in the domain. Exposed when flooding IRQs. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-23 12:03:34 +02:00
Thor Thayer	3763569f4c	EDAC, altera: Correct EDAC IRQ error message Correct the error message sent out in the case of a single bit error IRQ allocation. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1474582419-7053-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-23 11:52:39 +02:00
Yazen Ghannam	d6efab74f6	EDAC, amd64: Autoload module using x86_cpu_id Reinstate driver autoloading now that PCI dependency is gone. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-21 12:48:15 +02:00
Yazen Ghannam	a884675b87	x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly Bank 4 is reserved on family 0x17 and shouldn't generate any MCE records. However, broken hardware and software is not something unheard of so warn about bank 4 errors. They shouldn't be coming from bank 4 naturally but users can still use mce_amd_inj to simulate errors from it for testing purposed. Also, avoid special handling in the injector mce_amd_inj like it is being done on the older families. [ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:14 +02:00
Yazen Ghannam	4b711f92c9	x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems The MCA_SYND and MCA_IPID registers contain valuable information and should be included in MCE output. The MCA_SYND register contains syndrome and other error information, and the MCA_IPID register will uniquely identify the MCA bank's type without having to rely on system software. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:13 +02:00
Yazen Ghannam	5896820e0a	x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types Scalable MCA defines a number of IP types. An MCA bank on an SMCA system is defined as one of these IP types. A bank's type is uniquely identified by the combination of the HWID and MCATYPE values read from its MCA_IPID register. Add the required tables in order to be able to lookup error descriptions based on a bank's type and the error's extended error code. [ bp: Align comments, simplify a bit. ] Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:10 +02:00
Yazen Ghannam	856095b179	EDAC/mce_amd: Use SMCA prefix for error descriptions arrays The error descriptions defined for Fam17h can be reused for other SMCA systems, so their names should reflect this. Change f17h prefix to smca for error descriptions. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:09 +02:00
Yazen Ghannam	c019b951e1	EDAC/mce_amd: Add missing SMCA error descriptions Add missing SMCA error descriptions to the error descriptions arrays. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:09 +02:00
Yazen Ghannam	b300e87300	EDAC/mce_amd: Print syndrome register value on SMCA systems Print SyndV bit status and print the raw value of the MCA_SYND register. Further decoding of the syndrome from struct mce.synd can be done in other places where appropriate, e.g. DRAM ECC. Boris: make the error stanza more compact by putting the error address and syndrome on the same line: [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-\|CE\|-\|PCC\|AddrV\|-\|-\|SyndV\|CECC]: 0x96204100001e0117 [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000 [Hardware Error]: Invalid IP block specified. [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-09-13 15:23:07 +02:00
Colin Ian King	c7c35407cd	EDAC, sb_edac: Remove NULL pointer check on array pci_tad pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and hence cannot be NULL, so the NULL pointer check on pci_tad is redundant. Remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-12 20:15:43 +02:00
Michael Ellerman	372095723a	EDAC: Remove NO_IRQ from powerpc-only drivers We'd like to eventually remove NO_IRQ on powerpc, so remove usages of it from powerpc-only drivers. The pdata structs are kzalloc'ed, so we don't need to initialise those to 0, we can just drop the assignments entirely. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/1473674436-19467-1-git-send-email-mpe@ellerman.id.au Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-12 13:04:56 +02:00
Wei Yongjun	43fa9ba632	EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe() Return negative error code from the edac_mc_add_mc() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1473350284-26482-1-git-send-email-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-09 19:27:22 +02:00
York Sun	f47ae798d8	EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul() Replace obsolete simple_strtoul() with kstrtoul(). Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471990593-27536-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:03 +02:00
York Sun	eeb3d68b6c	EDAC, layerscape: Add Layerscape EDAC support Add DDR EDAC driver for ARM-based compatible controllers. Both big-endian and little-endian are supported, as specified in device tree. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471990465-27443-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:03 +02:00
York Sun	55764ed37e	EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed When compiled as a module, removing it causes kernel warnings when irq_dispose_mapping() is called. Instead of calling irq_of_parse_and_map(), use platform_get_irq() to acquire the IRQ number. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-8-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:02 +02:00
York Sun	339fdff14c	EDAC, fsl_ddr: Add support for little endian Get endianness from device tree. Both big endian and little endian are supported. Default to big endian for backwards compatibility to MPC85xx. Signed-off-by: York Sun <york.sun@nxp.com> Acked-by: Rob Herring <robh+dt@kernel.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-7-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:02 +02:00
York Sun	4e2c3252d2	EDAC, fsl_ddr: Add missing DDR DRAM types The compatible DDR controllers may support DDR, DDR2, DDR3, DDR4 DRAM. An individual controller doesn't support all of them. The EDAC driver reads SDRAM_CFG to determine which mode is configured. Add DDR4 and drop the defines used only in the mtype assignment. Signed-off-by: York Sun <york.sun@nxp.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: morbidrsa@gmail.com Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-6-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:01 +02:00
York Sun	d43a9fb202	EDAC, fsl_ddr: Rename macros and names Use FSL-specific prefix for macros, variables and functions. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-5-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:01 +02:00
York Sun	ea2eb9a8b6	EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx The mpc85xx-compatible DDR controllers are used on ARM-based SoCs too. Carve out the DDR part from the mpc85xx EDAC driver in preparation to support both architectures. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470946525-3410-1-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:28:00 +02:00
York Sun	88857ebe71	EDAC, mpc85xx: Replace printk() with pr_* format Replace printk() with pr_err/pr_warn/pr_info macros. Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-3-git-send-email-york.sun@nxp.com [ Boris: unbreak strings for easier greppability. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:27:59 +02:00
York Sun	9e6a03a044	EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1 On e500v1, read fault exception enable (RFXE) controls whether assertion of core_fault_in causes a machine check interrupt. Assertion of core_fault_in can result from uncorrectable data error, such as an L2 multi-bit ECC error. It can also occur from a system error if logic on the integrated device signals a fault for nonfatal errors. RFXE bit is cleared out of reset, and should be left clear for normal operation. Assertion of core_fault_in does not cause a machine check. RFXE is set specifically for RIO (Rapid IO) and PCI for book E to catch the errors by machine check. With this bit set, the EDAC driver can't get the interrupt in case of uncorrectable error. So this bit is cleared in favor of EDAC. However, the benefit of catching such uncorrectable error doesn't outweigh the other errors which may hang the system. Besides, e500v2 has different errors masked by RFXE, and e500mc doesn't support this bit. It is more reasonable to leave RFXE as is in the EDAC driver, and leave the uncorrectable errors triggering machine check for e500v1. Suggested-by: Scott Wood <oss@buserror.net> Signed-off-by: York Sun <york.sun@nxp.com> Cc: Johannes Thumshirn <morbidrsa@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: oss@buserror.net Cc: stuart.yoder@nxp.com Link: http://lkml.kernel.org/r/1470779760-16483-2-git-send-email-york.sun@nxp.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 10:27:59 +02:00
Thor Thayer	b8978badc4	EDAC, altera: Rename MC trigger to common name Rename the Memory Controller debug trigger to the same common name as the EDAC devices. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471622666-15197-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 09:06:42 +02:00
Thor Thayer	f399f34bdb	EDAC, altera: Rename device trigger to common name The L2 and OCRAM devices have different ecc trigger names than the other EDAC devices (FIFO peripherals). Make them all the same and remove the character array from the device structure. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1471622666-15197-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-01 08:55:24 +02:00
Tony Luck	4ec656bdf4	EDAC, skx_edac: Add EDAC driver for Skylake This is an entirely new driver instead of yet another set of patches to sb_edac.c because: 1) Mapping from PCI devices to socket/memory controller is significantly different. Skylake scatters devices on a socket across a number of PCI buses. 2) There is an extra level of interleaving via the "mcroute" register that would be a little messy to squeeze into the old driver. 3) Validation is getting too expensive. Changes to sb_edac need to be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Knights Landing. Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-21 10:58:34 -07:00
Tillmann Heidsieck	6fa06b0d9e	EDAC, mpc85xx: Fix PCIe error capture According to the reference manual of MPC8572 and T4240, bit 31 of PEX_ERR_CAP_STAT is W1C (write 1 to clear). Add the corresponding write to PEX_ERR_CAP_STAT in order to fix the PCIe error capture. Tested on a T4240 processor. Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160815190849.29327-1-theidsieck@leenox.de Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-18 10:17:40 +02:00
Bhaktipriya Shridhar	7bb8b77779	EDAC, wq: Remove deprecated create_singlethread_workqueue() Replace the deprecated create_singlethread_workqueue() with alloc_ordered_workqueue() with WQ_MEM_RECLAIM. This is the identity conversion. It's not recommended to stall it from memory pressure. Hence, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160813164124.GA9077@Karyakshetra Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-15 07:21:29 +02:00
Wei Yongjun	9bcd919eb8	EDAC, altera: Make a10_eccmgr_ic_ops static Fix the following sparse warning: drivers/edac/altera_edac.c:1649:23: warning: symbol 'a10_eccmgr_ic_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lkml <linux-kernel@vger.kernel.org> Link: http://lkml.kernel.org/r/1470836667-11822-1-git-send-email-weiyj.lk@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-10 18:37:06 +02:00
Thor Thayer	911049845d	EDAC, altera: Add Arria10 SD-MMC EDAC support Add Altera Arria10 SD-MMC FIFO memory EDAC support. The SD-MMC is a dual port RAM implementation which is different than any of the other peripherals and therefore requires additional code. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470753653-23465-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-10 14:43:14 +02:00
Yazen Ghannam	dc0a50a841	EDAC, amd64: Fix channel decode on Fam15hMod60h systems Fam15hMod60h systems are using the channel decode of Fam15hMod30h which gives incorrect results. Fam15hMod60h systems should use the generic channel decode method plus a couple more cases. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1470236355-30039-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:59:42 +02:00
Thor Thayer	485fe9e24e	EDAC, altera: Add Arria10 QSPI support Add Altera Arria10 QSPI FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-9-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:59:39 +02:00
Thor Thayer	c609581d1f	EDAC, altera: Add Arria10 USB support Add Altera Arria10 USB FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:59:37 +02:00
Thor Thayer	e8263793b7	EDAC, altera: Add Arria10 DMA support Add Altera Arria10 DMA FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-7-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:59:36 +02:00
Thor Thayer	c6882fb2e8	EDAC, altera: Add Arria10 NAND support Add Altera Arria10 NAND FIFO memory support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1468512408-5156-6-git-send-email-tthayer@opensource.altera.com [ Reformat loop in altr_edac_a10_probe() for better readability. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:59:35 +02:00
Lukasz Odzioba	c5b48fa7e2	EDAC, sb_edac: Fix channel reporting on Knights Landing On Intel Xeon Phi Knights Landing processor family the channels of the memory controller have untypical arrangement - MC0 is mapped to CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the channel name incorrectly. We missed this change earlier, so the code already contains similar comment, but the translation function is incorrect. Without this patch: errors in DIMM_A and DIMM_D were reported in DIMM_D errors in DIMM_B and DIMM_E were reported in DIMM_E errors in DIMM_C and DIMM_F were reported in DIMM_F Correct this. Hubert Chrzaniuk: - rebased to 4.8 - comments and code cleanup Fixes: `d0cdf90031` ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support") Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Cc: lukasz.odzioba@intel.com Cc: mchehab@kernel.org Cc: <stable@vger.kernel.org> # v4.5.. Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> [ Boris: Simplify a bit by removing char mc. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:52:08 +02:00
Linus Torvalds	c79a14defb	* Altera Arria10 ethernet FIFO buffer support (Thor Thayer) * Minor cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXlbvGAAoJEBLB8Bhh3lVK9g0P/16Z5/WIqrrLjJegcZ3qKb4m u2b5FBwbTDQyj/smZmfZfhZUleO7HDayL188fHPKO11my+Mvjyws8IZCvHshy7+3 b6Krno8NXZuNYtm32srSQQoLE4eN5pookOPMe2KJcRFghwnDqOo0ZfMZ8Lt6RVGg YtqiVZ3DwaqoHUbvxkHvdJuMbtVEZIEC6SzK9t4URmdncAx+3/BTLqDSYc4LaL+B tI6REgMy+QT1IUMImuUbOA5ktd78K+D26YKF9NuTSWNQvwzjSYy4y6WmDsjYpZF6 h6YWgGwERIy6gws5IdvkBMKX0s1RV7/sY6xFBsEQrX+IFywyY5OcI1gPRttD6VAn ORJ4j7WNH8Phqld9YxjEX8yTwGxlxge88J/lb0i9s5jXPMX7ioxAqAjdpsJj6PlA sNDP0fTyCECqME73cWaAS9bdrzB2dgOxEDfcWKWR24vZeHfnh72owD1k+J0iWu9c 0R2usPB1Yjbo6ODGpqG1R+u09LR+RpNe38oT5GcsFrj6/EaGz0kn2QulaNtylsDo J5qkm02X1h3+F8/UHrzKavOGk82IQzLCbcW7eHPmyzkH8PTe1semLk9jHnXN5sK3 uL2nirZQVIF6QLbYxf0Uf2oSVs05xc7u9ZT+QOl6hqh/6/K7DWC45SoABwMXDaHH jXr+y4vtr5IR8GQfZ/nd =mdxx -----END PGP SIGNATURE----- Merge tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "This last cycle, Thor was busy adding Arria10 eth FIFO support to the altera_edac driver along with other improvements. We have two cleanups/fixes too. Summary: - Altera Arria10 ethernet FIFO buffer support (Thor Thayer) - Minor cleanups" * tag 'edac_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: dts: Add Arria10 Ethernet EDAC devicetree entry EDAC, altera: Add Arria10 Ethernet EDAC support EDAC, altera: Add Arria10 ECC memory init functions Documentation: dt: socfpga: Add Arria10 Ethernet binding EDAC, altera: Drop some ifdeffery EDAC, altera: Add panic flag check to A10 IRQ EDAC, altera: Check parent status for Arria10 EDAC block EDAC, altera: Make all private data structures static EDAC: Correct channel count limit EDAC, amd64_edac: Init opstate at the proper time during init EDAC, altera: Handle Arria10 SDRAM child node EDAC, altera: Add ECC Manager IRQ controller support Documentation: dt: socfpga: Add interrupt-controller to ecc-manager	2016-07-27 13:40:47 -07:00
Tony Luck	0ba169ac36	EDAC, sb_edac: Fix Knights Landing In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") I broke Knights Landing because I failed to notice that it called a wrapper macro "sbridge_get_all_devices_knl" instead of "sbridge_get_all_devices" like all the other types. Now that we include the processor type in the pci_id_table structure we can skip the wrappers and just have the sbridge_get_all_devices() check the type to decide whether to allow duplicate devices and controllers to have registers spread across buses. Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-16 06:11:59 +09:00
Thor Thayer	ab8c1e0fb0	EDAC, altera: Add Arria10 Ethernet EDAC support Add Altera Arria10 Ethernet FIFO memory EDAC support. Update to support a common compatibility string for all Ethernet FIFOs in the DT. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-25 11:31:34 +02:00
Thor Thayer	1166fde93d	EDAC, altera: Add Arria10 ECC memory init functions In preparation for additional memory module ECCs, add the memory initialization functions and helpers. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-7-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-24 21:16:38 +02:00
Thor Thayer	6b300fb953	EDAC, altera: Drop some ifdeffery Make the IRQ and check_deps() functions available to all the memory buffers by moving them outside of the OCRAM only area. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-5-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-24 14:18:33 +02:00
Thor Thayer	2b083d65ff	EDAC, altera: Add panic flag check to A10 IRQ In preparation for additional memory module ECCs, the IRQ function will check a panic flag before doing a kernel panic on double bit errors. OCRAM uncorrectable errors cause a panic because sleep/resume functions and FPGA contents during sleep are stored in OCRAM. ECCs on peripheral FIFO buffers will not cause a kernel panic on DBERRs because the packet can be retried and therefore recovered. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-24 11:58:43 +02:00
Thor Thayer	44ec9b307e	EDAC, altera: Check parent status for Arria10 EDAC block In preparation for the Arria10 ECC modules, check the status of the parent in the device tree to ensure the block is enabled. Skip if no parent phandle is set in the device tree. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-24 11:58:42 +02:00
Thor Thayer	1cf7037724	EDAC, altera: Make all private data structures static The device private data structures are used only here so make them static. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1466603939-7526-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-24 11:58:42 +02:00
Borislav Petkov	bba142957e	EDAC: Correct channel count limit `c44696fff0` ("EDAC: Remove arbitrary limit on number of channels") lifted the arbitrary limit on memory controller channels in EDAC. However, the dynamic channel attributes dynamic_csrow_dimm_attr and dynamic_csrow_ce_count_attr remained 6. This wasn't a problem except channels 6 and 7 weren't visible in sysfs on machines with more than 6 channels after the conversion to static attr groups with `2c1946b6d6` ("EDAC: Use static attribute groups for managing sysfs entries") [ without that, we're exploding in edac_create_sysfs_mci_device() because we're dereferencing out of the bounds of the dynamic_csrow_dimm_attr array. ] Add attributes for channels 6 and 7 along with a guard for the future, should more channels be required and/or to sanity check for misconfigured machines. We still need to check against the number of channels present on the MC first, as Thor reported. Signed-off-by: Borislav Petkov <bp@suse.de> Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com> Tested-by: Thor Thayer <tthayer@opensource.altera.com> Cc: <stable@vger.kernel.org> # 4.2	2016-06-16 10:06:35 +02:00
Borislav Petkov	6ba92fea1b	EDAC, amd64_edac: Init opstate at the proper time during init It is useless to do it if we're loaded on unsupported hardware so do that only after we have detected at least 1 supported AMD northbridge. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-16 01:13:18 +02:00
Thor Thayer	ab564cb51e	EDAC, altera: Handle Arria10 SDRAM child node Separate the device match arrays for each platform to prevent CycloneV matches when calling of_platform_populate() on the Arria10 ECC manager node. If the SDRAM is a child node of ECC manager, call probe function via of_platform_populate(). Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1464193783-5071-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-08 13:23:09 +02:00
Thor Thayer	13ab8448d2	EDAC, altera: Add ECC Manager IRQ controller support To better support child devices, the ECC manager needs to be implemented as an IRQ controller. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1465331757-10227-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-08 13:23:01 +02:00
Tony Luck	665f05e0b8	EDAC, sb_edac: Readd accidentally dropped Broadwell-D support In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") we switched from using PCI ids to determine which platform we are running on to using CPU model instead. I forgot that Broadwell-DE has its own distinct model number different from Broadwell-EP or -EX. Fixing this isn't just adding a line to the array of cpuids - the exising code assumed a 1:1 mapping between entries in that array and the "enum type" values. Added the type to pci_id_table structure to remove this dependency and allows two Broadwell cpu models. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 17:28:21 +02:00
Nicholas Krause	fbedcaf43f	EDAC: Fix workqueues poll period resetting After the workqueue cleanup, we're registering workqueues based on the presence of an ->edac_check function. When that is the case, we're setting OP_RUNNING_POLL. But we forgot to check that in edac_mc_reset_delay_period(), leading to: BUG: unable to handle kernel paging request at 0000000000015d10 IP: [ .. ] queued_spin_lock_slowpath PGD 3ffcc8067 PUD 3ffc56067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: ... CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1 Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013 Stack: Call Trace: ? _raw_spin_lock_irqsave ? lock_timer_base.isra.34 ? del_timer ? try_to_grab_pending ? mod_delayed_work_on ? edac_mc_reset_delay_period ? edac_set_poll_msec ? param_attr_store ? module_attr_store ? kernfs_fop_write ? __vfs_write ? __vfs_read ? __alloc_fd ? vfs_write ? SyS_write ? entry_SYSCALL_64_fastpath Code: RIP [ .. ] queued_spin_lock_slowpath RSP <> CR2: 0000000000015d10 ---[ end trace 3f286bc71cca15d1 ]--- Kernel panic - not syncing: Fatal exception Fix it. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Cc: <stable@vger.kernel.org> # 4.5 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com [ Rewrite commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 11:14:27 +02:00
Tony Luck	c7103f650a	EDAC, sb_edac: Fix rank lookup on Broadwell Broadwell made a small change to the rank target register moving the target rank ID field up from bits 16:19 to bits 20:23. Also found that the offset field grew by one bit in the IVY_BRIDGE to HASWELL transition, so fix the RIR_OFFSET() macro too. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org # v3.19+ Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 10:05:49 +02:00
Linus Torvalds	16bf834805	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...	2016-05-17 17:05:30 -07:00
Linus Torvalds	1cc3880a3c	* Altera Arria10 L2 cache and On-Chip RAM ECC handling. (Thor Thayer) * Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac. (Tony Luck) * Do not register sb_edac with pci_register_driver(). (Tony Luck) * Add support for Skylake to ie31200_edac. (Jason Baron) * Do not register amd64_edac with pci_register_driver(). (Borislav Petkov) + the usual round of cleanups and fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXOaHaAAoJEBLB8Bhh3lVKRdoP/jDewaX4GAxgcKaK9Kx7VjMe i42E/7syh5iLoyFgZ1hUjvqiQHN4RyWloUYyNHbNxGS9uXCMXIUoJQd2FjOY442s oeEaoMCfZsJLzKQti3fUPK9GugZ57BSZxfn6NlJPpyG4FxSirOcZFCxGaNuyqqrh 0M+gFvbWfZcVDLE0FI5CUYZWRxsk//+pLM4ZkQZFA8Eo1tGVc3r0zEEAlL/uEMDW P0ATvRHNUeib9YQGTdSD7cNUpRX4SX+T8lDUiaVm+tHL5ES3b4rayMBdbVMrahfc Gnxu9GtO5gWXEY6QDDpOx+VkbfFmDFodx833psJ3MVD8evEFHHdinkgDaptLrrV6 92ZDKR5s3W6tKXkcmGuExrtc17UgjLcRZCebXbv+5FJlVrslzvQ9ESOTRyiZBrCD ZpFi2TZhpYU1uEWuoBZCbWNXW2pcSt7/bQ9bYUvfrvNfgPzPnblubJuVKRcfY0WB x2vj0PNnckpoPRvskV7GEe0Y/JISzAxBQUK6XO+GJgMgz5M+23SEaSVU5yeyf26e x/yD5yImQGN84AxfRMMvbR2JvpVLN3vdFWtWigneht160erVA/Qgw5Gcrpw53tZr zPD3fGaetrNgS8BvJAy/c6xHV1j6A1RGCGThy8Ivqep9WD90a5JhR1NRjZp6m8sf aB0A1zTP7e+hRFkZGahO =ud2P -----END PGP SIGNATURE----- Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "It was pretty busy in EDAC land this time: - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer) - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac (Tony Luck) - Do not register sb_edac with pci_register_driver() (Tony Luck) - Add support for Skylake to ie31200_edac (Jason Baron) - Do not register amd64_edac with pci_register_driver() (Borislav Petkov) ... plus the usual round of cleanups and fixes all over the place" * tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits) EDAC, amd64_edac: Drop pci_register_driver() use EDAC, ie31200_edac: Add Skylake support EDAC, sb_edac: Use cpu family/model in driver detection EDAC, i7core: Remove double buffering of error records EDAC, amd64_edac: Issue driver banner only on success ARM: socfpga: Initialize Arria10 OCRAM ECC on startup EDAC: Increment correct counter in edac_inc_ue_error() EDAC, sb_edac: Remove double buffering of error records EDAC: Fix used after kfree() error in edac_unregister_sysfs() EDAC, altera: Avoid unused function warnings EDAC, altera: Remove useless casts ARM: socfpga: Enable Arria10 OCRAM ECC on startup EDAC, altera: Add Arria10 OCRAM ECC support Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding EDAC, altera: Make OCRAM ECC dependency check generic EDAC, altera: Add register offset for ECC Enable EDAC, altera: Extract error inject operations to a struct fops ARM: socfpga: Enable Arria10 L2 cache ECC on startup EDAC, altera: Add Arria10 L2 Cache ECC handling Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding ...	2016-05-16 18:44:39 -07:00
Yazen Ghannam	a348ed83d9	EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA Use X86_FEATURE_SMCA when detecting if SMCA is available instead of directly using CPUID 0x80000007_EBX. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-12 09:08:23 +02:00
Borislav Petkov	3f37a36b62	EDAC, amd64_edac: Drop pci_register_driver() use - remove homegrown instances counting. - take F3 PCI device from amd_nb caching instead of F2 which was used with the PCI core. With those changes, the driver doesn't need to register a PCI driver and relies on the northbridges caching which we do anyway on AMD. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>	2016-05-09 20:41:16 +02:00
Jason Baron	953dee9bbd	EDAC, ie31200_edac: Add Skylake support Skylake adjusts some register locations, but otherwise follows the existing model quite closely. I was able to verify that the 'ce_count' increments when 'bad dimms' are used. The accounting of 'ce_count' and 'ue_count' is the primary functionality of interest for us. Tested on Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-05-06 18:50:14 +02:00
Tony Luck	2c1ea4c700	EDAC, sb_edac: Use cpu family/model in driver detection Instead of picking a random PCI ID from the dozen or so we need to access, just use x86_match_cpu() to pick based on CPU model number. The choosing of PCI devices has been problematic in the past, see `11249e7399` ("sb_edac: Fix detection on SNB machines") which fixed problems introduced by `d0585cd815` ("sb_edac: Claim a different PCI device"). This is especially ugly if future hardware might not even have EDAC-relevant registers in PCI config space and we would still be required to choose some "random" PCI devices to scan for just so our driver loads. Is this cleaner/clearer? It deletes much more code than it adds. Only tested on Broadwell. The driver loads/unloads and loads again. Still decodes errors too. Signed-off-by: Tony Luck <tony.luck@intel.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-05-02 19:44:43 +02:00
Tony Luck	5359534505	EDAC, i7core: Remove double buffering of error records In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in `f29a7aff4b` ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as i7core_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-29 16:41:24 +02:00
Tony Luck	c4fc1956fa	EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-29 15:43:10 +02:00
Borislav Petkov	de0336b30d	EDAC, amd64_edac: Issue driver banner only on success ... and don't mislead users into thinking that the driver has loaded successfully. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-27 12:30:26 +02:00
Emmanouil Maroudas	993f88f1cc	EDAC: Increment correct counter in edac_inc_ue_error() Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `4275be6355` ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 18:10:09 +02:00
Tony Luck	ad08c4e974	EDAC, sb_edac: Remove double buffering of error records In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in `f29a7aff4b` ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as sbridge_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: patrickg@supermicro.com Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 14:02:02 +02:00
Tony Luck	ab67b6c22d	EDAC: Fix used after kfree() error in edac_unregister_sysfs() Code flow looks like this: device_unregister(&mci->dev); -> kobject_put+0x25/0x50 -> kobject_cleanup+0x77/0x190 -> device_release+0x32/0xa0 -> mci_attr_release+0x36/0x70 -> kfree(mci); bus_unregister(mci->bus); Fix is to grab a local copy of "mci->bus" and use that when we call bus_unregister(). Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 13:23:54 +02:00
Arnd Bergmann	1aa6eb5c5b	EDAC, altera: Avoid unused function warnings The recently added Arria10 OCRAM ECC support caused some new harmless warnings about unused functions when it is disabled: drivers/edac/altera_edac.c:1067:20: error: 'altr_edac_a10_ecc_irq' defined but not used [-Werror=unused-function] drivers/edac/altera_edac.c:658:12: error: 'altr_check_ecc_deps' defined but not used [-Werror=unused-function] This rearranges the code slightly to have those two functions inside of the same #ifdef that hides their callers. It also manages to avoid a forward declaration of the IRQ handler in the process. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Alan Tull <atull@opensource.altera.com> Cc: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `c7b4be8db8` ("EDAC, altera: Add Arria10 OCRAM ECC support") Link: http://lkml.kernel.org/r/1460837650-1237650-2-git-send-email-arnd@arndb.de Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 12:00:12 +02:00
Arnd Bergmann	2c911f6cac	EDAC, altera: Remove useless casts The altera EDAC driver refers to its per-device data using a cast to '(void *)', which makes the pointer non-const, though both the source and destination are actually const. Removing the annotation makes the reference (almost) fit into a single line for improved readability, and ensures that it is actually defined as const. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Alan Tull <atull@opensource.altera.com> Cc: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1460837650-1237650-1-git-send-email-arnd@arndb.de Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 11:54:42 +02:00
Tony Luck	ea5dfb5fae	x86 EDAC, sb_edac.c: Take account of channel hashing when needed Haswell and Broadwell can be configured to hash the channel interleave function using bits [27:12] of the physical address. On those processor models we must check to see if hashing is enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and act accordingly. Based on a patch by patrickg <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Tony Luck	ff15e95c82	x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address In commit: `eb1af3b71f` ("Fix computation of channel address") I switched the "sck_way" variable from holding the log2 value read from the h/w to instead be the actual number. Unfortunately it is needed in log2 form when used to shift the address. Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `eb1af3b71f` ("Fix computation of channel address") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Masanari Iida	c19ca6cb4c	treewide: Fix typos in printk This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-04-18 11:23:24 +02:00
Thor Thayer	c7b4be8db8	EDAC, altera: Add Arria10 OCRAM ECC support Add Arria10 On-Chip RAM ECC handling. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459992174-8015-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-07 12:42:56 +02:00
Thor Thayer	aa1f06dcc0	EDAC, altera: Make OCRAM ECC dependency check generic In preparation for the Arria10 peripheral ECCs, move the OCRAM ECC dependency check into the general ECC area since this same function can be used by other memories. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-02 13:47:15 +02:00
Thor Thayer	943ad91798	EDAC, altera: Add register offset for ECC Enable In preparation for the Arria10 peripheral ECCs, add a register offset from the ECC base to index to the ECC enable register. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-02 13:38:57 +02:00
Thor Thayer	e17ced2cb6	EDAC, altera: Extract error inject operations to a struct fops In preparation for the Arria10 peripheral ECCs, extract the inject file operations because the Arria10 IRQ trigger mechanism is different than Cyclone5/Arria5 and Arria10 L2 cache. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-02 13:27:06 +02:00
Thor Thayer	588cb03ea2	EDAC, altera: Add Arria10 L2 Cache ECC handling Add a private data structure for Arria10 L2 cache ECC and the probe function for it. The Arria10 ECC device IRQs are in a shared register so the ECC Manager parent/child relationship requires a different probe function. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:34:06 +02:00
Thor Thayer	811fce4f2a	EDAC, altera: Add register offset for ECC Error Inject In preparation for the Arria10 peripheral ECCs, add a register offset from the ECC base to the private data structure to index to the error injection register. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-6-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:20:48 +02:00
Thor Thayer	27439a1a63	EDAC, altera: Abstract ECC Enable Mask in check_deps() In preparation for the Arria10 peripheral ECCs, use the ECC Enable mask in place of hard coded masks in the check dependency functions. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-5-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:17:32 +02:00
Thor Thayer	328ca7ae81	EDAC, altera: Remove platform device from check_deps() In preparation for the Arria10 peripheral ECCs, remove the platform device parameter from the check_deps() functions because it is not needed and makes the Arria10 check_deps() cleaner. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:14:21 +02:00
Thor Thayer	05b088b6f8	EDAC, altera: Move device structs and defines to the header Move the device structs and defines to altera_edac.h in preparation for adding the Arria10 L2 cache ECC. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:11:16 +02:00
Thor Thayer	3a8f21f170	EDAC, altera: Make L2C depend on L2x0 cache controller Make L2 cache depend instead of forcibly select the L2 cache support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-29 10:06:11 +02:00
Linus Torvalds	047486d8e7	EDAC queue for 4.6 * Altera: L2 cache and On-Chip RAM support (Thor Thayer). * EDAC: Workqueue handling cleanups (Borislav Petkov). * Xgene: Register bus error handling (Loc Ho). * Misc small fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW5sS0AAoJEBLB8Bhh3lVKxPEP/j8pleG17s1HI9wfhukzuT9z epjyaYjEv1BR6+3drmFjbAYGgk7hhL8khVePEhouS/P5WQSzRHMdgimL5NMpnwSZ 6XXyR2szhz86eYMC1lQxdRnESZarbnVtqyiRG0Mv1hFbObyzM0ewiHt2lTtCa1Uj DeprrSE6QCQLwq0WSIF8MJ9LBcmGh5dXJTy0sTeATfkmgBaBQeJhiZOmJer25Jqf tEyvxkkFw0OLAy3BoI7eeI7ALmgzIXPLIWVOo0t1qTeKsURwdfap8xjteT9/5iiJ HHB+4A+8iUMSPYMoIiSr0qsyZT1CI2ncVV4VOAhm11if4gB8sCnOkioMo6bHhtbq NZrug6BnwizBBh4FFGHmTTpN8MlhLgYA8YfSkUD5jal3jM/l5VvlM9DDM72JuzLy 0hd5zaAqlqEJj4plvSx4ieuDHAkA3Qzd58U12LuN+R+nbN2EO8Svr9CvEMSLE+1r exxOYRP5xO9SPeK7pTnXJFsI09MrRmXGinRQu/LChWAm1j7NxaHkeUl7ulzxjV+5 E6Bx+mOPvYTUF32CQi4i/oIePK2kVGoExaUSkRNKeRgwF/ObQaVxjM+cYvM+VJEh 2OknzHNG/F87UvkcOdGMEy6G4E5xkHSuEYtq7bHGWI2prKn+f0nbPpWgx9d4DiE8 jo0vi58v+Irwk9H465lH =ssy9 -----END PGP SIGNATURE----- Merge tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - Altera: L2 cache and On-Chip RAM support (Thor Thayer). - EDAC: Workqueue handling cleanups (Borislav Petkov). - Xgene: Register bus error handling (Loc Ho). - Misc small fixes. * tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: socfpga: Enable OCRAM ECC on startup ARM: socfpga: Enable L2 cache ECC on startup ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries EDAC, altera: Add Altera L2 cache and OCRAM support EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() EDAC, mpc85xx: Silence unused variable warning EDAC: Cleanup/sync workqueue functions EDAC: Kill workqueue setup/teardown functions EDAC: Balance workqueue setup and teardown arm64: Update the APM X-Gene EDAC node with the RB register resource EDAC, xgene: Add missing SoC register bus error handling Documentation, EDAC: Update xgene binding for missing register bus EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()	2016-03-16 08:36:55 -07:00
Linus Torvalds	d88bfe1d68	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call	2016-03-14 18:43:51 -07:00
Luck, Tony	eb1af3b71f	EDAC/sb_edac: Fix computation of channel address Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 18:31:55 +01:00
Aravind Gopalakrishnan	be0aec23bf	x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-08 11:48:14 +01:00
Hubert Chrzaniuk	83bdaad4d9	EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi Correct a typo introduced by `d0cdf90031` ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support") As a result under some configurations DIMMs were not correctly recognized. Problem affects only Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-07 19:07:40 +01:00
Thor Thayer	c3eea1942a	EDAC, altera: Add Altera L2 cache and OCRAM support Add L2 Cache and On-Chip RAM EDAC support for the Altera SoCs. The SDRAM controller is using the Memory Controller model. Each type of ECC is individually configurable. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-doc@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1455132384-17108-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-11 12:23:06 +01:00
Thor Thayer	9bf4f00567	EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() debugfs_remove() is used to remove a file or a directory from the debugfs filesystem on an EDAC device exit. However edac_debugfs might not be empty. This is similar to `30f84a891b` ("EDAC: Use edac_debugfs_remove_recursive()") which changed the EDAC MCI code to use edac_debugfs_remove_recursive(). Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-10 10:37:46 +01:00
Sudip Mukherjee	f2b59ac66f	EDAC, mpc85xx: Silence unused variable warning We were getting this build warning: drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr' pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it __maybe_unused. Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 18:53:15 +01:00
Borislav Petkov	06e912d4d4	EDAC: Cleanup/sync workqueue functions They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:38:50 +01:00
Borislav Petkov	626a7a4dba	EDAC: Kill workqueue setup/teardown functions We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:38:43 +01:00
Borislav Petkov	0966760619	EDAC: Balance workqueue setup and teardown We use the ->edac_check function pointers to determine whether we need to setup a polling workqueue. However, the destroy path is not balanced and we might try to teardown an unitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE before destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:04:29 +01:00
Loc Ho	4d67e3ce75	EDAC, xgene: Add missing SoC register bus error handling Add missing register bus error handling for APM X-Gene EDAC SoC and fix a checking condition for CE error promoted to UE. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Link: http://lkml.kernel.org/r/1453495625-28006-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-25 11:17:22 +01:00
Dan Carpenter	6f3508f61c	EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: `c8e518d567` ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-25 11:17:14 +01:00
Geliang Tang	1cac5503fb	EDAC, i5100: Use to_delayed_work() Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/58c0e319c7263a10b692100c657c06c42814aecf.1451659910.git.geliangtang@163.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-01 18:31:34 +01:00
Hubert Chrzaniuk	45f4d3ab3e	EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing Knights Landing does not come with register that could be used to fetch DIMM width. However the value is fixed for this architecture so it can be hardcoded. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:58:32 +01:00
Borislav Petkov	c4cf3b454e	EDAC: Rework workqueue handling Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with `91b99041c1` ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:43 +01:00
Borislav Petkov	e136fa016f	EDAC: Make edac_device workqueue setup/teardown functions static They're not used anywhere else. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:42 +01:00
Borislav Petkov	d4538000ca	EDAC: Remove edac_get_sysfs_subsys() error handling It cannot fail now. We either load EDAC core after having successfully initialized edac_subsys or we don't. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:41 +01:00
Borislav Petkov	a97d262701	EDAC: Unexport and make edac_subsys static ... and use the accessor instead. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:40 +01:00
Borislav Petkov	733476cf20	EDAC: Rip out the edac_subsys reference counting This was really dumb - reference counting for the main EDAC sysfs object. While we could've simply registered it as the first thing in the module init path and then hand it around to what needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:39 +01:00
Borislav Petkov	fcd5c4dd82	EDAC: Robustify workqueues destruction EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work synchronously too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>	2015-12-11 16:56:39 +01:00
Borislav Petkov	12e26969b3	EDAC, mc_sysfs: Fix freeing bus' name I get the splat below when modprobing/rmmoding EDAC drivers. It happens because bus->name is invalid after bus_unregister() has run. The Code: section below corresponds to: .loc 1 1108 0 movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus .loc 1 1109 0 popq %rbx # .loc 1 1108 0 movq (%rax), %rdi # _7->name, jmp kfree # and %rax has some funky stuff 2030203020312030 which looks a lot like something walked over it. Fix that by saving the name ptr before doing stuff to string it points to. general protection fault: 0000 [#1] SMP Modules linked in: ... CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48 Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011 task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000 RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292 RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286 RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110 R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68 R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000 FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0 Stack: Call Trace: i7core_unregister_mci.isra.9 i7core_remove pci_device_remove __device_release_driver driver_detach bus_remove_driver pci_unregister_driver i7core_exit SyS_delete_module system_call_fastpath 0x7fc9bf426536 Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP <ffff88030da3fe28> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: <stable@vger.kernel.org> # v3.6.. Fixes: `7a623c0390` ("edac: rewrite the sysfs code to use struct device")	2015-12-11 16:56:38 +01:00
Scott Wood	666db563d3	EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device Originally the mpc85xx-pci-edac driver bound directly to the PCI controller node. Commit `905e75c46d` ("powerpc/fsl-pci: Unify pci/pcie initialization code") turned the PCI controller code into a platform device. Since we can't have two drivers binding to the same device, the EDAC code was changed to be called into as a library-style submodule. However, this doesn't work if the EDAC driver is built as a module. Commit 8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting") exposed another problem with this approach -- mpc85xx_pci_err_probe() was being called in the same early boot phase that the PCI controller is initialized, rather than in the device_initcall phase that the EDAC layer expects. This caused a crash on boot. To fix this, the PCI controller code now creates a child platform device specifically for EDAC, which the mpc85xx-pci-edac driver binds to. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Jia Hongtao <B38951@freescale.com> Cc: Jiri Kosina <jkosina@suse.com> Cc: Kim Phillips <kim.phillips@freescale.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:16 +01:00
Jim Snow	d0cdf90031	EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support Knights Landing is the next generation architecture for HPC market. KNL introduces concept of a tile and CHA - Cache/Home Agent for memory accesses. Some things are fixed in KNL: () There's single DIMM slot per channel () There's 2 memory controllers with 3 channels each, however, from EDAC standpoint, it is presented as single memory controller with 6 channels. In order to represent 2 MCs w/ 3 CH, it would require major redesign of EDAC core driver. Basically, two functionalities are added/extended: () during driver initialization KNL topology is being recognized, i.e. which channels are populated with what DIMM sizes (knl_get_dimm_capacity function) () handle MCE errors - channel swizzling Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jim Snow <jim.m.snow@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 19:00:52 +01:00
Jim Snow	c1979ba254	EDAC, sb_edac: Add support for duplicate device IDs Add options to sbridge_get_all_devices() to allow for duplicate device IDs and devices that are scattered across mulitple PCI buses. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:57:41 +01:00
Jim Snow	c59f9c06bd	EDAC, sb_edac: Virtualize several hard-coded functions SAD limit, interleave mode and DRAM related functionalities are now virtualized, so that overriding them is easier. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:54:45 +01:00
Thierry Reding	768ce42cce	EDAC, mv64x60: Use platform_register/unregister_drivers() These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449073138-10852-2-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-03 12:05:40 +01:00
Thierry Reding	d54051f1cc	EDAC, mpc85xx: Use platform_register/unregister_drivers() These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449136632-11680-1-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-03 12:03:36 +01:00
Borislav Petkov	e8f937da74	EDAC, pci: Remove old disabled code Remove an unused edac_pci_find() function iterating over edac_pci_list. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-11-18 14:00:05 +01:00

... 2 3 4 5 6 ...

1491 Commits