OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Peter Zijlstra	5ebb34edbe	x86/intel: Aggregate microserver naming Currently big microservers have _XEON_D while small microservers have _X, Make it uniformly: _D. for i in `git grep -l "$INTEL_FAM6_\\|VULNWL_INTEL\\|INTEL_CPU_FAM6$._$X\\|XEON_D$"` do sed -i -e 's/$\(INTEL_FAM6_\\|VULNWL_INTEL\\|INTEL_CPU_FAM6$.ATOM.\)_X/\1_D/g' \ -e 's/$\(INTEL_FAM6_\\|VULNWL_INTEL\\|INTEL_CPU_FAM6$.\)_XEON_D/\1_D/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org	2019-08-28 11:29:32 +02:00
Colin Ian King	0042e9e7a5	EDAC/sb_edac: Remove redundant update of tad_base The variable tad_base is being set to a value that is never read and is being over-written on the next iteration of a for-loop. This assignment is therefore redundant and can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: James Morse <james.morse@arm.com> Cc: kernel-janitors@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lkml.kernel.org/r/20190508224201.27120-1-colin.king@canonical.com	2019-06-20 11:44:36 -07:00
Thomas Gleixner	122375508b	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 172 Based on 1 normalized pattern(s): this file may be distributed under the terms of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 9 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.395589349@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-30 11:26:39 -07:00
Tony Luck	432de7fd76	EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting The count of errors is picked up from bits 52:38 of the machine check bank status register. But this is the count of corrected errors. If an uncorrected error is being logged, the h/w sets this field to 0. Which means that when edac_mc_handle_error() is called, the EDAC core will carefully add zero to the appropriate uncorrected error counts. Signed-off-by: Tony Luck <tony.luck@intel.com> [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.com	2018-09-29 10:58:16 +02:00
Qiuxu Zhuo	6f6da13604	EDAC: Correct DIMM capacity unit symbol The {i3200\|i7core\|sb\|skx}_edac drivers show DIMM capacity using the wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with 'MiB' - mebibyte. [Tony: These are all "edac_dbg()" messages, so this won't break scripts that parse console logs.] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com	2018-09-22 18:18:57 +02:00
Luck, Tony	c968ed0859	EDAC, sb_edac: Fix signedness bugs in *_get_ha() functions A static checker gave the following warnings: drivers/edac/sb_edac.c:1030 ibridge_get_ha() warn: signedness bug returning '(-22)' drivers/edac/sb_edac.c:1037 knl_get_ha() warn: signedness bug returning '(-22)' Both because the functions are declared to return a "u8", but try to return -EINVAL for the error case. Fix by returning 0xff (since the caller doesn't look at, or pass on, the return value). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180914201905.GA30946@agluck-desk Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-15 11:41:08 +02:00
Qiuxu Zhuo	8489b17ce2	EDAC, sb_edac: Fix reporting for patrol scrubber errors sb_edac sometimes reports the wrong DIMM for a memory error found by the patrol scrubber. That is because the hardware provides only a 4KB page-aligned address for the error case. This means that the EDAC driver will point at the DIMM matching offset 0x0 in the 4KB page, but because of interleaving across channels and ranks, the actual DIMM involved may be different if the error is on some other cache line within the page. Therefore, reconstruct the socket/iMC/channel information from the "mce" structure passed to the EDAC driver. The DIMM cannot be determined, so pass "dimm=-1" to the EDAC core. It will report that all the DIMMs on that channel may be affected. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180907230828.13901-3-tony.luck@intel.com [ Improve comments on the functions to convert bank number to memory controller number. Minor cleanup to commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> [ Massage commit message more. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-11 11:09:54 +02:00
Qiuxu Zhuo	dcc960b225	EDAC, sb_edac: Return early on ADDRV bit and address type test Users of the mce_register_decode_chain() are called for every logged error. EDAC drivers should check: 1) Is this a memory error? [bit 7 in status register] 2) Is there a valid address? [bit 58 in status register] 3) Is the address a system address? [bitfield 8:6 in misc register] The sb_edac driver performed test "1" twice. Waited far too long to perform check "2". Didn't do check "3" at all. Fix it by moving the test for valid address from sbridge_mce_output_error() into sbridge_mce_check_error() and add a test for the type immediately after. Delete the redundant check for the type of the error from sbridge_mce_output_error(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com [ Re-word commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-11 10:59:21 +02:00
Andy Shevchenko	bd1852317f	EDAC: Get rid of custom ICPU() macro Replace custom grown macro with generic INTEL_CPU_FAM6() one. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180831082341.72363-1-andriy.shevchenko@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-03 12:18:35 +02:00
Masayoshi Mizuma	190bd6e98a	EDAC, sb_edac: Add support for systems with segmented PCI buses Extend the driver to check whether segment number and bus number matches when deciding how to group memory controller PCI devices to CPU sockets. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com [ Cleanup commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2018-07-25 11:17:15 +02:00
Gustavo A. R. Silva	6fd0526652	EDAC, sb_edac: Remove variable length array usage In preparation for enabling -Wvla, remove VLA and replace it with a fixed-length array instead. Also, remove max_interleave as it is no longer needed. Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180314182131.GA25259@embeddedgus Signed-off-by: Borislav Petkov <bp@suse.de>	2018-03-17 05:24:55 +01:00
Anna Karbownik	bf8486709a	EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL Commit `3286d3eb90` ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights Landing which supports up to 6 channels. This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm variables which don't pay critical role on KNL code path, so the memory corruption wasn't causing any visible driver failures. The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that. An alternative solution would be to restructure the KNL part of the driver to 2MC/3channel representation. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Karbownik <anna.karbownik@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: jim.m.snow@intel.com Cc: krzysztof.paliswiat@intel.com Cc: lukasz.odzioba@intel.com Cc: qiuxu.zhuo@intel.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Fixes: `3286d3eb90` ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4") Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2018-02-23 12:05:37 +01:00
Gustavo A. R. Silva	a8e9b186f1	EDAC, sb_edac: Fix missing break in switch Add missing break statement in order to prevent the code from falling through. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-10-19 10:53:42 +02:00
Luis Felipe Sandoval Castro	24281a2f4c	EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode When figuring out the size of the DIMMs and the cluster mode is SNC2 or SNC4 the current algorithm ignores the contribution of some of the channels resulting in EDAC never knowing of the existence of some DIMMs attached to such channels (thus sysfs is not populated). Instead of selectively iterating from 0 to interlv_ways when looking for all the participants in the interleave, do an exhaustive search and iterate from 0 to KNL_MAX_CHANNELS. The algorithm is already smart enough to consider participants only one time. This works fine in all KNL cluster modes and even when there are missing DIMMs as the contribution of those channels is 0. Signed-off-by: Luis Felipe Sandoval Castro <luis.felipe.sandoval.castro@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: arozansk@redhat.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: qiuxu.zhuo@intel.com Link: http://lkml.kernel.org/r/1506606882-90521-1-git-send-email-luis.felipe.sandoval.castro@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-10-11 15:57:25 +02:00
Qiuxu Zhuo	15cc3ae001	EDAC, sb_edac: Don't create a second memory controller if HA1 is not present Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3) server (DELL PowerEdge 730xd): EDAC sbridge: Some needed devices are missing EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0 EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0 EDAC sbridge: Couldn't find mci handler EDAC sbridge: Couldn't find mci handler EDAC sbridge: Failed to register device with error -19. The refactored sb_edac driver creates the IMC1 (the 2nd memory controller) if any IMC1 device is present. In this case only HA1_TA of IMC1 was present, but the driver expected to find HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure. The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi Zhang inserted one DIMM per channel for each CPU, and did random error address injection test with this patch: 4024 addresses fell in TOLM hole area 12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0 12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 106400 addresses were injected totally. The test result shows that all the 4 channels belong to IMC0 per CPU, so the server really only has one IMC per CPU. In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3' implements either one or two IMCs. For CPUs with one IMC, IMC1 is not used and should be ignored. Thus, do not create a second memory controller if the key HA1 is absent. [1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz [2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf Reported-and-tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `e2f747b1f4` ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller") Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-09-27 12:15:43 +02:00
Toshi Kani	301375e764	EDAC: Add owner check to the x86 platform drivers Change x86 EDAC platform drivers to verify the module owner at the beginning of their module init functions. This allows them to fail their init immediately when ghes_edac is enabled. Similar change can be made to other edac drivers if necessary. Also, remove ".c" from module names of pnp2_edac, sb_edac, and skx_edac. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170823225447.15608-6-toshi.kani@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-09-25 13:09:39 +02:00
Arvind Yadav	75f029c3a8	EDAC: Handle return value of kasprintf() kasprintf() can fail and we must check its return value. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: linux-edac@vger.kernel.org [ Merged into a single patch, small formatting fixups. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-09-21 12:18:44 +02:00
Qiuxu Zhuo	039d7af651	EDAC, sb_edac: Classify memory mirroring modes Basically, there are full memory mirroring and address range partial memory mirroring (supported by Haswell EX and Broadwell EX) modes. a) In full memory mirroring, the memory behind each memory controller is mirrored, i.e. the memory is split into two identical mirrors (primary and secondary), half of the memory is reserved for redundancy. b) In address range partial memory mirroring, the memory size (range) of primary and secondary behind each memory controller can be user defined by the TAD0 register. The rest of memory ranges defined by TAD1/TAD2/... in that memory controller are non-mirrored. For more detail on memory mirroring, see the following link written by Tony Luck: https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux Currently the sb_edac driver only supports address decoding in full memory mirroring and non-mirroring modes. In address range partial memory mirroring mode, it may fail to decode an address that falls in a non-mirroring area (the following was one of this kind of failed logs). mce: Uncorrected hardware memory error in user-access at 566d53a400 Memory failure: 0x566d53a: Killing einj_mem_uc:4647 due to hardware memory corruption Memory failure: 0x566d53a: recovery action for dirty LRU page: Recovered mce: [Hardware Error]: Machine check events logged EDAC sbridge MC1: HANDLING MCE MEMORY ERROR EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090 EDAC sbridge MC1: TSC 4b914aa5a99dab EDAC sbridge MC1: ADDR 566d53a400 EDAC sbridge MC1: MISC 1443a0c86 EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80 EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32) mce: [Hardware Error]: Machine check events logged Therefore, classify memory mirroring modes and make the address decoding in address range partial memory mode correct. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-08-02 05:40:11 +02:00
Borislav Petkov	c54182ec0e	EDAC: Get rid of mci->mod_ver It is a write-only variable so get rid of it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Thor Thayer <thor.thayer@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org	2017-07-17 13:42:48 +02:00
Qiuxu Zhuo	133e4455c9	EDAC, sb_edac: Avoid creating SOCK memory controller Xiaolong Ye reported the following failure on Broadwell D server: EDAC sbridge: Some needed devices are missing EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 EDAC sbridge: Couldn't find mci handler EDAC sbridge: Failed to register device with error -19. Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per socket) use the same PCI device IDs for IMC0 per socket, then they share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case, Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller and reports above error messages, since it has no IMC1 per socket. Avoid creating the nonexistent SOCK memory controller. Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com [ Massage. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-06-14 11:53:39 +02:00
Qiuxu Zhuo	d14e3a201f	EDAC, sb_edac: Bump driver version and do some cleanups Collapse 'case:' in *_mci_bind_devs() and update driver version from 1.1.1 to 1.1.2. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 15:00:36 +02:00
Qiuxu Zhuo	4d475dde79	EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present This is based on previous work by Patrick Geary, see Link. Additional cleanups ontop: - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro, now that TA0 and TA1 are unified. - Remove get_pdev_same_bus(), since in get_dimm_config() the variable "pvt->pci_ta" for KNL is also ready, we can simply use pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read MCMTR. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com [ Make __populate_dimms() return int. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:57:52 +02:00
Qiuxu Zhuo	3286d3eb90	EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4 We don't need this quirk anymore now that the EDAC memory controller representation matches the hardware. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com [ Commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:40:40 +02:00
Borislav Petkov	6696522957	EDAC, sb_edac: Carve out dimm-populating loop ... to slim down get_dimm_config(). No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:34 +02:00
Borislav Petkov	199389acd9	EDAC, sb_edac: Fix mod_name It is called "sb_edac.c" now. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:33 +02:00
Qiuxu Zhuo	e2f747b1f4	EDAC, sb_edac: Assign EDAC memory controller per h/w controller Tony pointed out: "currently the driver pretends there is one big 8-channel memory controller per socket instead of 2 4-channel controllers. This is fine with all memory controller populated with symmetrical DIMM configurations, but runs into difficulties on asymmetrical setups". Restructure the driver to assign an EDAC memory controller to each real h/w memory controller to resolve the issue. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com [ Break some lines at convenient points. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:21 +02:00
Tony Luck	7fd562b75d	EDAC, sb_edac: Don't use "Socket#" in the memory controller name EDAC assigns logical memory controller numbers in the order that we find memory controllers, which depends on which PCI bus they are on. Some systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on socket3. All this is made more confusing for users because we use the string "Socket" while generating names for memory controllers, but the number that we attach there is the memory controller number. E.g. EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT) Change the names to say "SrcID#%d" (where the number we use is read from the h/w associated with the memory controller instead of some logical number internal to the EDAC driver). New message: EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT) Reported-by: Andrey Korolyov <andrey@xdel.ru> Reported-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:47:11 +02:00
Qiuxu Zhuo	00cf50d90a	EDAC, sb_edac: Classify PCI-IDs by topology Each of the PCI device IDs belongs to a CPU socket, or to one of the integrated memory controllers. Provide an enum to specify the domain of each, and distinguish the resource number in each domain: the number of the PCI device IDs per integrated memory controller/socket, and the number of integrated memory controllers per socket. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com [ Realign pci_dev_descr_knl members. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:19:25 +02:00
Borislav Petkov	bffc7dece9	EDAC: Rename report status accessors Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:15:02 +02:00
Linus Torvalds	60c906bab1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y	2017-02-20 12:47:44 -08:00
Borislav Petkov	9026cc82b6	x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:57 +01:00
Nicolas Iooss	127c1225bf	EDAC, sb_edac: Get rid of ->show_interleave_mode() Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-23 11:39:48 +01:00
Mauro Carvalho Chehab	78d88e8a3d	edac: rename edac_core.h to edac_mc.h Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Piotr Luc	9a9260ca92	EDAC, sb_edac: Add Knights Mill support Add Knights Mill (KNM) to the list of CPU models supported by sb_edac. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:37:44 +02:00
Dave Hansen	20f4d69243	EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them We now have symbolic names for a bunch of Intel CPU models via asm/intel-family.h. The original conversion missed the EDAC drivers. Convert them. Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com [ Remove comment, macro name is descriptive enough. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:32:40 +02:00
Linus Torvalds	19fe416532	* Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) * Split the memory controller part out of mpc85xx and share it with a * new Freescale ARM Layerscape driver (York Sun) * amd64_edac fixes (Yazen Ghannam) * Misc cleanups, refactoring and fixes all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6 dmDAtQbKUUwiltB8QzQd =pcI8 -----END PGP SIGNATURE----- Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "A lot of movement in the EDAC tree this time around, coarse summary below: - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) - split the memory controller part out of mpc85xx and share it with a new Freescale ARM Layerscape driver (York Sun) - amd64_edac fixes (Yazen Ghannam) - misc cleanups, refactoring and fixes all over the place" * tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits) EDAC, altera: Add IRQ Flags to disable IRQ while handling EDAC, altera: Correct EDAC IRQ error message EDAC, amd64: Autoload module using x86_cpu_id EDAC, sb_edac: Remove NULL pointer check on array pci_tad EDAC: Remove NO_IRQ from powerpc-only drivers EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe() EDAC, fsl_ddr: Add entry to MAINTAINERS EDAC: Move Doug Thompson to CREDITS EDAC, I3000: Orphan driver EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul() EDAC, layerscape: Add Layerscape EDAC support EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed EDAC, fsl_ddr: Add support for little endian EDAC, fsl_ddr: Add missing DDR DRAM types EDAC, fsl_ddr: Rename macros and names EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx EDAC, mpc85xx: Replace printk() with pr_* format EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1 EDAC, altera: Rename MC trigger to common name EDAC, altera: Rename device trigger to common name ...	2016-10-04 12:06:26 -07:00
Colin Ian King	c7c35407cd	EDAC, sb_edac: Remove NULL pointer check on array pci_tad pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and hence cannot be NULL, so the NULL pointer check on pci_tad is redundant. Remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-12 20:15:43 +02:00
Lukasz Odzioba	c5b48fa7e2	EDAC, sb_edac: Fix channel reporting on Knights Landing On Intel Xeon Phi Knights Landing processor family the channels of the memory controller have untypical arrangement - MC0 is mapped to CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the channel name incorrectly. We missed this change earlier, so the code already contains similar comment, but the translation function is incorrect. Without this patch: errors in DIMM_A and DIMM_D were reported in DIMM_D errors in DIMM_B and DIMM_E were reported in DIMM_E errors in DIMM_C and DIMM_F were reported in DIMM_F Correct this. Hubert Chrzaniuk: - rebased to 4.8 - comments and code cleanup Fixes: `d0cdf90031` ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support") Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Cc: lukasz.odzioba@intel.com Cc: mchehab@kernel.org Cc: <stable@vger.kernel.org> # v4.5.. Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> [ Boris: Simplify a bit by removing char mc. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:52:08 +02:00
Tony Luck	0ba169ac36	EDAC, sb_edac: Fix Knights Landing In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") I broke Knights Landing because I failed to notice that it called a wrapper macro "sbridge_get_all_devices_knl" instead of "sbridge_get_all_devices" like all the other types. Now that we include the processor type in the pci_id_table structure we can skip the wrappers and just have the sbridge_get_all_devices() check the type to decide whether to allow duplicate devices and controllers to have registers spread across buses. Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-16 06:11:59 +09:00
Tony Luck	665f05e0b8	EDAC, sb_edac: Readd accidentally dropped Broadwell-D support In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") we switched from using PCI ids to determine which platform we are running on to using CPU model instead. I forgot that Broadwell-DE has its own distinct model number different from Broadwell-EP or -EX. Fixing this isn't just adding a line to the array of cpuids - the exising code assumed a 1:1 mapping between entries in that array and the "enum type" values. Added the type to pci_id_table structure to remove this dependency and allows two Broadwell cpu models. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 17:28:21 +02:00
Tony Luck	c7103f650a	EDAC, sb_edac: Fix rank lookup on Broadwell Broadwell made a small change to the rank target register moving the target rank ID field up from bits 16:19 to bits 20:23. Also found that the offset field grew by one bit in the IVY_BRIDGE to HASWELL transition, so fix the RIR_OFFSET() macro too. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org # v3.19+ Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 10:05:49 +02:00
Linus Torvalds	1cc3880a3c	* Altera Arria10 L2 cache and On-Chip RAM ECC handling. (Thor Thayer) * Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac. (Tony Luck) * Do not register sb_edac with pci_register_driver(). (Tony Luck) * Add support for Skylake to ie31200_edac. (Jason Baron) * Do not register amd64_edac with pci_register_driver(). (Borislav Petkov) + the usual round of cleanups and fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXOaHaAAoJEBLB8Bhh3lVKRdoP/jDewaX4GAxgcKaK9Kx7VjMe i42E/7syh5iLoyFgZ1hUjvqiQHN4RyWloUYyNHbNxGS9uXCMXIUoJQd2FjOY442s oeEaoMCfZsJLzKQti3fUPK9GugZ57BSZxfn6NlJPpyG4FxSirOcZFCxGaNuyqqrh 0M+gFvbWfZcVDLE0FI5CUYZWRxsk//+pLM4ZkQZFA8Eo1tGVc3r0zEEAlL/uEMDW P0ATvRHNUeib9YQGTdSD7cNUpRX4SX+T8lDUiaVm+tHL5ES3b4rayMBdbVMrahfc Gnxu9GtO5gWXEY6QDDpOx+VkbfFmDFodx833psJ3MVD8evEFHHdinkgDaptLrrV6 92ZDKR5s3W6tKXkcmGuExrtc17UgjLcRZCebXbv+5FJlVrslzvQ9ESOTRyiZBrCD ZpFi2TZhpYU1uEWuoBZCbWNXW2pcSt7/bQ9bYUvfrvNfgPzPnblubJuVKRcfY0WB x2vj0PNnckpoPRvskV7GEe0Y/JISzAxBQUK6XO+GJgMgz5M+23SEaSVU5yeyf26e x/yD5yImQGN84AxfRMMvbR2JvpVLN3vdFWtWigneht160erVA/Qgw5Gcrpw53tZr zPD3fGaetrNgS8BvJAy/c6xHV1j6A1RGCGThy8Ivqep9WD90a5JhR1NRjZp6m8sf aB0A1zTP7e+hRFkZGahO =ud2P -----END PGP SIGNATURE----- Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "It was pretty busy in EDAC land this time: - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer) - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac (Tony Luck) - Do not register sb_edac with pci_register_driver() (Tony Luck) - Add support for Skylake to ie31200_edac (Jason Baron) - Do not register amd64_edac with pci_register_driver() (Borislav Petkov) ... plus the usual round of cleanups and fixes all over the place" * tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits) EDAC, amd64_edac: Drop pci_register_driver() use EDAC, ie31200_edac: Add Skylake support EDAC, sb_edac: Use cpu family/model in driver detection EDAC, i7core: Remove double buffering of error records EDAC, amd64_edac: Issue driver banner only on success ARM: socfpga: Initialize Arria10 OCRAM ECC on startup EDAC: Increment correct counter in edac_inc_ue_error() EDAC, sb_edac: Remove double buffering of error records EDAC: Fix used after kfree() error in edac_unregister_sysfs() EDAC, altera: Avoid unused function warnings EDAC, altera: Remove useless casts ARM: socfpga: Enable Arria10 OCRAM ECC on startup EDAC, altera: Add Arria10 OCRAM ECC support Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding EDAC, altera: Make OCRAM ECC dependency check generic EDAC, altera: Add register offset for ECC Enable EDAC, altera: Extract error inject operations to a struct fops ARM: socfpga: Enable Arria10 L2 cache ECC on startup EDAC, altera: Add Arria10 L2 Cache ECC handling Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding ...	2016-05-16 18:44:39 -07:00
Tony Luck	2c1ea4c700	EDAC, sb_edac: Use cpu family/model in driver detection Instead of picking a random PCI ID from the dozen or so we need to access, just use x86_match_cpu() to pick based on CPU model number. The choosing of PCI devices has been problematic in the past, see `11249e7399` ("sb_edac: Fix detection on SNB machines") which fixed problems introduced by `d0585cd815` ("sb_edac: Claim a different PCI device"). This is especially ugly if future hardware might not even have EDAC-relevant registers in PCI config space and we would still be required to choose some "random" PCI devices to scan for just so our driver loads. Is this cleaner/clearer? It deletes much more code than it adds. Only tested on Broadwell. The driver loads/unloads and loads again. Still decodes errors too. Signed-off-by: Tony Luck <tony.luck@intel.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-05-02 19:44:43 +02:00
Tony Luck	c4fc1956fa	EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-29 15:43:10 +02:00
Tony Luck	ad08c4e974	EDAC, sb_edac: Remove double buffering of error records In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in `f29a7aff4b` ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as sbridge_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: patrickg@supermicro.com Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 14:02:02 +02:00
Tony Luck	ea5dfb5fae	x86 EDAC, sb_edac.c: Take account of channel hashing when needed Haswell and Broadwell can be configured to hash the channel interleave function using bits [27:12] of the physical address. On those processor models we must check to see if hashing is enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and act accordingly. Based on a patch by patrickg <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Tony Luck	ff15e95c82	x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address In commit: `eb1af3b71f` ("Fix computation of channel address") I switched the "sck_way" variable from holding the log2 value read from the h/w to instead be the actual number. Unfortunately it is needed in log2 form when used to shift the address. Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `eb1af3b71f` ("Fix computation of channel address") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Linus Torvalds	d88bfe1d68	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call	2016-03-14 18:43:51 -07:00
Luck, Tony	eb1af3b71f	EDAC/sb_edac: Fix computation of channel address Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 18:31:55 +01:00
Hubert Chrzaniuk	83bdaad4d9	EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi Correct a typo introduced by `d0cdf90031` ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support") As a result under some configurations DIMMs were not correctly recognized. Problem affects only Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-07 19:07:40 +01:00

1 2 3

137 Commits