linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Borislav Petkov	bbb013b920	amd64_edac: Fix bogus sysfs file permissions Fix yet another issue caught by `8f46baaa7e` ("base: core: WARN() about bogus permissions on device attributes"). Signed-off-by: Borislav Petkov <bp@suse.de>	2013-05-21 09:13:11 +02:00
Linus Torvalds	7462543abb	Two small EDAC fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRi31iAAoJEBLB8Bhh3lVKiaQQAIZ4R6/9QSt7KbP5VLN1ksv2 qV2VOGPuxkhGoO/vfUJ0RKFj0NgnQFYizKl+vjZL/ahQy9yqelWInoc/TO7tHGol rxlyCoxmS5kuXwbs8CYhC5OQBvuFSTfk/8Ecu4lZa/PfPKs4IXaq+jzjqfk6/QvC jjfJ/4TIbyHLR48tbjoiJtjMQlsZOZ+R9o6Ic0Qw49GlqbIdZ15KzZVnuLRxJWby 4S2hQnBWkMfc/RYXfliUs6TsXd55qyd88La6PHeY/BJRxQoaBvsPWLEd6Pk6RNWd RfpoEHCdUfnFBQMvO5C50/Dp6iKXJ4xqh9qrPg0LVlTtGbPL8gdDfK5QEElhEiuw vM9dzfNXFvE4Pavx32WSm7ql2LD9Qf8ZdLPxMlvjNGW14+oQexHmDI6sdU5iVYxb toct5jF7MwEy6GQQcwmp84V8y5FU+MyEtT+w9OKLpay/9Bqcq0I/Mv6LJWH10IDC bpfkaUm10C1aF2/vP6BB48NGUUElZIxXg9VapzX+AuRs6kN7LLOmM38G371HPEbV wcsRCU1znxo1Yjehen6oI9I2AQ4NuMcHplK2FiD0I1AzlRQ/BM6TeHejy84SJMgf QEQkjwh8DAClzcKJFlt9uoIglCLLjY/WDVSLgvvhv+/kQIXrLV7zCAhGR2CE27ci XQsmruJYlPt0A0xkOh// =uyan -----END PGP SIGNATURE----- Merge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull two small EDAC fixes from Borislav Petkov. * tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't give write permission to read-only files EDAC, mc_sysfs.c: Fix string array pointer types	2013-05-09 10:11:08 -07:00
Srivatsa S. Bhat	c8c64d165c	EDAC: Don't give write permission to read-only files I get the following warning on boot: ------------[ cut here ]------------ WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0() Hardware name: -[8737R2A]- Write permission without 'store' ... </snip> Drilling down, this is related to dynamic channel ce_count attribute files sporting a S_IWUSR mode without a ->store() function. Looking around, it appears that they aren't supposed to have a ->store() function. So remove the bogus write permission to get rid of the warning. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: <stable@vger.kernel.org> # 3.[89] [ shorten commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-05-09 12:40:45 +02:00
Linus Torvalds	e2823299cd	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull edac fixes from Mauro Carvalho Chehab: "Two edac fixes: - i7300_edac currently reports a wrong number of DIMMs when the memory controller is in single channel mode - on some Sandy Bridge machines, the EDAC driver bails out as one of the PCI IDs used by the driver is hidden by BIOS. As the driver uses it only to detect the type of memory, make it optional at the driver" * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: sb_edac.c should not require prescence of IMC_DDRIO device i7300_edac: Fix memory detection in single mode	2013-04-30 10:00:49 -07:00
Luck, Tony	de4772c621	edac: sb_edac.c should not require prescence of IMC_DDRIO device The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab	33ad41263d	i7300_edac: Fix memory detection in single mode When the machine is on single mode, only branch 0 channel 0 is valid. However, the code is not honouring it: [ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode ... [ 1952.639351] EDAC DEBUG: i7300_init_csrows: AMB-present CH0 = 0x1: [ 1952.639353] EDAC DEBUG: i7300_init_csrows: AMB-present CH1 = 0x0: [ 1952.639355] EDAC DEBUG: i7300_init_csrows: AMB-present CH2 = 0x0: [ 1952.639358] EDAC DEBUG: i7300_init_csrows: AMB-present CH3 = 0x0: ... [ 1952.639360] EDAC DEBUG: decode_mtr: MTR0 CH0: DIMMs are Present (mtr) [ 1952.639362] EDAC DEBUG: decode_mtr: WIDTH: x8 [ 1952.639363] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled [ 1952.639364] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s) [ 1952.639366] EDAC DEBUG: decode_mtr: NUMRANK: single [ 1952.639367] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows [ 1952.639368] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns [ 1952.639370] EDAC DEBUG: decode_mtr: SIZE: 512 MB [ 1952.639371] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code [ 1952.639373] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode [ 1952.639374] EDAC DEBUG: decode_mtr: MTR0 CH1: DIMMs are Present (mtr) [ 1952.639376] EDAC DEBUG: decode_mtr: WIDTH: x8 [ 1952.639377] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled [ 1952.639379] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s) [ 1952.639380] EDAC DEBUG: decode_mtr: NUMRANK: single [ 1952.639381] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows [ 1952.639383] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns [ 1952.639384] EDAC DEBUG: decode_mtr: SIZE: 512 MB [ 1952.639385] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code [ 1952.639387] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode ... [ 1952.639449] EDAC DEBUG: print_dimm_size: channel 0 \| channel 1 \| channel 2 \| channel 3 \| [ 1952.639451] EDAC DEBUG: print_dimm_size: ------------------------------------------------------------- [ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0 512 MB \| 512 MB \| 0 MB \| 0 MB \| [ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639470] EDAC DEBUG: print_dimm_size: ------------------------------------------------------------- Instead of detecting a single memory at channel 0, it is showing twice the memory. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:39 -03:00
Aravind Gopalakrishnan	94c1acf2c8	amd64_edac: Add Family 16h support Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-04-19 12:46:50 +02:00
Borislav Petkov	8b7719e08a	EDAC, mc_sysfs.c: Fix string array pointer types Those should be const ptr to a const string, fix them. Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-25 15:44:25 +01:00
Mauro Carvalho Chehab	9713faecff	EDAC: Merge mci.mem_is_per_rank with mci.csbased Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab	1eef128254	amd64_edac: Correct DIMM sizes We were filling the csrow size with a wrong value. `16a528ee39` ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-16 06:32:02 +01:00
Stephen Hemminger	fbe2d3616c	EDAC: Make sysfs functions static Fixes lots of sparse warnings here. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-05 11:33:57 +01:00
Linus Torvalds	ad6c2c2eb3	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab: "For: - Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac); - error injection support for i5100, when EDAC debug is enabled; - fix edac when it is loaded builtin (early init for the subsystem); - a "Firmware First" EDAC driver, allowing ghes to report errors via EDAC (ghes-edac). With regards to ghes-edac, this fixes a longstanding BZ at Red Hat that happens with Nehalem and Sandy Bridge CPUs: when both GHES and i7core_edac or sb_edac are running, the error reports are unpredictable, as both BIOS and OS race to access the registers. With ghes-edac, the EDAC core will refuse to register any other concurrent memory error driver. This patchset moves the ghes struct definitions to a separate header file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to register/unregister and to report errors via ghes-edac. Those changes were acked by ghes driver maintainer (Huang)." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits) i5100_edac: convert to use simple_open() ghes_edac: fix to use list_for_each_entry_safe() when delete list items ghes_edac: Fix RAS tracing ghes_edac: Make it compliant with UEFI spec 2.3.1 ghes_edac: Improve driver's printk messages ghes_edac: Don't credit the same memory dimm twice ghes_edac: do a better job of filling EDAC DIMM info ghes_edac: add support for reporting errors via EDAC ghes_edac: Register at EDAC core the BIOS report ghes: add the needed hooks for EDAC error report ghes: move structures/enum to a header file edac: add support for error type "Info" edac: add support for raw error reports edac: reduce stack pressure by using a pre-allocated buffer edac: lock module owner to avoid error report conflicts edac: remove proc_name from mci structure edac: add a new memory layer type edac: initialize the core earlier edac: better report error conditions in debug mode i5100_edac: Remove two checkpatch warnings ...	2013-02-28 20:42:33 -08:00
Wei Yongjun	b0769891ba	i5100_edac: convert to use simple_open() This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-26 10:06:18 -03:00
Wei Yongjun	5dae92a718	ghes_edac: fix to use list_for_each_entry_safe() when delete list items Since we will remove items off the list using list_del() we need to use a safe version of the list_for_each_entry() macro aptly named list_for_each_entry_safe(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-26 10:05:35 -03:00
Mauro Carvalho Chehab	8ae8f50ad8	ghes_edac: Fix RAS tracing With the current version of CPER, there's no way to associate an error with the memory error. So, the error location in EDAC layers is unused. As CPER has its own idea about memory architectural layers, just output whatever is there inside the driver's detail at the RAS tracepoint. The EDAC location keeps untouched, in the case that, in some future, we could actually map the error into the dimm labels. Now, the error message: [ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 72.396627] {1}[Hardware Error]: APEI generic hardware error status [ 72.396628] {1}[Hardware Error]: severity: 2, corrected [ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected [ 72.396632] {1}[Hardware Error]: flags: 0x01 [ 72.396634] {1}[Hardware Error]: primary [ 72.396635] {1}[Hardware Error]: section_type: memory error [ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400 [ 72.396638] {1}[Hardware Error]: node: 3 [ 72.396639] {1}[Hardware Error]: card: 0 [ 72.396640] {1}[Hardware Error]: module: 0 [ 72.396641] {1}[Hardware Error]: device: 0 [ 72.396643] {1}[Hardware Error]: error_type: 18, unknown [ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Is properly represented on the trace event: kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location👎-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 sockets E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:17 -03:00
Mauro Carvalho Chehab	689c9cd812	ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:16 -03:00
Mauro Carvalho Chehab	d2a6856614	ghes_edac: Improve driver's printk messages Provide a better infrastructure for printk's inside the driver: - use edac_dbg() for debug messages; - standardize the usage of pr_info(); - provide warning about the risk of relying on this driver. While here, changes the size of a fake memory to 1 page. This is as good or as bad as 1000 pages, but it is easier for userspace to detect, as I don't expect that any machine implementing GHES would provide just 1 page available ;) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Conflicts: drivers/edac/ghes_edac.c	2013-02-25 19:42:15 -03:00
Mauro Carvalho Chehab	5ee726db52	ghes_edac: Don't credit the same memory dimm twice On my tests on a 4xE5-4650 CPU's system, the GHES EDAC driver is called twice. As the SMBIOS DMI enumeration call will seek for the entire DIMM sockets in the system, on this machine, equipped with 128 GB of RAM, the memory is displayed twice: +-----------------------+ \| mc0 \| mc1 \| ----------+-----------------------+ memory45: \| 8192 MB \| 8192 MB \| memory44: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory43: \| 0 MB \| 0 MB \| memory42: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory41: \| 0 MB \| 0 MB \| memory40: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory39: \| 8192 MB \| 8192 MB \| memory38: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory37: \| 0 MB \| 0 MB \| memory36: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory35: \| 0 MB \| 0 MB \| memory34: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory33: \| 8192 MB \| 8192 MB \| memory32: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory31: \| 0 MB \| 0 MB \| memory30: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory29: \| 0 MB \| 0 MB \| memory28: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory27: \| 8192 MB \| 8192 MB \| memory26: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory25: \| 0 MB \| 0 MB \| memory24: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory23: \| 0 MB \| 0 MB \| memory22: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory21: \| 8192 MB \| 8192 MB \| memory20: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory19: \| 0 MB \| 0 MB \| memory18: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory17: \| 0 MB \| 0 MB \| memory16: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory15: \| 8192 MB \| 8192 MB \| memory14: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory13: \| 0 MB \| 0 MB \| memory12: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory11: \| 0 MB \| 0 MB \| memory10: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory9: \| 8192 MB \| 8192 MB \| memory8: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory7: \| 0 MB \| 0 MB \| memory6: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory5: \| 0 MB \| 0 MB \| memory4: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory3: \| 8192 MB \| 8192 MB \| memory2: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory1: \| 0 MB \| 0 MB \| memory0: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ Total sum of 256 GB. As there's no reliable way to credit DIMMS to the right memory controller, just put everything on memory controller 0 (with should always exist). Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:14 -03:00
Mauro Carvalho Chehab	32fa1f53c2	ghes_edac: do a better job of filling EDAC DIMM info Instead of just faking a random value for the DIMM data, get the information that it is available via DMI table. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab	f04c62a703	ghes_edac: add support for reporting errors via EDAC Now that the EDAC core is capable of just forward the errors via the userspace API, add a report mechanism for the GHES errors. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab	77c5f5d2f2	ghes_edac: Register at EDAC core the BIOS report Register GHES at EDAC MC core, in order to avoid other drivers to also handle errors and mangle with error data. The edac core will warrant that just one driver will be used, so the first one to register (BIOS first) will be the one that will be reporting the hardware errors. For now, the EDAC driver does nothing but to register at the EDAC core, preventing the hardware-driven mechanism to interfere with GHES. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:12 -03:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Mauro Carvalho Chehab	e7e248304c	edac: add support for raw error reports That allows APEI GHES driver to report errors directly, using the EDAC error report API. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 14:16:03 -03:00
Mauro Carvalho Chehab	c7ef764554	edac: reduce stack pressure by using a pre-allocated buffer The number of variables at the stack is too big. Reduces the stack usage by using a pre-allocated error buffer. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 13:48:45 -03:00
Mauro Carvalho Chehab	80cc7d87d5	edac: lock module owner to avoid error report conflicts APEI GHES and i7core_edac/sb_edac currently can be loaded at the same time, but those are Highlander modules: "There can be only one". There are two reasons for that: 1) Each driver assumes that it is the only one registering at the EDAC core, as it is driver's responsibility to number the memory controllers, and all of them start from 0; 2) If BIOS is handling the memory errors, the OS can't also be doing it, as one will mangle with the other. So, we need to add an module owner's lock at the EDAC core, in order to avoid having two different modules handling memory errors at the same time. The best way for doing this lock seems to use the driver's name, as this is unique, and won't require changes on every driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:38 -03:00
Mauro Carvalho Chehab	c66b5a79a9	edac: add a new memory layer type There are some cases where the memory controller layout is completely hidden. This is the case of firmware-driven error code, like the one provided by GHES. Add a new layer to be used on such memory error report mechanisms. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:37 -03:00
Mauro Carvalho Chehab	4ab19b06ac	edac: initialize the core earlier In order for it to work with it builtin, the EDAC core should be initialized earlier, otherwise the ghes_edac driver initializes before edac_mc_sysfs_init() being called: ... [ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes ... [ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes [ 6.519495] EDAC MC: Ver: 3.0.0 [ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created The net result is that no EDAC sysfs nodes will appear. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:36 -03:00
Mauro Carvalho Chehab	3d958823e2	edac: better report error conditions in debug mode It is hard to find what's wrong without a proper error report. Improve it, in debug mode. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab	59b9796d1e	i5100_edac: Remove two checkpatch warnings The last changeset introduced a few checkpatch warnings: WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required 261: FILE: drivers/edac/i5100_edac.c:1207: + if (priv->debugfs) + debugfs_remove_recursive(priv->debugfs); WARNING: debugfs_remove(NULL) is safe this check is probably not required 290: FILE: drivers/edac/i5100_edac.c:1250: + if (i5100_debugfs) + debugfs_remove(i5100_debugfs); Get rid of them. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:34 -03:00
Niklas Söderlund	9cbc6d38f2	i5100_edac: connect fault injection to debugfs node Create a debugfs direcotry i5100_edac/mcX for each memory controller and add nodes to control how fault injection is preformed. After configuring an injection using inject_channel, inject_deviceptr1, inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel trigger the injection by writing anything to inject_enable. Example of a CE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Example of UE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2 echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1 echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Sometimes it is needed to enable the injection more then once (echo to the inject_enable node) for the injection to happen, I am not sure why. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:34 -03:00
Niklas Söderlund	53ceafd6a2	i5100_edac: add fault injection code Add fault injection based on information datasheet for i5100, see 1. In addition to the i5100 datasheet some missing information on injection functions where found through experimentation and the i7300 datasheet, see 2. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf [2] Intel 7300 Chipset MemoryController Hub (MCH) Doc.Nr: 318082 http://www.intel.com/assets/pdf/datasheet/318082.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:33 -03:00
Niklas Söderlund	52608ba205	i5100_edac: probe for device 19 function 0 Probe and store the device handle for the device 19 function 0 during driver initialization. The device is used during fault injection. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:32 -03:00
Mauro Carvalho Chehab	e7100478fa	edac: only create sdram_scrub_rate where supported Currently, sdram_scrub_rate sysfs node is created even if the device doesn't support get/set the scub rate. Change the logic to only create this device node when the operation is supported. Reported-by: Felipe Balbi <balbi@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab	61734e1835	i3200_edac: Fix the logic that detects filled memories After running a series of tests on an HP DL320, filled with different memory sizes, it was noticed that, when filled with just one DIMM on such hardware, the driver wrongly detects twice the memory, and thinks that both channels 0 and 1 are filled. It seems to be partially caused by the BIOS and partially by the driver. The i3200_edac current logic would be working fine if the BIOS were disabling the unused second channel when just one DIMM is connected, in order to do power-saving, as recommended on this chipset's datasheet. However, the BIOS on this particular machine doesn't do it: [ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode [ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled So, the driver were assuming that 2 channels are enabled (well, they are, but the second is unused). Combined with that, I found two issues at the logic that creates the EDAC data, that were failing when the two channels are not equally filled (AFAICT, that happens only when just 1 DIMM is plugged). The first one is that a 0 at DRB means that nothing is filled. The driver's logic, however, do some calculation with that. The second one is that the logic that fills the DIMM data currently assumes that both channels are equally filled. I tested the system already with the current configuration and my patch and it is now working fine. So, for a 2R single DIMM 2Gb memory at dimm slot 01 (channel 0), it is now displaying: [ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0 [ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0 [ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0 [ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0 ... [ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb [ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb and the corresponding sysfs nodes are now properly filled. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab	5f466cb025	i3200_edac: Add more debug to the driver Currently, it is not possible to know, when debug is enabled, if the driver is using 2 DIMMS per channel mode or not. It is not possible to know the values of the drbs registers, used to identify the memory rank sizes. Add debug for both, as it helps to track issues on the driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:22 -03:00
Linus Torvalds	55529fa576	EDAC updates for 3.9 Only one: AMD F16h MCE decoding enablement from Jacob Shin. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRI2t7AAoJEBLB8Bhh3lVKmLAP/RbYNVVvLZZxXqB6heCkX7D3 Avhi4GtuHqjY+79nqBoJEH9HjgzBzL6ffsJGPF3gtXrWybi6nzZkqog5QGRQFfGS gIoldbfP9MMF/RrMHbUF5x9Ha9EcdudDgYE3eqCq6KUfOMlUrBRECaU5YRrB+FKR qsuQNmF6J3uMPYwO9oGhTpA/vPwwapFaKgm+KND8q5h5JcpJrRpooKNQc1DheW+x nfYFpmSBW8eamVwV5DTjqKVhKeG3gB/3i6uSTpLvh4i0ZmV2vQ+drwC3aU0Eh0Q4 wnslEpQENjODsvbZH6b0j2WTDgBGKEpALRkMUDTnnqvYrR96cV02MWUfp6cbKYCv +d/iPt3hEBCyDTDzfP66N+1b+Wqls4Dx8lhhqLdPhsIpY2Qu8OQ8/mjyIIs12qK5 g9w3lsfy3shWmdsK/Ehd0IhNt1bzNV+63CINA2isC2QAp0Eqecj3jKrXor+MTPu1 uRY3wM+8CrdeFNNhhhsMQYIb4+DFGLgMwY5mV6z8ceFJAjxvyUxjObnSMopjBKog 9+JcyEHCADbpHsRup/zRIoGtSs+44sQ6l8l7pq6d4K4UfkG7Biq9lcxThID7rImn Rl6MOJVmJuM5n9JVIU4/bmhjfuTcI+gdshexDEH2HnqaXxd7ceIbnrFWFaBZEO5t t3WasBuIb30g1XCQmNT6 =LbcU -----END PGP SIGNATURE----- Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "Mostly AMD's side of EDAC. It is basically a new family enablement stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is trivial cleanups." * tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: mpc85xx_edac: Fix typo EDAC, MCE, AMD: Remove unneeded exports EDAC, MCE, AMD: Add MCE decoding support for Family 16h EDAC, MCE, AMD: Make MC2 decoding per-family amd64_edac: Remove dead code	2013-02-20 11:28:50 -08:00
Mauro Carvalho Chehab	1339730e73	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into next Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...	2013-02-20 15:45:52 -03:00
Linus Torvalds	f98982ce80	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: - Support for the Technologic Systems TS-5500 platform, by Vivien Didelot - Improved NUMA support on AMD systems: Add support for federated systems where multiple memory controllers can exist and see each other over multiple PCI domains. This basically means that AMD node ids can be more than 8 now and the code handling this is taught to incorporate PCI domain into those IDs. - Support for the Goldfish virtual Android emulator, by Jun Nakajima, Intel, Google, et al. - Misc fixlets. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add TS-5500 platform support x86/srat: Simplify memory affinity init error handling x86/apb/timer: Remove unnecessary "if" goldfish: platform device for x86 amd64_edac: Fix type usage in NB IDs and memory ranges amd64_edac: Fix PCI function lookup x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id x86, AMD, NB: Add multi-domain support	2013-02-19 20:11:07 -08:00
Baruch Siach	e7d2c215e5	mpc85xx_edac: Fix typo Correct typos. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Cc: Dave Jiang <djiang@mvista.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-02-10 13:28:41 +01:00
Joe Perches	d3d09e1820	EDAC: Fix kcalloc argument order First number, then size. Signed-off-by: Joe Perches <joe@perches.com> Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-01-30 11:38:25 +01:00
Dan Carpenter	8024c4c0b1	EDAC: Test correct variable in ->store function We're testing for ->show but calling ->store(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>	2013-01-30 11:38:11 +01:00
Borislav Petkov	0f08669e86	EDAC, MCE, AMD: Remove unneeded exports Initially, those strings describing different parts of an MCE message were shared with amd64_edac and were therefore exported to modules. However, all except pp_msgs are used only in one place right now so hide them and make them static. No functionality change. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:40:03 +01:00
Jacob Shin	980eec8b20	EDAC, MCE, AMD: Add MCE decoding support for Family 16h Add MCE decoding logic for AMD Family 16h processors. Boris: - drop unneeded uu_msgs export - exit early in cat_mc1_mce and save us an indentation level Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:58 +01:00
Jacob Shin	4a73d3de63	EDAC, MCE, AMD: Make MC2 decoding per-family Currently only AMD Family 15h processors have special handling for MC2 errors. Since upcoming Family 16h will also need unique handling, let's make MC2 handling part of amd_decoder_ops. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:54 +01:00
Borislav Petkov	acc7fcb400	amd64_edac: Remove dead code `5e2af0c09e` ("edac: Don't initialize csrow's first_page & friends when not needed") removed useless initialization of variables but left in the functions which did that. They're unused now so drop them. Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:41 +01:00
Kees Cook	053417a53e	drivers/edac: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 12:11:26 -08:00
Daniel J Blueman	c7e5301a1b	amd64_edac: Fix type usage in NB IDs and memory ranges Use appropriate types for northbridge IDs and memory ranges. Mark immutable data const and keep within compilation unit on related structures. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com [Boris: Drop arg change to node_to_amd_nb] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:18:00 +01:00
Daniel J Blueman	e2c0bffea2	amd64_edac: Fix PCI function lookup Fix locating sibling memory controller PCI functions by using the correct PCI domain and use a northbridge descriptor only if found. We need to at least warn if it wasn't found so that it gets fixed and we don't go off with wrong results. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com [Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:59 +01:00
Daniel J Blueman	8b84c8df38	x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id Change amd_get_nb_id to return u16 to support >255 memory controllers, and related consistency fixes. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00
Daniel J Blueman	772c3ff385	x86, AMD, NB: Add multi-domain support Fix get_node_id to match northbridge IDs from the array of detected ones, allowing multi-server support such as with Numascale's NumaConnect, renaming to 'amd_get_node_id' for consistency. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com [Boris: shorten lines to fit 80 cols] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00

1 2 3 4 5 ...

900 Commits