919 Commits

Author SHA1 Message Date
Linus Torvalds 4de9ad9bc0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull Tile arch updates from Chris Metcalf:
 "These changes bring in a bunch of new functionality that has been
  maintained internally at Tilera over the last year, plus other stray
  bits of work that I've taken into the tile tree from other folks.

  The changes include some PCI root complex work, interrupt-driven
  console support, support for performing fast-path unaligned data
  fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
  vDSO support for gettimeofday(), a serial driver for the tilegx
  on-chip UART, KGDB support, more optimized string routines, support
  for ftrace and kprobes, improved ASLR, and many bug fixes.

  We also remove support for the old TILE64 chip, which is no longer
  buildable"

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
  tile: refresh tile defconfig files
  tile: rework <asm/cmpxchg.h>
  tile PCI RC: make default consistent DMA mask 32-bit
  tile: add null check for kzalloc in tile/kernel/setup.c
  tile: make __write_once a synonym for __read_mostly
  tile: remove support for TILE64
  tile: use asm-generic/bitops/builtin-*.h
  tile: eliminate no-op "noatomichash" boot argument
  tile: use standard tile_bundle_bits type in traps.c
  tile: simplify code referencing hypervisor API addresses
  tile: change <asm/system.h> to <asm/switch_to.h> in comments
  tile: mark pcibios_init() as __init
  tile: check for correct compiler earlier in asm-offsets.c
  tile: use standard 'generic-y' model for <asm/hw_irq.h>
  tile: use asm-generic version of <asm/local64.h>
  tile PCI RC: add comment about "PCI hole" problem
  tile: remove DEBUG_EXTRA_FLAGS kernel config option
  tile: add virt_to_kpte() API and clean up and document behavior
  tile: support FRAME_POINTER
  tile: support reporting Tilera hypervisor statistics
  ...
2013-09-06 11:14:33 -07:00
Aravind Gopalakrishnan 4fc06b3171 amd64_edac: Fix incorrect wraparounds
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.
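
The wraparound described above can be reproduced outside the kernel. The
following is a minimal sketch, not the driver's code: the function names
and the shift amounts are hypothetical, and uint32_t/uint64_t stand in
for the kernel's u32/u64 types.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the shift is carried out in 32-bit arithmetic,
 * so any bits pushed past bit 31 are lost before the value is widened.
 * The shift count must be less than 32.
 */
uint64_t shift_narrow(uint32_t reg, unsigned int shift)
{
	return (uint32_t)(reg << shift);
}

/*
 * Hypothetical helper: the operand is widened to 64 bits first, so the
 * shifted-out bits are preserved in the upper half of the result.
 */
uint64_t shift_wide(uint32_t reg, unsigned int shift)
{
	return (uint64_t)reg << shift;
}
```

With reg = 0x00100000 and shift = 12, the narrow variant yields 0 while
the wide variant yields 0x100000000, which is the kind of difference the
cast to u64 is meant to address.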

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 15:00:22 +02:00
Borislav Petkov 3f0aba4fc0 amd64_edac: Correct erratum 505 range
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.

Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-27 14:25:08 +02:00
Ingo Molnar c874b6ba55 An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han.

Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras

Pull RAS/EDAC updates from Boris Petkov:

 "An amd64_edac fix for single channel configurations + trivial cleanups
  courtesy of Jingoo Han."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:07:20 +02:00
Jingoo Han 75a9551f2b cpc925_edac: Use proper array termination
The struct should be terminated by using empty braces in order to
fix the following sparse warning.

drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer
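
As an illustration of this termination style, here is a small sketch.
The struct and table names are hypothetical and are not taken from
cpc925_edac.c. It assumes a compiler that accepts an empty initializer
(a GNU extension that was later standardized in C23).

```c
#include <stddef.h>

/* Hypothetical table entry; the first member is a pointer. */
struct dev_info {
	const char *name;
	int id;
};

/* The final empty entry is the sentinel: all of its members are zero/NULL. */
static const struct dev_info dev_table[] = {
	{ "alpha", 1 },
	{ "beta", 2 },
	{ }	/* rather than { 0 }, which puts a plain integer into a pointer */
};

/* Counts entries up to, but not including, the sentinel. */
size_t count_entries(const struct dev_info *tbl)
{
	size_t n = 0;

	while (tbl[n].name != NULL)
		n++;
	return n;
}
```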

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-14 12:46:46 +02:00
Borislav Petkov a4b4bedce8 amd64_edac: Get rid of boot_cpu_data accesses
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:01:56 +02:00
Aravind Gopalakrishnan 18b94f66f9 amd64_edac: Add ECC decoding support for newer F15h models
On newer models, support has been included for up to 4 DCTs; however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.

Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.

While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:

      * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
      * Erratum 637: not fixed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 16:00:10 +02:00
Jingoo Han e0d391ab04 x38_edac: Make a local function static
Make a local function static in order to fix the following sparse
warning:

drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-09 15:29:10 +02:00
Jingoo Han 166e9334e9 i3200_edac: Make a local function static
This local symbol is used only in this file.
Fix the following sparse warnings:

drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-09 15:23:02 +02:00
Borislav Petkov f0a56c4801 amd64_edac: Fix single-channel setups
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present, which it is not.

So always allocate two channels unconditionally.

This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.

Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-29 17:22:41 +02:00
Jingoo Han c542b53da9 EDAC: Replace strict_strtol() with kstrtol()
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
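
For readers unfamiliar with the calling convention, kstrtol() returns 0
on success and a negative errno value (-EINVAL or -ERANGE) on failure,
storing the parsed value through a pointer. The sketch below is a
hypothetical userspace approximation built on strtol(); the name
parse_long_strict is an assumption, and unlike the kernel helper it
inherits strtol()'s skipping of leading whitespace.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in that mirrors the calling convention
 * of kstrtol(): 0 on success, a negative errno value on failure.
 */
int parse_long_strict(const char *s, int base, long *res)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit in a long */
	if (end == s)
		return -EINVAL;		/* no digits were consumed */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage after the number */
	*res = val;
	return 0;
}
```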

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-24 09:30:03 +02:00
Borislav Petkov 88d84ac973 EDAC: Fix lockdep splat
Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the latter is handed to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-23 16:01:28 -07:00
Sachin Kamat 8e42e211e4 edac: Remove redundant platform_set_drvdata()
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2013-07-17 12:49:55 -04:00
Linus Torvalds d144746478 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "MIPS updates:

   - All the things that didn't make 3.10.
   - Removes the Windriver PPMC platform.  Nobody will miss it.
   - Remove a workaround from kernel/irq/irqdomain.c which was there
     exclusively for MIPS.  Patch by Grant Likely.
   - More small improvements for the SEAD 3 platform
   - Improvements on the BMIPS / SMP support for the BCM63xx series.
   - Various cleanups of dead leftovers.
   - Platform support for the Cavium Octeon-based EdgeRouter Lite.

  Two large KVM patchsets didn't make it for this pull request because
  their respective authors are vacationing"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
  MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
  MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
  MIPS: SEAD3: Disable L2 cache on SEAD-3.
  MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
  MIPS: BCM63xx: Add SMP support to prom.c
  MIPS: define write{b,w,l,q}_relaxed
  MIPS: Expose missing pci_io{map,unmap} declarations
  MIPS: Malta: Update GCMP detection.
  Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
  MIPS: APSP: Remove <asm/kspd.h>
  SSB: Kconfig: Amend SSB_EMBEDDED dependencies
  MIPS: microMIPS: Fix improper definition of ISA exception bit.
  MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
  MIPS: Declare emulate_load_store_microMIPS as a static function.
  MIPS: Fix typos and cleanup comment
  MIPS: Cleanup indentation and whitespace
  MIPS: BMIPS: support booting from physical CPU other than 0
  MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
  MIPS: GIC: Fix gic_set_affinity infinite loop
  MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
  ...
2013-07-13 14:52:21 -07:00
Linus Torvalds f3acb96f38 Add MCE signatures for family 0x15, models 30-3f.

Merge tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull AMD EDAC update from Borislav Petkov:
 "Add MCE signatures for family 0x15, models 30-3f"

* tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE, AMD: Add an MCE signature for new Fam15h models
  EDAC: Replace strict_strtoul() with kstrtoul()
2013-07-03 13:11:18 -07:00
David Daney 9ddebc46e7 MIPS: OCTEON: Rename Kconfig CAVIUM_OCTEON_REFERENCE_BOARD to CAVIUM_OCTEON_SOC
Use CAVIUM_OCTEON_SOC in most places where we used to use
CPU_CAVIUM_OCTEON.  This allows us to use CPU_CAVIUM_OCTEON in places
where we have no OCTEON SOC.

Remove CAVIUM_OCTEON_SIMULATOR as it doesn't really do anything, we can
get the same configuration with CAVIUM_OCTEON_SOC.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-ide@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-i2c@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: spi-devel-general@lists.sourceforge.net
Cc: devel@driverdev.osuosl.org
Cc: linux-usb@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Patchwork: https://patchwork.linux-mips.org/patch/5295/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-06-10 18:01:25 +02:00
Aravind Gopalakrishnan aad19e5176 EDAC, MCE, AMD: Add an MCE signature for new Fam15h models
Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
 [ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:17:03 +02:00
Jingoo Han c7f62fc87b EDAC: Replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-08 10:16:33 +02:00
Stephen Rothwell 40b313608a Finally eradicate CONFIG_HOTPLUG
Ever since commit 45f035ab9b ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 14:20:18 -07:00
Borislav Petkov bbb013b920 amd64_edac: Fix bogus sysfs file permissions
Fix yet another issue caught by 8f46baaa7e ("base: core: WARN() about
bogus permissions on device attributes").

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-21 09:13:11 +02:00
Linus Torvalds 7462543abb Two small EDAC fixes.

Merge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull two small EDAC fixes from Borislav Petkov.

* tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC: Don't give write permission to read-only files
  EDAC, mc_sysfs.c: Fix string array pointer types
2013-05-09 10:11:08 -07:00
Srivatsa S. Bhat c8c64d165c EDAC: Don't give write permission to read-only files
I get the following warning on boot:

------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name:  -[8737R2A]-
Write permission without 'store'
...
</snip>

Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
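
The condition behind the warning can be sketched in a few lines. This
is a simplified userspace illustration, not the driver core's code: the
struct attr_sketch type and the function name are hypothetical, and the
POSIX S_IW* mode bits stand in for the kernel's permission macros.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, simplified attribute: a mode plus optional callbacks. */
struct attr_sketch {
	mode_t mode;
	int (*show)(char *buf);
	int (*store)(const char *buf, size_t len);
};

/* True when the mode grants write access but no store callback exists. */
bool has_bogus_write_perm(const struct attr_sketch *a)
{
	return (a->mode & (S_IWUSR | S_IWGRP | S_IWOTH)) && a->store == NULL;
}
```

Dropping the write bit from a read-only attribute's mode, as this commit
does, makes such a check pass.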

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-09 12:40:45 +02:00
Linus Torvalds e2823299cd Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac fixes from Mauro Carvalho Chehab:
 "Two edac fixes:

   - i7300_edac currently reports a wrong number of DIMMs when the
     memory controller is in single channel mode

   - on some Sandy Bridge machines, the EDAC driver bails out as one of
     the PCI IDs used by the driver is hidden by BIOS.  As the driver
     uses it only to detect the type of memory, make it optional at the
     driver"

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  edac: sb_edac.c should not require prescence of IMC_DDRIO device
  i7300_edac: Fix memory detection in single mode
2013-04-30 10:00:49 -07:00
Luck, Tony de4772c621 edac: sb_edac.c should not require prescence of IMC_DDRIO device
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers.  While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.

Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab 33ad41263d i7300_edac: Fix memory detection in single mode
When the machine is on single mode, only branch 0 channel 0
is valid. However, the code is not honouring it:

[ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
...
[ 1952.639351] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH0 = 0x1:
[ 1952.639353] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH1 = 0x0:
[ 1952.639355] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH2 = 0x0:
[ 1952.639358] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH3 = 0x0:
...
[ 1952.639360] EDAC DEBUG: decode_mtr: 	MTR0 CH0: DIMMs are Present (mtr)
[ 1952.639362] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639363] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639364] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639366] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639367] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639368] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639370] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639371] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639373] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
[ 1952.639374] EDAC DEBUG: decode_mtr: 	MTR0 CH1: DIMMs are Present (mtr)
[ 1952.639376] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639377] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639379] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639380] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639381] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639383] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639384] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639385] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639387] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
...
[ 1952.639449] EDAC DEBUG: print_dimm_size:               channel 0 | channel 1 | channel 2 | channel 3 |
[ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
[ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0   512 MB   |  512 MB   |    0 MB   |    0 MB   |
[ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------

Instead of detecting a single memory at channel 0, it is showing
twice the memory.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-04-29 10:32:39 -03:00
Aravind Gopalakrishnan 94c1acf2c8 amd64_edac: Add Family 16h support
Add code to handle DRAM ECC errors decoding for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-19 12:46:50 +02:00
Borislav Petkov 8b7719e08a EDAC, mc_sysfs.c: Fix string array pointer types
Those should be const ptr to a const string, fix them.
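
A short sketch of the declaration style referred to here. The array
name, its contents, and the lookup helper are hypothetical and are not
the tables in mc_sysfs.c.

```c
#include <stddef.h>

/*
 * Hypothetical name table. The first const protects the characters of
 * each string, the second const protects the pointers in the array.
 */
static const char * const mem_type_names[] = {
	"Unknown",
	"DDR2",
	"DDR3",
};

/* Returns the name for idx, or NULL when idx is out of range. */
const char *mem_type_name(size_t idx)
{
	size_t n = sizeof(mem_type_names) / sizeof(mem_type_names[0]);

	return idx < n ? mem_type_names[idx] : NULL;
}
```

With both qualifiers in place, the compiler rejects assignments to
either mem_type_names[i] or mem_type_names[i][j].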

Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-25 15:44:25 +01:00
Mauro Carvalho Chehab 9713faecff EDAC: Merge mci.mem_is_per_rank with mci.csbased
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;
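
For context, the check quoted above could sit in a loop over the layers
roughly as follows. This is only a sketch under assumptions: the enum,
struct, and function names are hypothetical simplifications and do not
reproduce the EDAC core's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layer types. */
enum layer_type_sketch {
	LAYER_BRANCH,
	LAYER_CHANNEL,
	LAYER_SLOT,
	LAYER_CHIP_SELECT,
};

/* Hypothetical, simplified layer description. */
struct layer_sketch {
	enum layer_type_sketch type;
	unsigned int size;
};

/* True if any layer in the hierarchy is of the chip-select type. */
bool is_csrow_based(const struct layer_sketch *layers, size_t n_layers)
{
	size_t i;

	for (i = 0; i < n_layers; i++)
		if (layers[i].type == LAYER_CHIP_SELECT)
			return true;
	return false;
}
```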

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab 1eef128254 amd64_edac: Correct DIMM sizes
We were filling the csrow size with a wrong value. 16a528ee39 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16 06:32:02 +01:00
Stephen Hemminger fbe2d3616c EDAC: Make sysfs functions static
Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-05 11:33:57 +01:00
Linus Torvalds ad6c2c2eb3 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
 "For:

   - Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
   - error injection support for i5100, when EDAC debug is enabled;
   - fix edac when it is loaded builtin (early init for the subsystem);
   - a "Firmware First" EDAC driver, allowing ghes to report errors via
     EDAC (ghes-edac).

  With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
  that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
  i7core_edac or sb_edac are running, the error reports are
  unpredictable, as both BIOS and OS race to access the registers.  With
  ghes-edac, the EDAC core will refuse to register any other concurrent
  memory error driver.

  This patchset moves the ghes struct definitions to a separate header
  file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
  register/unregister and to report errors via ghes-edac.  Those changes
  were acked by ghes driver maintainer (Huang)."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
  i5100_edac: convert to use simple_open()
  ghes_edac: fix to use list_for_each_entry_safe() when delete list items
  ghes_edac: Fix RAS tracing
  ghes_edac: Make it compliant with UEFI spec 2.3.1
  ghes_edac: Improve driver's printk messages
  ghes_edac: Don't credit the same memory dimm twice
  ghes_edac: do a better job of filling EDAC DIMM info
  ghes_edac: add support for reporting errors via EDAC
  ghes_edac: Register at EDAC core the BIOS report
  ghes: add the needed hooks for EDAC error report
  ghes: move structures/enum to a header file
  edac: add support for error type "Info"
  edac: add support for raw error reports
  edac: reduce stack pressure by using a pre-allocated buffer
  edac: lock module owner to avoid error report conflicts
  edac: remove proc_name from mci structure
  edac: add a new memory layer type
  edac: initialize the core earlier
  edac: better report error conditions in debug mode
  i5100_edac: Remove two checkpatch warnings
  ...
2013-02-28 20:42:33 -08:00
Wei Yongjun b0769891ba i5100_edac: convert to use simple_open()
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26 10:06:18 -03:00
Wei Yongjun 5dae92a718 ghes_edac: fix to use list_for_each_entry_safe() when delete list items
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
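
The idea behind the safe iterator is to remember the successor before
the current entry is removed. Below is a minimal userspace sketch of
that pattern on a plain singly linked list; the struct node type and
both functions are hypothetical and do not use the kernel list API.

```c
#include <stdlib.h>

/* Hypothetical singly linked node, standing in for a kernel list entry. */
struct node {
	int val;
	struct node *next;
};

/* Pushes a new node at the head; returns the new head (old head on failure). */
struct node *push(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/* Frees every node and returns how many were freed. */
size_t free_all(struct node *head)
{
	struct node *pos, *tmp;
	size_t freed = 0;

	for (pos = head; pos; pos = tmp) {
		tmp = pos->next;	/* save the successor before freeing pos */
		free(pos);
		freed++;
	}
	return freed;
}
```

Advancing with pos = pos->next after free(pos) would read freed memory,
which is the problem a non-safe iterator runs into when entries are
deleted inside the loop.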

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-26 10:05:35 -03:00
Mauro Carvalho Chehab 8ae8f50ad8 ghes_edac: Fix RAS tracing
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location is kept untouched, in case we can, at some point in
the future, actually map the error onto the DIMM labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:17 -03:00
Mauro Carvalho Chehab 689c9cd812 ghes_edac: Make it compliant with UEFI spec 2.3.1
The UEFI spec defines the memory error types and the bits that
validate each field on the memory error record, at
Appendix N in items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:16 -03:00
Mauro Carvalho Chehab d2a6856614 ghes_edac: Improve driver's printk messages
Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide warning about the risk of relying on this
	  driver.

While here, change the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Conflicts:
	drivers/edac/ghes_edac.c
2013-02-25 19:42:15 -03:00
Mauro Carvalho Chehab 5ee726db52 ghes_edac: Don't credit the same memory dimm twice
In my tests on a system with 4 E5-4650 CPUs, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call walks all the DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMs to the right memory
controller, just put everything on memory controller 0 (which should
always exist).
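
A small sketch of the accounting rule, assuming a hypothetical helper (credit_dimms_mb() is not a kernel function): the DIMM sizes are credited only when the caller is memory controller 0, so a second registration adds nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Return the memory (in MB) credited to controller mc_idx.
 * Hypothetical helper: every DIMM goes to controller 0. */
unsigned long credit_dimms_mb(unsigned mc_idx, const unsigned *slot_mb,
                              size_t nslots)
{
    unsigned long total = 0;
    size_t i;

    assert(slot_mb || nslots == 0);
    if (mc_idx != 0)
        return 0;               /* later controllers get no DIMMs */
    for (i = 0; i < nslots; i++)
        total += slot_mb[i];
    return total;
}
```

For the machine above (46 slots, 16 of them holding 8192 MB), controller 0 is credited 131072 MB and controller 1 is credited 0 MB, giving the expected 128 GB in total.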

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:14 -03:00
Mauro Carvalho Chehab 32fa1f53c2 ghes_edac: do a better job of filling EDAC DIMM info
Instead of just faking a random value for the DIMM data, get
the information that is available via the DMI table.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab f04c62a703 ghes_edac: add support for reporting errors via EDAC
Now that the EDAC core is capable of just forwarding the errors via
the userspace API, add a report mechanism for the GHES errors.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab 77c5f5d2f2 ghes_edac: Register at EDAC core the BIOS report
Register GHES at the EDAC MC core, in order to prevent other
drivers from also handling errors and mangling the error data.

The EDAC core guarantees that just one driver will be used,
so the first one to register (BIOS first) will be the one that
reports the hardware errors.

For now, the EDAC driver does nothing but register at the
EDAC core, preventing the hardware-driven mechanism from
interfering with GHES.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-25 19:42:12 -03:00
Linus Torvalds 06991c28f3 Driver core patches for 3.9-rc1
Here is the big driver core merge for 3.9-rc1
 
 There are two major series here, both of which touch lots of drivers all
 over the kernel, and will cause you some merge conflicts:
   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.
   - remove CONFIG_EXPERIMENTAL
 
 If you need me to provide a merged tree to handle these resolutions,
 please let me know.
 
 Other than those patches, there's not much here, some minor fixes and
 updates.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
 "Here is the big driver core merge for 3.9-rc1

  There are two major series here, both of which touch lots of drivers
  all over the kernel, and will cause you some merge conflicts:

   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.

   - remove CONFIG_EXPERIMENTAL

  Other than those patches, there's not much here, some minor fixes and
  updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
  base: memory: fix soft/hard_offline_page permissions
  drivercore: Fix ordering between deferred_probe and exiting initcalls
  backlight: fix class_find_device() arguments
  TTY: mark tty_get_device call with the proper const values
  driver-core: constify data for class_find_device()
  firmware: Ignore abort check when no user-helper is used
  firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
  firmware: Make user-mode helper optional
  firmware: Refactoring for splitting user-mode helper code
  Driver core: treat unregistered bus_types as having no devices
  watchdog: Convert to devm_ioremap_resource()
  thermal: Convert to devm_ioremap_resource()
  spi: Convert to devm_ioremap_resource()
  power: Convert to devm_ioremap_resource()
  mtd: Convert to devm_ioremap_resource()
  mmc: Convert to devm_ioremap_resource()
  mfd: Convert to devm_ioremap_resource()
  media: Convert to devm_ioremap_resource()
  iommu: Convert to devm_ioremap_resource()
  drm: Convert to devm_ioremap_resource()
  ...
2013-02-21 12:05:51 -08:00
Mauro Carvalho Chehab e7e248304c edac: add support for raw error reports
This allows the APEI GHES driver to report errors directly, using
the EDAC error report API.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 14:16:03 -03:00
Mauro Carvalho Chehab c7ef764554 edac: reduce stack pressure by using a pre-allocated buffer
There are too many variables on the stack.
Reduce the stack usage by using a pre-allocated error
buffer.
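
A userspace sketch of the general technique, not the EDAC core's actual data structures: the error description buffers live inside the controller object, which is allocated once, so the report path declares no large arrays on its stack. All names here are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ERR_MSG_LEN 256

/* Scratch space for one error report (hypothetical layout). */
struct err_desc {
    char location[ERR_MSG_LEN];
    char msg[2 * ERR_MSG_LEN];
};

struct mem_ctl {
    int idx;
    struct err_desc desc;       /* allocated together with the controller */
};

struct mem_ctl *mem_ctl_alloc(int idx)
{
    struct mem_ctl *mc = calloc(1, sizeof(*mc));

    if (mc)
        mc->idx = idx;
    return mc;
}

/* Format a report into the pre-allocated buffer and return it.
 * The buffer is shared, so callers must serialize reports per controller. */
const char *mem_ctl_report(struct mem_ctl *mc, int node, int card,
                           const char *what)
{
    struct err_desc *d;

    assert(mc && what);
    d = &mc->desc;              /* no large arrays on this stack frame */
    snprintf(d->location, sizeof(d->location), "node:%d card:%d", node, card);
    snprintf(d->msg, sizeof(d->msg), "MC%d: %s (%s)", mc->idx, what,
             d->location);
    return d->msg;
}
```

The trade-off is that the buffer becomes shared state: two reports for the same controller can no longer run concurrently without a lock.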

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 13:48:45 -03:00
Mauro Carvalho Chehab 80cc7d87d5 edac: lock module owner to avoid error report conflicts
APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
	"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
   the EDAC core, as it is the driver's responsibility to number
   the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
   doing it, as one will interfere with the other.

So, we need to add a module owner lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way to implement this lock seems
to be to use the driver's name, as it is unique and won't require
changes to every driver.
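
The owner lock described above can be sketched in userspace as follows. This is an illustration under assumed names (mc_claim_owner(), mc_release_owner()), not the EDAC core's implementation: the first driver name to register becomes the owner, the same driver may register further controllers, and any other name is refused until the owner has released all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that owns error reporting, or NULL if unclaimed.
 * The string must outlive the registration (e.g. a string literal). */
static const char *mc_owner;
static unsigned mc_count;       /* controllers registered by the owner */

/* Returns 0 on success, -1 if another module already owns reporting. */
int mc_claim_owner(const char *mod_name)
{
    assert(mod_name != NULL);
    if (mc_owner && strcmp(mc_owner, mod_name) != 0)
        return -1;
    mc_owner = mod_name;
    mc_count++;
    return 0;
}

/* Drop one registration; ownership ends with the last controller. */
void mc_release_owner(const char *mod_name)
{
    assert(mod_name != NULL);
    if (!mc_owner || strcmp(mc_owner, mod_name) != 0)
        return;
    if (--mc_count == 0)
        mc_owner = NULL;
}
```

Keying the lock on the name means existing drivers need no new code: they already pass a module name when registering.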

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:38 -03:00
Mauro Carvalho Chehab c66b5a79a9 edac: add a new memory layer type
There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:37 -03:00
Mauro Carvalho Chehab 4ab19b06ac edac: initialize the core earlier
For the EDAC core to work when built in, it should
be initialized earlier; otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() is called:

...
[    4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[    4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[    6.519495] EDAC MC: Ver: 3.0.0
[    6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

The net result is that no EDAC sysfs nodes will appear.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:36 -03:00
Mauro Carvalho Chehab 3d958823e2 edac: better report error conditions in debug mode
It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab 59b9796d1e i5100_edac: Remove two checkpatch warnings
The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+       if (priv->debugfs)
+               debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+       if (i5100_debugfs)
+               debugfs_remove(i5100_debugfs);

Get rid of them.
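
The warnings rely on a common convention: a cleanup function that accepts NULL makes the caller's check redundant, much like free(NULL). A userspace analogue, with a hypothetical node type standing in for a debugfs entry:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a debugfs entry. */
struct node {
    char *name;
};

static int nodes_removed;       /* counts real removals, for illustration */

/* NULL-safe removal: callers need no "if (n)" guard. */
void node_remove(struct node *n)
{
    if (!n)
        return;
    free(n->name);
    free(n);
    nodes_removed++;
}
```

With the check inside the callee, a call such as node_remove(priv_node) is correct whether or not the node was ever created.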

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund 9cbc6d38f2 i5100_edac: connect fault injection to debugfs node
Create a debugfs directory i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is performed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel,
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is necessary to enable the injection more than once (echo to
the inject_enable node) for the injection to happen; I am not sure why.
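
The CE sequence above can also be driven from a small C program. This is a sketch with hypothetical helper names; it only writes the same values to the same nodes as the echo commands and assumes debugfs is mounted at /sys/kernel/debug.

```c
#include <stdio.h>
#include <string.h>

/* Build "<dir>/<node>" in buf; returns 0 on success, -1 if it does not fit. */
int node_path(char *buf, size_t len, const char *dir, const char *node)
{
    int n = snprintf(buf, len, "%s/%s", dir, node);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a decimal value to one injection node; 0 on success, -1 on error. */
int write_node(const char *dir, const char *node, unsigned long value)
{
    char path[256];
    FILE *f;
    int ok;

    if (node_path(path, sizeof(path), dir, node) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%lu\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Same steps as the CE example: configure, then trigger via inject_enable. */
int inject_ce(const char *dir)
{
    if (write_node(dir, "inject_channel", 0) ||
        write_node(dir, "inject_hlinesel", 1) ||
        write_node(dir, "inject_eccmask1", 0xF000) ||   /* 61440 */
        write_node(dir, "inject_enable", 1))
        return -1;
    return 0;
}
```

A caller would pass the controller directory, for example inject_ce("/sys/kernel/debug/i5100_edac/mc0"), and may have to repeat the final inject_enable write, as noted above.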

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:34 -03:00
Niklas Söderlund 53ceafd6a2 i5100_edac: add fault injection code
Add fault injection based on information from the i5100 datasheet,
see [1]. In addition to the i5100 datasheet, some missing information on
the injection functions was found through experimentation and in the
i7300 datasheet, see [2].

[1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset Memory Controller Hub (MCH)
    Doc.Nr: 318082
    http://www.intel.com/assets/pdf/datasheet/318082.pdf

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21 11:06:33 -03:00