Commit Graph

272 Commits

Author SHA1 Message Date
Borislav Petkov 33ca0643c9 amd64_edac: Reorganize error reporting path
Rewrite CE/UE paths so that they use the same code and drop additional
code duplication in handle_ue. Add a struct err_info which collects
required info for the error reporting. This, in turn, helps slimming all
edac_mc_handle_error() calls down to one.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:34 +01:00
Borislav Petkov c8d1adf092 amd64_edac: Do not check whether error address is valid
All families report a valid error address when encountering a DRAM ECC
error so no need to check it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:11 +01:00
Borislav Petkov 66fed2d464 amd64_edac: Improve error injection
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine
does this by injecting the error in the next non-cached access. This
takes relatively long time on a normal system so that in order for us to
expedite it, we disable the caches around the injection.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:45:01 +01:00
Borislav Petkov 1f31677e0d amd64_edac: Small fixlets and cleanups
amd64_get_dram_hole_info: remove local variable 'base'.
sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize
constants formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28 11:44:12 +01:00
Andrew Morton 168bfeef7b amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]
If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.

As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.

Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2

It will be rewritten more robustly soonish.

Reported-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: stable@vger.kernel.org
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-10-24 16:13:27 +02:00
Mauro Carvalho Chehab 9eb07a7fb8 edac: edac_mc_handle_error(): add an error_count parameter
In order to avoid loosing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.

The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.

The changes at the drivers were made by this small script:

	$file .=$_ while (<>);
	$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
	print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-12 12:15:47 -03:00
Mauro Carvalho Chehab 03f7eae80f edac: remove arch-specific parameter for the error handler
Remove the arch-dependent parameter, as it were not used,
as the MCE tracepoint weren't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this will
cost more bytes at the tracepoint, and tracepoint is not free.

The changes at the EDAC drivers were done by this small perl script:

	$file .=$_ while (<>);
	$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
	print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:52 -03:00
Mauro Carvalho Chehab 075f30901e amd64_edac: Don't pass driver name as an error parameter
The EDAC driver name doesn't help to handle EDAC errors. So,
remove it from the EDAC error messages, preserving only the
error_message.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:51 -03:00
Joe Perches 956b9ba156 edac: Convert debugfX to edac_dbg(X,
Use a more common debugging style.

Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:49 -03:00
Mauro Carvalho Chehab de3910eb79 edac: change the mem allocation scheme to make Documentation/kobject.txt happy
Kernel kobjects have rigid rules: each container object should be
dynamically allocated, and can't be allocated into a single kmalloc.

EDAC never obeyed this rule: it has a single malloc function that
allocates all needed data into a single kzalloc.

As this is not accepted anymore, change the allocation schema of the
EDAC *_info structs to enforce this Kernel standard.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Greg K H <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:45 -03:00
Mauro Carvalho Chehab c56087595f amd64_edac: convert sysfs logic to use struct device
Now that the EDAC core supports struct device, there's no sense
on having any logic at the EDAC core to simulate it. So, instead
of adding such logic there, change the logic at amd64_edac to
use it.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 13:23:40 -03:00
Mauro Carvalho Chehab fd687502dc edac: Rename the parent dev to pdev
As EDAC doesn't use struct device itself, it created a parent dev
pointer called as "pdev".  Now that we'll be converting it to use
struct device, instead of struct devsys, this needs to be fixed.

No functional changes.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11 11:56:06 -03:00
Mauro Carvalho Chehab ca0907b9e4 edac: Remove the legacy EDAC ABI
Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab ab5a503cb5 amd64_edac: convert driver to use the new edac ABI
The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab a895bf8b1e edac: move nr_pages to dimm struct
The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab 5e2af0c09e edac: Don't initialize csrow's first_page & friends when not needed
Almost all edac	drivers	initialize csrow_info->first_page,
csrow_info->last_page and csrow_info->page_mask. Those vars are
used inside the EDAC core, in order to calculate the csrow affected
by an error, by using the routine edac_mc_find_csrow_by_page().

However, very few drivers actually use it:
        e752x_edac.c
        e7xxx_edac.c
        i3000_edac.c
        i82443bxgx_edac.c
        i82860_edac.c
        i82875p_edac.c
        i82975x_edac.c
        r82600_edac.c

There also a few other drivers that have their own calculus
formula internally using those vars.

All the others are just wasting time by initializing those
data.

While initializing data without using them won't cause any troubles, as
those information is stored at the wrong place (at csrows structure), it
is better to remove what is unused, in order to simplify the next patch.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab 084a4fccef edac: move dimm properties to struct dimm_info
On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.

However, such assumption is not true for all types of memory
controllers.

Controllers for FB-DIMM's don't have such requirements.

Also, modern Intel controllers seem to be capable of handling such
differences.

So, we need to get rid of storing the DIMM information into a per-csrow
data, storing it, instead at the right place.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Mike Williams <mike@mikebwilliams.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:10:58 -03:00
Lionel Debroux 36c46f31df EDAC: Make pci_device_id tables __devinitconst.
These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:

"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."

So use DEFINE_PCI_DEVICE_TABLE(x).

Based on PaX and earlier work by Andi Kleen.

Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 12:04:54 +01:00
Borislav Petkov 11b0a31473 amd64_edac: Fix K8 revD and later chip select sizes
Fix DRAM chip select sizes calculation for K8, revisions D and E.

Reported-by: Niklas Söderlund <niklas.soderlund@ericsson.com
Link: http://lkml.kernel.org/r/1320849178-23340-1-git-send-email-niklas.soderlund@ericsson.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 12:02:46 +01:00
Ashish Shenoy f92cae4526 amd64_edac: Fix missing csrows sysfs nodes
While initializing the array of csrow attribute instances, a few csrows
were uninitialized. This happened because the module only performed a
check for DRAM base ctl register0's and not DRAM base ctl register1's
chip select enable bit. There could be systems with DIMMs populated
on only single memory channel whereas the module also assumed that a
dual channel dimm had double the memory size of a single memory channel
instead of checking the memory on each channel.

This patch fixes these above issues.

Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com>
Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 11:57:28 +01:00
Dan Carpenter 1f6189ed18 amd64_edac: Cleanup return type of amd64_determine_edac_cap()
Sparse complains that edac_cap was declared as dev_type and we are
returning edac_type.  Historically, edac_type was correct but since
then we have changed it to return a bit field.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20111006063025.GA2615@mwanda
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:35:46 +02:00
Borislav Petkov 73ba85937b amd64_edac: Add a fix for Erratum 505
When accessing the scrub rate control register (F3x58) on F15h, the DRAM
controller selector (F1x10C[DctCfgSel]) has to point to DCT0 so that the
scrub rate configuration can take effect. See Erratum 505 in the AMD
F15h revision guide for more details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:05 +02:00
Borislav Petkov b0b07a2bd4 EDAC, MCE, AMD: Simplify NB MCE decoder interface
Drop third nbcfg argument which is old remains and not required anymore.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:04 +02:00
Borislav Petkov c1ae68309b amd64_edac: Erratum #637 workaround
F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:56 +02:00
Borislav Petkov f08e457cec amd64_edac: Factor in CC6 save area
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:44 +02:00
Borislav Petkov f030ddfb37 amd64_edac: Remove node interleave warning
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:12 +02:00
Markus Trippelsdorf 4949603a6f EDAC: Remove debugging output in scrub rate handling
This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in the scrub rate
accessors.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-21 12:44:58 +02:00
Borislav Petkov a9f0fbe2bb amd64_edac: Fix potential memleak
We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.

Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-29 18:19:06 +02:00
Borislav Petkov d34a6ecd45 amd64_edac: Fix decode_syndrome types
Those should all be unsigned.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:40 +01:00
Borislav Petkov 8c6717510f amd64_edac: Fix DCT argument type
Fix amd64_debug_display_dimm_sizes() arguments order per convention (pvt
is always first). Also, the now second arg denotes the DCT so adjust its
type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:34 +01:00
Borislav Petkov e761359a25 amd64_edac: Fix ranges signedness
The dram ranges make sense only as an unsigned type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:33 +01:00
Borislav Petkov 972ea17ab9 amd64_edac: Drop local variable
Use the macro directly instead

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:32 +01:00
Borislav Petkov 71d2a32e8e amd64_edac: Fix PCI config addressing types
Adjust argument types to the PCI config API's types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:31 +01:00
Borislav Petkov 151fa71c58 amd64_edac: Fix DRAM base macros
Return unsigned u8 values only.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:30 +01:00
Borislav Petkov b487c33e55 amd64_edac: Fix node id signedness
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:28 +01:00
Borislav Petkov df71a05324 amd64_edac: Enable driver on F15h
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov a3b7db09a6 amd64_edac: Adjust ECC symbol size to F15h
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov 87b3e0e6e4 amd64_edac: Simplify scrubrate setting
Drop per-instance variable and compute min scrubrate dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:25 +01:00
Borislav Petkov 41d8bfaba7 amd64_edac: Improve DRAM address mapping
Drop static tables which map the bits in F2x80 to a chip select size in
favor of functions doing the mapping with some bit fiddling. Also, add
F15 support.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:24 +01:00
Borislav Petkov 5a5d237169 amd64_edac: Sanitize ->read_dram_ctl_register
This function is relevant for F10h and higher, and it has only one
callsite so drop its function pointer from the low_ops struct.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:23 +01:00
Borislav Petkov b15f0fcab1 amd64_edac: Adjust sys_addr to chip select conversion routine to F15h
F15h sys_addr to chip select mapping is almost identical to F10h's so
reuse that. Rename functions on that path accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov 355fba6005 amd64_edac: Beef up early exit reporting
Add paranoid checks for the sys address before going off and decoding
it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov 614ec9d853 amd64_edac: Revamp online spare handling
Replace per-DCT macros with smarter ones, drop hack and look for the
spare rank on all chip selects on a channel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov 5d4b58e84a amd64_edac: Fix channel interleave removal
Remove the channel interleave select bit properly. See
F2x110[DctSelIntLvAddr] for details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov e2f79dbdfb amd64_edac: Correct node interleaving removal
When node interleaving is enabled, a subset of the addr[14:12] bits has
to be removed in order to get the normalized DCT address of the DRAM
channel. The actual number of bits to remove is determined by F1x[1,
0][7C:40][IntlvEn]. Do this correctly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov 95b0ef55cd amd64_edac: Add support for interleaved region swapping
On revC3 and revE Fam10h machines and later, non-interleaved graphics
framebuffer memory under the 16G mark can be swapped with a region
located at the bottom of memory so that the GPU can use the interleaved
region and thus two channels. Add support for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov 700466249f amd64_edac: Unify get_error_address
The address bits from MC4_STATUS differ only between K8 and the rest so
no need for a per-family method.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov f192c7b16c amd64_edac: Simplify decoding path
Use the struct mce directly instead of copying from it into a custom
struct err_regs.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov 7d20d14da1 amd64_edac: Adjust channel counting to F15h
The only difference is that F10h used to sport ganged DCTs and F15h
doesn't so adjust the F10h routine and reuse it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov 5980bb9cd8 amd64_edac: Cleanup old defines cruft
Remove unused defines, drop family names from define names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:18 +01:00
Borislav Petkov bcd781f46a amd64_edac: Cleanup NBSH cruft
Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:17 +01:00
Borislav Petkov a97fa68ec4 amd64_edac: Cleanup NBCFG handling
The fact whether we are chipkill capable or not does not have any
bearing when computing the channel index on a ganged DCT configuration
so remove that. Also, simplify debug statements. Finally, remove old
error injection leftovers, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:16 +01:00
Borislav Petkov c9f4f26eae amd64_edac: Cleanup NBCTL code
Remove family names from macro names, drop single bit defines and
comment their meaning instead.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov 78da121e15 amd64_edac: Cleanup DCT Select Low/High code
Shorten macro names, remove family name from macros, fix macro
arguments, shorten debug strings.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov cb32850744 amd64_edac: Cleanup Dram Configuration registers handling
* Restrict DCT ganged mode check since only Fam10h supports it
* Adjust DRAM type detection for BD since it only supports DDR3
* Remove second and thus unneeded DCLR read in k8_early_channel_count() - we do
  that in read_mc_regs()
* Cleanup comments and remove family names from register macros
* Remove unused defines

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov 525a1b20a6 amd64_edac: Cleanup DBAM handling
Do not read DBAM regs twice and simplify code around them.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov f678b8ccce amd64_edac: Replace huge bitmasks with a macro
Replace hard to read hex constants with a continuous masks macro.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov c8e518d567 amd64_edac: Sanitize f10_get_base_addr_offset
This function maps the system address to the normalized DCT address.
Document what the code does for more clarity and wrap insane bitmasks in
a more understandable macro which generates them. Also, reduce number of
arguments passed to the function. Finally, rename this function to what
it actually does.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov 229a7a11ac amd64_edac: Sanitize channel extraction
Cleanup and simplify f10_determine_channel(); make it more readable.
Also drop f10_map_intlv_en_to_shift() in favor of simply counting the
bits in F1x124[DramIntlvEn] which is equivalent.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov 11c75eadaf amd64_edac: Cleanup chipselect handling
Add a struct representing the DRAM chip select base/limit register
pairs. Concentrate all CS handling in a single function. Also, add CS
looping macros for cleaner, more readable code. While at it, adjust code
to F15h. Finally, do smaller macro names cleanups (remove family names
from register macros) and debug messages clarification.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov bc21fa5787 amd64_edac: Cleanup DHAR handling
Adjust to F15h, simplify code, fixup macros.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov 7f19bf755c amd64_edac: Remove DRAM base/limit subfields caching
Add a struct representing the DRAM base/limit range pairs and remove all
cached subfields. Replace them with accessor functions, which actually
saves us some space:

   text    data     bss     dec     hex filename
  14712    1577     336   16625    40f1 drivers/edac/amd64_edac_mod.o.after
  14831    1609     336   16776    4188 drivers/edac/amd64_edac_mod.o.before

Also, it simplifies the code a lot allowing to merge the K8 and F10h
routines.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov b2b0c60543 amd64_edac: Add support for F15h DCT PCI config accesses
F15h "multiplexes" between the configuration space of the two DRAM
controllers by toggling D18F1x10C[DctCfgSel] while F10h has a different
set of registers for DCT0, and DCT1 in extended PCI config space.

Add DCT configuration space accessors per family thus wrapping all the
different access prerequisites. Clean up code while at it, shorten
names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov 4d7963648f amd64_edac: Fix DIMMs per DCTs output
amd64_debug_display_dimm_sizes() reports the distribution of the DIMMs
on each DRAM controller and its chip select sizes. Thus, the last don't
have anything to do with whether we're running in ganged DCT mode or not
- their sizes don't change all of a sudden. Fix that by removing the
ganged-check and dump DCT0's config for DCT1 when in ganged mode since
they're identical.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-02-10 14:41:49 +01:00
Linus Torvalds 128283a47e Merge branch 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE: Fix NB error formatting
  EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
  EDAC, MCE: Enable MCE decoding on F15h
  EDAC, MCE: Allow F15h bank 6 MCE injection
  EDAC, MCE: Shorten error report formatting
  EDAC, MCE: Overhaul error fields extraction macros
  EDAC, MCE: Add F15h FP MCE decoder
  EDAC, MCE: Add F15 EX MCE decoder
  EDAC, MCE: Add an F15h NB MCE decoder
  EDAC, MCE: No F15h LS MCE decoder
  EDAC, MCE: Add F15h CU MCE decoder
  EDAC, MCE: Add F15h IC MCE decoder
  EDAC, MCE: Add F15h DC MCE decoder
  EDAC, MCE: Select extended error code mask
2011-01-07 14:54:03 -08:00
Borislav Petkov 6245288232 EDAC, MCE: Overhaul error fields extraction macros
Make macro names shorter thus making code shorter and more clear.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:21 +01:00
Borislav Petkov a135cef79a amd64_edac: Disable DRAM ECC injection on K8
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:46 +01:00
Borislav Petkov 390944439f EDAC: Fixup scrubrate manipulation
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:31 +01:00
Borislav Petkov 360b7f3c60 amd64_edac: Remove two-stage initialization
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:

1. Probe PCI device: we only test ECC capabilities here and if none exit
early.

2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.

Remove "amd64_" prefix from static functions touched, while at it.

There actually should be no visible functional change resulting from
this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:03 +01:00
Borislav Petkov 2299ef7114 amd64_edac: Check ECC capabilities initially
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.

While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:02 +01:00
Borislav Petkov ae7bb7c679 amd64_edac: Carve out ECC-related hw settings
This is in preparation for the init path reorganization where we want
only to

1) test whether a particular node supports ECC
2) can it be enabled

and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.

The should be no functional change introduced by this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:00 +01:00
Borislav Petkov f1db274e1b amd64_edac: Remove PCI ECS enabling functions
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:59 +01:00
Borislav Petkov cc4d8860fc amd64_edac: Allocate driver instances dynamically
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:57 +01:00
Borislav Petkov 24f9a7fe3f amd64_edac: Rework printk macros
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:56 +01:00
Borislav Petkov 8d5b5d9c7b amd64_edac: Rename CPU PCI devices
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:54 +01:00
Borislav Petkov b8cfa02f83 amd64_edac: Concentrate per-family init even more
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:53 +01:00
Borislav Petkov bbd0c1f675 amd64_edac: Cleanup the CPU PCI device reservation
Shorten code and clarify comments, return proper -E* values on error.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:52 +01:00
Borislav Petkov 0092b20d4c amd64_edac: Simplify CPU family detection
Concentrate CPU family detection in the per-family init function.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:51 +01:00
Borislav Petkov 395ae783b3 amd64_edac: Add per-family init function
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:50 +01:00
Borislav Petkov 9f56da0e3c amd64_edac: Use cached extended CPU model
... instead of computing it needlessly again.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:49 +01:00
Borislav Petkov 3ab0e7dc2e amd64_edac: Remove F11h support
F11h doesn't support DRAM ECC so whack it away.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:47 +01:00
Linus Torvalds 42cbd8efb0 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Cleanup L3 cache index disable support
  x86, amd-nb: Cleanup AMD northbridge caching code
  x86, amd-nb: Complete the rename of AMD NB and related code
2011-01-06 10:50:28 -08:00
Borislav Petkov e726f3c368 amd64_edac: Fix interleaving check
When matching error address to the range contained by one memory node,
we're in valid range when node interleaving

1. is disabled, or
2. enabled and when the address bits we interleave on match the
interleave selector on this node (see the "Node Interleaving" section in
the BKDG for an enlightening example).

Thus, when we early-exit, we need to reverse the compound logic
statement properly.

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:54 +01:00
Hans Rosenfeld 9653a5c76c x86, amd-nb: Cleanup AMD northbridge caching code
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:05 +01:00
Hans Rosenfeld eec1d4fa00 x86, amd-nb: Complete the rename of AMD NB and related code
Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:04 +01:00
Linus Torvalds c029e405bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (21 commits)
  EDAC, MCE: Fix shift warning on 32-bit
  EDAC, MCE: Add a BIT_64() macro
  EDAC, MCE: Enable MCE decoding on F12h
  EDAC, MCE: Add F12h NB MCE decoder
  EDAC, MCE: Add F12h IC MCE decoder
  EDAC, MCE: Add F12h DC MCE decoder
  EDAC, MCE: Add support for F11h MCEs
  EDAC, MCE: Enable MCE decoding on F14h
  EDAC, MCE: Fix FR MCEs decoding
  EDAC, MCE: Complete NB MCE decoders
  EDAC, MCE: Warn about LS MCEs on F14h
  EDAC, MCE: Adjust IC decoders to F14h
  EDAC, MCE: Adjust DC decoders to F14h
  EDAC, MCE: Rename files
  EDAC, MCE: Rework MCE injection
  EDAC: Export edac sysfs class to users.
  EDAC, MCE: Pass complete MCE info to decoders
  EDAC, MCE: Sanitize error codes
  EDAC, MCE: Remove unused function parameter
  EDAC, MCE: Add HW_ERR prefix
  ...
2010-10-21 14:04:58 -07:00
Borislav Petkov 7cfd4a8744 EDAC, MCE: Pass complete MCE info to decoders
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Andreas Herrmann 23ac4ae827 x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-20 14:22:58 -07:00
Andreas Herrmann 900f9ac9f1 x86, k8-gart: Decouple handling of garts and northbridges
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-17 13:26:21 -07:00
Borislav Petkov 37b7370a8d amd64_edac: Do not report error overflow as a separate error
When the Overflow MCi_STATUS bit is set, EDAC reports the lost error
with a "no information available" message which often puzzles users
parsing the dmesg. This doesn't make much sense since this error has
been lost anyway so no need for reporting it separately. Thus, report
the overflow bit setting in the MCE dump instead. While at it, remove
reporting of MiscV and ErrorEnable (en) which are superfluous.

Now it looks like this:

[ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
[ 1501.666887] Northbridge Error, node 2

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-26 12:46:03 +02:00
Borislav Petkov c4799c7570 amd64_edac: Minor formatting fix
EDAC MC3: CE page 0xc32281, offset 0x8a0, grain 0, syndrome 0x1, row 2, channel 1, label "": amd64_edac
EDAC MC3: CE - no information available: amd64_edacError Overflow

Add the missing space before "Error Overflow" on the second line.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-04 11:16:01 +02:00
Borislav Petkov 962b70a1eb amd64_edac: Fix operator precendence error
The bitwise AND is of higher precedence, make that explicit.

Cc: <stable@kernel.org> # 34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-04 11:15:09 +02:00
Borislav Petkov eba042a81e edac, mc: Improve scrub rate handling
Fortify the interface to not accept negative values, remove
memctrl_int_store() as a result. Also, sanitize bandwidth setting by
making the argument a simple u32 instead of strange u32 pointer being
passed around for no obvious reason. Then, fix error handling and teach
it to return proper error values. Finally, make code more readable,
simplify debug messages.

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:06 +02:00
Borislav Petkov bc57117856 amd64_edac: Correct scrub rate setting
Exit early when setting scrub rate on unknown/unsupported families.

Cc: <stable@kernel.org> # 32.x 33.x 34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:05 +02:00
Borislav Petkov 9975a5f22a amd64_edac: Fix DCT base address selector
The correct check is to verify whether in high range we're below 4GB
and not to extract the DctSelBaseAddr again. See "2.8.5 Routing DRAM
Requests" in the F10h BKDG.

Cc: <stable@kernel.org> # .32.x .33.x .34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:04 +02:00
Borislav Petkov f4347553b3 amd64_edac: Remove polling mechanism
Switch to reusing the mcheck core's machine check polling mechanism
instead of duplicating functionality by using the EDAC polling routine.

Correct formatting while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:03 +02:00
Borislav Petkov ad6a32e969 amd64_edac: Sanitize syndrome extraction
Remove the two syndrome extraction macros and add a single function
which does the same thing but with proper typechecking. While at it,
make sure to cache ECC syndrome size and dump it in debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-03 16:13:31 +02:00
Borislav Petkov 41c310447f amd64_edac: Fix syndrome calculation on K8
When calculating the DCT channel from the syndrome we need to know the
syndrome type (x4 vs x8). On F10h, this is read out from extended PCI
cfg space register F3x180 while on K8 we only support x4 syndromes and
don't have extended PCI config space anyway.

Make the code accessing F3x180 F10h only and fall back to x4 syndromes
on everything else.

Cc: <stable@kernel.org> # .33.x .34.x
Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-07-02 17:32:34 +02:00
Linus Torvalds eaa5eec739 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Simplify ECC override handling
2010-03-03 09:25:37 -08:00
Linus Torvalds 0a135ba14d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: add __percpu sparse annotations to what's left
  percpu: add __percpu sparse annotations to fs
  percpu: add __percpu sparse annotations to core kernel subsystems
  local_t: Remove leftover local.h
  this_cpu: Remove pageset_notifier
  this_cpu: Page allocator conversion
  percpu, x86: Generic inc / dec percpu instructions
  local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
  module: Use this_cpu_xx to dynamically allocate counters
  local_t: Remove cpu_local_xx macros
  percpu: refactor the code in pcpu_[de]populate_chunk()
  percpu: remove compile warnings caused by __verify_pcpu_ptr()
  percpu: make accessors check for percpu pointer in sparse
  percpu: add __percpu for sparse.
  percpu: make access macros universal
  percpu: remove per_cpu__ prefix.
2010-03-03 07:34:18 -08:00
Borislav Petkov d95cf4de6a amd64_edac: Simplify ECC override handling
No need for clearing ecc_enable_override and checking it in two places.
Instead, simply check it during probing and act accordingly. Also,
rename the flag bitfields according to the functionality they actually
represent. What is more, make sure original BIOS ECC settings are
restored when the module is unloaded.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-03-01 19:25:12 +01:00
Tejun Heo a29d8b8e2d percpu: add __percpu sparse annotations to what's left
Add __percpu sparse annotations to places which didn't make it in one
of the previous patches.  All converions are trivial.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Neil Brown <neilb@suse.de>
2010-02-17 11:17:38 +09:00
Borislav Petkov cab4d27764 amd64_edac: Do not falsely trigger kerneloops
An unfortunate "WARNING" in the message amd64_edac dumps when the system
doesn't support DRAM ECC or ECC checking is not enabled in the BIOS
used to trigger kerneloops which qualified the message as an OOPS thus
misleading the users. See, e.g.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536
http://bugzilla.kernel.org/show_bug.cgi?id=15238

Downgrade the message level to KERN_NOTICE and fix the formulation.

Cc: stable@kernel.org # .32.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-02-11 20:32:14 +01:00
Roel Kluin 926311fd7d amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rate
Add a missing iterator variable thus fixing the conditional of the
for-loop in amd64_get_scrub_rate().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-01-15 10:45:58 +01:00
Borislav Petkov 92389102b6 amd64_edac: restrict PCI config space access
Do not access F2x19[0,4] on K8 since they're undefined there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:08 +01:00
Borislav Petkov 43f5e68733 amd64_edac: fix forcing module load/unload
Clear the override flag after force-loading the module.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:08 +01:00
Borislav Petkov 56b34b91e2 amd64_edac: make driver loading more robust
Currently, the module does not initialize fully when the DIMMs aren't
ECC but remains still loaded. Propagate the error when no instance of
the driver is properly initialized and prevent further loading.

Reorganize and polish error handling in amd64_edac_init() while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Borislav Petkov 8f68ed9728 amd64_edac: fix driver instance freeing
Fix use-after-free errors by pushing all memory-freeing calls to the end
of amd64_remove_one_instance().

Reported-by: Darren Jenkins <darrenrjenkins@gmail.com>
LKML-Reference: <1261370306.11354.52.camel@ICE-BOX>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Borislav Petkov 603adaf6b3 amd64_edac: fix K8 chip select reporting
Fix the case when amd64_debug_display_dimm_sizes() reports only half the
amount of DRAM on it because it doesn't account for when the single DCT
operates in 128-bit mode and merges chip selects from different DIMMs.

Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Borislav Petkov 505422517d x86, msr: Add support for non-contiguous cpumasks
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.

This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.

Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 10:59:21 -08:00
Andrew Morton 18ba54ac12 amd64_edac: fix use-uninitialised bug
drivers/edac/amd64_edac.c: In function 'amd64_edac_init':
drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:38:13 +01:00
Borislav Petkov bdc30a0c8c amd64_edac: correct sys address to chip select mapping
The routine does the reverse mapping of the error address of a CECC back
to the node id, DRAM controller and chip select of the DIMM which caused
the error. We should lookup the channel using the syndromes _only_ when
the DCTs are ganged so fix that.

Also, add an early exit when there's an error while scanning for the
csrow thus decreasing indentation levels for better readability.

Finally, fixup comments.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:38:12 +01:00
Borislav Petkov bfc04aec7d amd64_edac: add a leaner syndrome decoding algorithm
Instead of using the whole syndrome tables for channel decoding, use a
set of eigenvectors with which the tables can be generated to search for
the syndrome in error. The algorithm operates independently of symbol
size and can be used for both x4 and x8 syndromes.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:37:59 +01:00
Borislav Petkov 986a42a250 amd64_edac: remove early hw support check
The .probe_valid_hardware low_ops member checked whether the DCTs are in
DDR3 mode and bailed out if so. Now that all the needed changes for DDR3
support is in place, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov 6b4c0bdeb0 amd64_edac: detect DDR3 memory type
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov 239642fe19 edac: add memory types strings for debugging
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.

This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov 1f6bcee75e amd64_edac: remove unneeded extract_error_address wrapper
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:30 +01:00
Borislav Petkov 44e9e2ee21 amd64_edac: rename StinkyIdentifier
SystemAddress -> sys_addr

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:30 +01:00
Borislav Petkov ad858bfa14 amd64_edac: remove superfluous dbg printk
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov 1433eb9903 amd64_edac: enhance address to DRAM bank mapping
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10
and all K8 flavors and remove klugdy table of pseudo values. Add a
low_ops->dbam_to_cs member which is family-specific and replaces
low_ops->dbam_map_to_pages since the pages calculation is a one liner
now.

Further cleanups, while at it:

- shorten family name defines
- align amd64_family_types struct members

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov d16149e8c3 amd64_edac: cleanup f10_early_channel_count
Do not read DCLR[01] again since this is done in
amd64_read_mc_registers() earlier. There can be more than two physical
DIMMs present so clamp the channels value to max 2. Also, do not report
DCT data width - it is also done earlier.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov 8566c4df16 amd64_edac: dump DIMM sizes on K8 too
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs
configuration on K8 revF too. Remove the ganged arg since we print the
DCT operating mode (ganged vs unganged) earlier.

Also, DCT csrow configuration is relevant therefore dump it as
KERN_DEBUG instead of only on debug builds. Remove misleading DIMM
output since there's no reliable way of mapping of chip selects to
actual physical DIMMs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov 8de1d91e62 amd64_edac: cleanup rest of amd64_dump_misc_regs
Clarify bitfields description, add PCI config function/offset names to
registers for easy reference, simplify code layout, remove unneeded
info.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov 68798e1760 amd64_edac: cleanup DRAM cfg low debug output
Carve out the register-specific debug statements into a separate
function, clarify meanings of the single bitfields in the register,
remove irrelevant output and macros.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov 6ba5dcdc44 amd64_edac: wrap-up pci config read error handling
Add a pci config read wrapper for signaling pci config space access
errors instead of them being visible only on a debug build. This is
important on amd64_edac since it uses all those pci config register
values to access the DRAM/DIMM configuration of the nodes.

In addition, the wrapper makes a _lot_ (look at the diffstat!) of
error handling code superfluous and improves much of the overall code
readability by removing error handling details out of the way.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov f6d6ae9657 amd64_edac: unify MCGCTL ECC switching
Unify almost identical code into one function and remove NUMA-specific
usage (specifically cpumask_of_node()) in favor of generic topology
methods.

Remove unused defines, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Rusty Russell ba578cb34a cpumask: use modern cpumask style in drivers/edac/amd64_edac.c
cpumask_t -> struct cpumask, and don't put one on the stack.  (Note: this
is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov e97f8bb8ce amd64_edac: make DRAM regions output more human-readable
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the
whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits
are only a subset of the 64bit register, the values are correct since
the remaining bits are Read-As-Zero and no shifting is needed.

Also, cleanup DRAM base/limit debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov 72381bd55e amd64_edac: clarify DRAM CTL debug reporting
Make debug info formulations about the DRAM and DCT configuration of the
machine more human readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:26 +01:00
Borislav Petkov 17adea01b9 amd64_edac: fix CECCs reporting
Shift error type bits properly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04 14:04:06 +01:00
Li Hong a3c4c58085 amd64_edac: fix a wrong goto clause in amd64_edac.c
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is
called before pci_register_driver. If it fails, should exit with err
directly.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04 14:02:32 +01:00
Borislav Petkov 4997811e3b amd64_edac: fix DRAM base and limit extraction masks, v2
This is a proper fix as a follow-up to 66216a7 and 916d11b.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-16 18:51:22 +02:00
Borislav Petkov 66216a7a15 amd64_edac: fix DRAM base and limit extraction
On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers
which specify the destination node of a DRAM address. Those address
boundaries are being extracted into ->dram_base[] and ->dram_limit[].
Correct the extraction masks to match the respective address bits.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:51:15 +02:00
Borislav Petkov 9d858bb10a amd64_edac: fix chip select handling
Different processor families support a different number of chip selects.
Handle this in a family-dependent way with the proper values assigned at
init time (see amd64_set_dct_base_and_mask).

Remove _DCSM_COUNT defines since they're used at one place and originate
from public documentation.

CC: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:50:50 +02:00
Keith Mannthey 2cff18c22c amd64_edac: simple fix to allow reporting of CECC errors
This allows the errors to be further decoded and mapped to csrows.
Tested with ECC debug dimms and an Rev F cpu based system.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:49:58 +02:00
Borislav Petkov 8edc544589 amd64_edac: fix K8 intlv_sel check
The check when DRAM interleaving is enabled should be done against the
pvt->dram_IntlvSel field and not against the ->dram_limit.

Simplify first loop and fixup printk formatting while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:49:43 +02:00
Borislav Petkov 72f158fe6f amd64_edac: fix interleave enable tests
The pvt->dram_IntlvEn saves the 3 "Interleave Enable" bits already
right-shifted by 8 so the check in find_mc_by_sys_addr() by shifting the
values to the left 8 bits is wrong.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:48:08 +02:00
Borislav Petkov 916d11b2b5 amd64_edac: fix DRAM base and limit address extraction
K8 DRAM base and limit addresses from F1x40 +8*i and F1x44 + 8*i, where
i in (0..7) are both bits 39-24 and therefore the shifting should be
done by 24 and not by 8.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:47:51 +02:00
Borislav Petkov 3011b20da9 amd64_edac: fix driver instance lookup table allocation
Allocate memory statically for 8-node machines max for simplicity
instead of relying on MAX_NUMNODES which is 0 on !CONFIG_NUMA builds.

Spotted by Jan Beulich.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:47:34 +02:00
Borislav Petkov 06724535f8 amd64_edac: check NB MCE bank enable on the current node properly
The old code was using smp_call_function_many which skips the current
cpu if it is in the supplied cpumask. Switch to the rdmsr_on_cpus()
interface which takes care of that.

In addition, add get_cpus_on_this_dct_cpumask helper which computes a
cpumask of all the cores on a node and thus on a DCT.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 13:05:46 +02:00
Wan Wei 57a30854c8 amd64_edac: Rewrite unganged mode code of f10_early_channel_count
Simplify the procedure by checking if there is any DIMM in each channel.
This patch will fix the bugs such as when there is no DIMMs under
certain node, two DIMMs in the same channel, and only one DIMM in each
channel of the node.

Borislav: minor fixups

Signed-off-by: Wan Wei <wanwei@mail.dawning.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 12:42:55 +02:00
Borislav Petkov be3468e8ff amd64_edac: cleanup amd64_check_ecc_enabled
Simplify code flow and make sure return value is always valid since
further driver init depends on it. Carve out long warning string and
make code more readable. Shorten some names, while at it.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 12:40:38 +02:00
Borislav Petkov d93cc222ad EDAC, AMD: carve out decoding of MCi_STATUS ErrorCode
This is the MCE error code from the MCi_STATUS banks, bits [15:0] which
describe what type of error was encountered: GART TLB, Memory or Bus
error. The semantics of those bits are identical across all MCE banks so
decode those separately, irrespectively of MCE type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:20 +02:00
Borislav Petkov b69b29de65 EDAC, AMD: carve out MCi_STATUS decoding
The MCi_STATUS registers have most field definitions in common so decode
them in the general path. Do not pass ecc_type along and compute it in
__amd64_decode_bus_error instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:07 +02:00
Borislav Petkov 549d042df2 x86, mce: pass mce info to EDAC for decoding
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:59:17 +02:00
Borislav Petkov ecaf5606de amd64_edac: cleanup amd64_decode_bus_error
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:37 +02:00
Borislav Petkov b7225e4fc1 amd64_edac: remove memory and GART TLB error decoders
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:29 +02:00
Borislav Petkov 5110dbdeab amd64_edac: cleanup/complete NB MCE decoding
* don't dump info which mcheck already does
* update to newest BKDG
* mv amd64_process_error_info -> amd64_decode_nb_mce
* shorten error struct names
* remove redundant info ptr in amd64_process_error_info
* remove unused ErrorCodeExt[19:16] (MCx_STATUS) defines

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:25 +02:00
Borislav Petkov ef44cc4c22 amd64_edac: cleanup amd64_process_error_info
* mv amd64_error_info_regs -> err_regs

* remove redundant info ptr

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:18 +02:00
Borislav Petkov b70ef01016 EDAC: move MCE error descriptions to EDAC core
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.

While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.

Remove superfluous htlink_msgs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:57:48 +02:00