edac.txt: update information about newer Intel CPUs

There's a chapter at edac.rst written by the time Nehalem
support was added. Such information is used not only by the
Nehalem driver (i7core_edac), but by all newer Intel CPU
architectures that are supported by i7core_edac, sb_edac
and sbx_edac drivers.

Update the information to reflect that.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Mauro Carvalho Chehab 2016-10-26 08:43:58 -02:00
parent 96714bd707
commit e4b5301674
1 changed files with 29 additions and 15 deletions

@@ -741,13 +741,25 @@ The ``test_device_edac`` sample driver is located at the
 http://bluesmoke.sourceforge.net project site for EDAC.
-Nehalem Usage of EDAC APIs
---------------------------
+Usage of EDAC APIs on Nehalem and newer Intel CPUs
+--------------------------------------------------
-Due to the way Nehalem exports Memory Controller data, some adjustments
-were done at i7core_edac driver. This chapter will cover those differences
+On older Intel architectures, the memory controller was part of the North
+Bridge chipset. Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Sky Lake and
+newer Intel architectures integrated an enhanced version of the memory
+controller (MC) inside the CPUs.
-1) On Nehalem, there is one Memory Controller per Quick Patch Interconnect
+This chapter will cover the differences of the enhanced memory controllers
+found on newer Intel CPUs, such as ``i7core_edac``, ``sb_edac`` and
+``sbx_edac`` drivers.
+.. note::
+The Xeon E7 processor families use a separate chip for the memory
+controller, called Intel Scalable Memory Buffer. This section doesn't
+apply for such families.
+1) There is one Memory Controller per Quick Patch Interconnect
 (QPI). At the driver, the term "socket" means one QPI. This is
 associated with a physical CPU socket.
@@ -757,7 +769,7 @@ were done at i7core_edac driver. This chapter will cover those differences
 The minimum known unity is DIMMs. There are no information about csrows.
 As EDAC API maps the minimum unity is csrows, the driver sequentially
-maps channel/dimm into different csrows.
+maps channel/DIMM into different csrows.
 For example, supposing the following layout::
@@ -780,8 +792,8 @@ were done at i7core_edac driver. This chapter will cover those differences
 Each QPI is exported as a different memory controller.
-2) Nehalem MC has the ability to generate errors. The driver implements this
-functionality via some error injection nodes:
+2) The MC has the ability to inject errors to test drivers. The drivers
+implement this functionality via some error injection nodes:
 For injecting a memory error, there are some sysfs nodes, under
 ``/sys/devices/system/edac/mc/mc?/``:
@@ -855,13 +867,14 @@ were done at i7core_edac driver. This chapter will cover those differences
 EDAC MC0: UE row 0, channel-a= 0 channel-b= 0 labels "-": NON_FATAL (addr = 0x0075b980, socket=0, Dimm=0, Channel=2, syndrome=0x00000040, count=1, Err=8c0000400001009f:4000080482 (read error: read ECC error))
-3) Nehalem specific Corrected Error memory counters
+3) Corrected Error memory register counters
-Nehalem have some registers to count memory errors. The driver uses those
-registers to report Corrected Errors on devices with Registered Dimms.
+Those newer MCs have some registers to count memory errors. The driver
+uses those registers to report Corrected Errors on devices with Registered
+DIMMs.
-However, those counters don't work with Unregistered Dimms. As the chipset
-offers some counters that also work with UDIMMS (but with a worse level of
+However, those counters don't work with Unregistered DIMM. As the chipset
+offers some counters that also work with UDIMMs (but with a worse level of
 granularity than the default ones), the driver exposes those registers for
 UDIMM memories.
@@ -896,8 +909,8 @@ were done at i7core_edac driver. This chapter will cover those differences
 4) Standard error counters
 The standard error counters are generated when an mcelog error is received
-by the driver. Since, with udimm, this is counted by software, it is
-possible that some errors could be lost. With rdimm's, they display the
+by the driver. Since, with UDIMM, this is counted by software, it is
+possible that some errors could be lost. With RDIMM's, they display the
 contents of the registers
 Reference documents used on ``amd64_edac``
@@ -958,6 +971,7 @@ Credits
 * |copy| Mauro Carvalho Chehab
 - 05 Aug 2009 Nehalem interface
+- 26 Oct 2016 Converted to ReST and cleanups at the Nehalem section
 * EDAC authors/maintainers: