Commit Graph

1768 Commits

Author SHA1 Message Date
Linus Torvalds ff8be96420 - Add support for version 3 of the Synopsys DDR controller to synopsys_edac
- Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
   family 0x19 CPUs to amd64_edac
 
 - The usual set of fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHb/PwACgkQEsHwGGHe
 VUqhbhAAo0mRNnBF3CJn1zlXRgmqrvV1IPnJQNp+z5iaXY1vr0qRMgO4OcgsJrxF
 nxrx/fdAYlQQO4vz1iq4t4j+eazOQyM/JZ0DKi4e+Dw2mC0axdCx8a0pyl1g2de6
 oQ5GkplRKUFn+3bTJpHIE5QnCOD7S85Mrp1F3Soa6jD9i+HwQIqAoltNMcCP7Yei
 ibhWUBX2H/oYcHARecIkP/YEyzSEHhcX6LRjNILW5haZQ6GziQUFzKUUwpUS3hsz
 9i6hXnHXEPhOq8JyoyWWhvVDywFK9z8lh57G7DFfZIhAk1FjuLDP2iI270D/LkYF
 shq6+M8ST9yqwOMV3Iaoa8VZFf/fjTyV0E0L2p2+faxaJ66rqdzbagLIZQv6hDNe
 N1/LD72/Io4et1kEbbaHm5jpxzSJ0jQwu1o+rY1/NmKsWhzE6V4X0GnDTZzZwP9b
 CbFJAWdCD+fi3WQjzv8HLVepjIsV+R3VVTOLq2oodn/mtoK0DRU/vTeCCwRS9ntF
 IyF55L/jSqy9CtP119KBnItGo4b84UrJDozXizGtc6Zt3chz7ljSpa2gJrKF+fCR
 Yhyr9Pt+vYQBpnIMDu1BPcoE58pwZYKoOSO00COUHHsLn+u8qhetGmQzYwWHCq7J
 hz8HHZnlTdFseZTp5tavk3B4md5HiPu+A7GevK3YH3lBv/vEmDM=
 =85ZE
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add support for version 3 of the Synopsys DDR controller to
   synopsys_edac

 - Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
   family 0x19 CPUs to amd64_edac

 - The usual set of fixes and cleanups

* tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Add support for family 19h, models 50h-5fh
  EDAC/sb_edac: Remove redundant initialization of variable rc
  RAS/CEC: Remove a repeated 'an' in a comment
  EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
  EDAC: Add RDDR5 and LRDDR5 memory types
  EDAC/sifive: Fix non-kernel-doc comment
  dt-bindings: memory: Add entry for version 3.80a
  EDAC/synopsys: Enable the driver on Intel's N5X platform
  EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR
  EDAC/synopsys: Use the quirk for version instead of ddr version
2022-01-10 11:45:23 -08:00
Linus Torvalds 7e740ae635 - First part of a series to move the AMD address translation code from
arch/x86/ to amd64_edac as that is its only user anyway
 
 - Some MCE error injection improvements to the AMD side
 
 - Reorganization of the #MC handler code and the facilities it calls to
 make it noinstr-safe
 
 - Add support for new AMD MCA bank types and non-uniform banks layout
 
 - The usual set of cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcGZ4ACgkQEsHwGGHe
 VUr6Zw//WBvNvfV/akQGsvVo94G0DaF+buYB+Tl1p0goMd7QfKA5iHxjB1alEJC2
 dTchIr7pjiiE3nr4svuWLLQZamx8kMwQNqipioBHXg3YThj0wD4PbUOC9TlIceBR
 3yxVbvwlD7Y7sb2PII6IMlagzTiIeW0/ps29DHFr5vqDBvEanNdAHoV/h2vQi+76
 Ma96psIxzTMSk11yGB6l9k66EASCdDGBU7sODjup7wuQmuRaQ/1oJAWY0wIJvJez
 frjpaz/YKmlTwTf9bxoJbky2FkeBsD4yXXUGwjDgMq0EyUUaeSbvaQkm8gSHX9Yr
 VDDv1WvT6QIw6x7Wc4skS8lWmZghNBbAHOoNS31BPJ2IDmFWkF5Q2bNEuHrtU4EC
 0mkNeyN6x48L/F8j/1aE/tm+SjiGexZX4zhi6MNWReTV140I1zqQq/r7CCu5+MEa
 PAB1YH/96k2dMPT6mbFrRIFJmkDuBuZOAkuwYWEjO/XjPl2SGBGj1jKolWW3qjRR
 Po7vBJnDt7wgigWFh6+R4rJv+fh87XfB7B2wEOt4Yn37jUkK6dNRIy0zFmDaC1J2
 bHgsJbWC+Sgs1G57gnYABJYzLj7RRdDyCu1/UUVyBBP7/WfZJw0kjABE7p3AaYTd
 15JV1L0c/Ypuv05LJf40LkyF2F5w2fnP5QM2Rr8U4xW/GumEyWs=
 =8Hu7
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:
 "A relatively big amount of movements in RAS-land this time around:

   - First part of a series to move the AMD address translation code
     from arch/x86/ to amd64_edac as that is its only user anyway

   - Some MCE error injection improvements to the AMD side

   - Reorganization of the #MC handler code and the facilities it calls
     to make it noinstr-safe

   - Add support for new AMD MCA bank types and non-uniform banks layout

   - The usual set of cleanups and fixes"

* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/mce: Reduce number of machine checks taken during recovery
  x86/mce/inject: Avoid out-of-bounds write when setting flags
  x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
  x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
  x86/mce: Check regs before accessing it
  x86/mce: Mark mce_start() noinstr
  x86/mce: Mark mce_timed_out() noinstr
  x86/mce: Move the tainting outside of the noinstr region
  x86/mce: Mark mce_read_aux() noinstr
  x86/mce: Mark mce_end() noinstr
  x86/mce: Mark mce_panic() noinstr
  x86/mce: Prevent severity computation from being instrumented
  x86/mce: Allow instrumentation during task work queueing
  x86/mce: Remove noinstr annotation from mce_setup()
  x86/mce: Use mce_rdmsrl() in severity checking code
  x86/mce: Remove function-local cpus variables
  x86/mce: Do not use memset to clear the banks bitmaps
  x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  x86/mce/inject: Check if a bank is populated before injecting
  x86/mce: Get rid of cpu_missing
  ...
2022-01-10 11:43:09 -08:00
Borislav Petkov da0119a912 Merge branches 'edac-misc' and 'edac-amd64' into edac-updates-for-v5.17
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-01-10 10:07:00 +01:00
Qiuxu Zhuo c370baa328 EDAC/i10nm: Release mdev/mbase when failing to detect HBM
On systems without HBM (High Bandwidth Memory) mdev/mbase are not
released/unmapped.

Add the code to release mdev/mbase when failing to detect HBM.

[Tony: re-word commit message]

Cc: <stable@vger.kernel.org>
Fixes: c945088384 ("EDAC/i10nm: Add support for high bandwidth memory")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com
2022-01-04 09:08:00 -08:00
Marc Bevand 0b8bf9cb14 EDAC/amd64: Add support for family 19h, models 50h-5fh
Add the new family 19h models 50h-5fh PCI IDs (device 18h functions 0
and 6) to support Ryzen 5000 APUs ("Cezanne").

Signed-off-by: Marc Bevand <m@zorinaq.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20211221233112.556927-1-m@zorinaq.com
2021-12-24 10:53:41 +01:00
Yazen Ghannam 91f75eb481 x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
2021-12-22 17:22:09 +01:00
Yazen Ghannam 5176a93ab2 x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
2021-12-22 17:19:18 +01:00
Colin Ian King 567617baac EDAC/sb_edac: Remove redundant initialization of variable rc
The variable rc is being initialized with a value that is never read, it
is being updated later on. The assignment is redundant and thus remove
it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211126221848.1125321-1-colin.i.king@gmail.com
2021-12-21 12:02:11 +01:00
Yazen Ghannam e2be5955a8 EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
Add a new family type for AMD Family 19h Models 10h to 1Fh. Use this new
family type for Models A0h to AFh also.

Increase the maximum number of controllers from 8 to 12.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208174356.1997855-3-yazen.ghannam@amd.com
2021-12-10 12:54:33 +01:00
Yazen Ghannam f957112423 EDAC: Add RDDR5 and LRDDR5 memory types
Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory
types.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com
2021-12-10 12:51:28 +01:00
Randy Dunlap ad2c302bc6 EDAC/sifive: Fix non-kernel-doc comment
scripts/kernel-doc complains about a comment that begins with "/**"
but is not in kernel-doc format, so correct it.

Prevents this warning:

  drivers/edac/sifive_edac.c:23: warning: This comment starts with '/**', \
  but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * EDAC error callback

Fixes: 91abaeaaff ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211201030913.10283-1-rdunlap@infradead.org
2021-12-05 19:54:46 +01:00
Dinh Nguyen f6bc0d8bc2 EDAC/synopsys: Enable the driver on Intel's N5X platform
Intel's N5X platform is also using the Synopsys EDAC controller.

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lkml.kernel.org/r/20211012190709.1504152-3-dinguyen@kernel.org
2021-11-20 19:41:48 +01:00
Dinh Nguyen f7824ded41 EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR
Add support for version 3.80a of the Synopsys DDR controller. This
version of the controller has the following differences:

- UE/CE are auto cleared
- Interrupts are supported by default

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lkml.kernel.org/r/20211012190709.1504152-2-dinguyen@kernel.org
2021-11-20 19:23:49 +01:00
Dinh Nguyen bd1d6da17c EDAC/synopsys: Use the quirk for version instead of ddr version
Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that
quirk to determine a call to setup_address_map().

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org
2021-11-20 18:13:06 +01:00
Yazen Ghannam 70aeb807cf EDAC/amd64: Add context struct
Define an address translation context struct. This will hold values that
will be passed between multiple functions.

Save return address, Node ID, and the Instance ID number to start.
Currently, the UMC number is used as the Instance ID, but future DF
versions may use another value.

Also include a "tmp" field to use when reading registers. This is to
avoid having to define a temporary variable in multiple functions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com
2021-11-15 12:54:16 +01:00
Yazen Ghannam 448c3d6085 EDAC/amd64: Allow for DF Indirect Broadcast reads
The DF Indirect Access method allows for "Broadcast" accesses in which
case no specific instance is targeted. Add support using a reserved
instance ID of 0xFF to indicate a broadcast access. Set the FICAA
register appropriately.

Define helpers functions for instance and broadcast reads and use them
where appropriate.

Drop the "amd_" prefix since these functions are all static.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com
2021-11-15 12:48:55 +01:00
Yazen Ghannam b3218ae477 x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDAC
df_indirect_read() is used only for address translation. Move it to EDAC
along with the translation code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com
2021-11-15 12:44:47 +01:00
Yazen Ghannam 0b746e8c1e x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDAC
The address translation code used for current AMD systems is
non-architectural. So move it to EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com
2021-11-15 12:36:32 +01:00
Linus Torvalds fe354159ca - amd64_edac: Add support for three-rank interleaving mode which is
present on AMD zen2 servers
 
 - The usual fixes and cleanups all over EDAC land
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/rQAACgkQEsHwGGHe
 VUqTsg//QSI/akiJnnZ5Ea3CMsFAWH/LIAZzjWa3zcit6OcxrjRMsym1JGs4llUY
 7Uis0lGTUH2je3kFrYdPVGXyoWPNdsQCQi/89Anabl5Jrf2u7N2MmUIKcCnzQWjS
 PHb4BSSWW2zR8wU8UXOo2I1QKltenKlaO97uE9Te9YZCYZ6PzS0z2qevd0SCPUtF
 tPQ+AqYNKE0JNNAqlPietSHs7MisyCuxTHqLiwC0QNuKJ3GhCtVhmYGKqaLOMcfr
 oKdBFaAZOhmtUFT0z8cLka8sKO5WDfqli9Qp6Zphv9E6iz6gJq7NCBaKRz8gOkPi
 boOjmSOhCi/xJq63DUtfKOogVRM+qhms1Kxni3EgJxyGR/oLwnVX4n0AopKBstb6
 AsHPUB8AxrtnEtoBPFM/bO18Eto3HkCZD2yCzztOc5NmtUnVlIh2/qN4oYjBG7pc
 20iXwZ49OAYVQwrLdoNvyXQ5BdKGC5tyMgvPWVyOO8Iad2YYfM5iUWDGSycEWSo3
 /xZgVljbFRBwjxaTqgoUDxOlBEgQsDg4ENTathCpxuR+meac/kqGZDSLc0zc7hV1
 zV2kHB7y6gWGAQiuS3Lq+j0BTNtgbUi43R61J00ecQEIOomsBaWLi6uewNJmvOZ9
 Ef7c0t/ree/PMTaXVer0IA8lU3vcHBMTrbNtQsPb4TCRsZOgFhI=
 =O0zA
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:
 "A small pile of EDAC updates which the autumn wind blew my way. :)

   - amd64_edac: Add support for three-rank interleaving mode which is
     present on AMD zen2 servers

   - The usual fixes and cleanups all over EDAC land"

* tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
  EDAC/ti: Remove redundant error messages
  EDAC/amd64: Handle three rank interleaving mode
  EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned
  EDAC/al_mc: Make use of the helper function devm_add_action_or_reset()
  EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()
2021-11-01 15:02:49 -07:00
Hans Potsch d9b7748ffc EDAC/armada-xp: Fix output of uncorrectable error counter
The number of correctable errors is displayed as uncorrectable
errors because the "SBE" error count is passed to both calls of
edac_mc_handle_error().

Pass the correct uncorrectable error count to the second
edac_mc_handle_error() call when logging uncorrectable errors.

 [ bp: Massage commit message. ]

Fixes: 7f6998a412 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC")
Signed-off-by: Hans Potsch <hans.potsch@nokia.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com
2021-10-14 11:46:03 +02:00
Eric Badger 537bddd069 EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
The computation of TOHM is off by one bit. This missed bit results in
too low a value for TOHM, which can cause errors in regular memory to
incorrectly report:

  EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory

Fixes: 50d1bb9367 ("sb_edac: add support for Haswell based systems")
Cc: stable@vger.kernel.org
Reported-by: Meeta Saggi <msaggi@purestorage.com>
Signed-off-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com
2021-10-11 08:28:46 -07:00
Tang Bin 0b6d4ab216 EDAC/ti: Remove redundant error messages
In the function ti_edac_probe(), devm_ioremap_resource() and
platform_get_irq() already issue error messages when they fail so remove
the redundant error messages in the EDAC driver.

Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210811112626.27848-1-tangbin@cmss.chinamobile.com
2021-10-07 19:16:01 +02:00
Yazen Ghannam 9f4873fb6a EDAC/amd64: Handle three rank interleaving mode
AMD Rome systems and later support interleaving between three identical
ranks within a channel.

Check for this mode by counting the number of enabled chip selects and
comparing their masks. If there are exactly three enabled chip selects
and their masks are identical, then three rank interleaving is enabled.

The size of a rank is determined from its mask value. However, three
rank interleaving doesn't follow the method of swapping an interleave
bit with the most significant bit. Rather, the interleave bit is flipped
and the most significant bit remains the same. There is only a single
interleave bit in this case.

Account for this when determining the chip select size by keeping the
most significant bit at its original value and ignoring any zero bits.
This will return a full bitmask in [MSB:1].

Fixes: e53a3b267f ("EDAC/amd64: Find Chip Select memory size using Address Mask")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com
2021-10-07 12:30:53 +02:00
Eric Badger 34417f27b9 EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned
This is cosmetically nicer for counts > INT32_MAX, and aligns the
MC-scope format with that of the lower layer sysfs counter files.

Signed-off-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20211003181653.GA685515@ebadger-ThinkPad-T590
2021-10-07 12:06:20 +02:00
Cai Huoqing 470b52564c EDAC/al_mc: Make use of the helper function devm_add_action_or_reset()
The helper function devm_add_action_or_reset() will internally call
devm_add_action(), and if devm_add_action() fails then it will
execute the action mentioned and return the error code. So use
devm_add_action_or_reset() instead of devm_add_action() to simplify the
error handling, reduce the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Talel Shenhar <talel@amazon.com>
Link: https://lkml.kernel.org/r/20210922125924.321-1-caihuoqing@baidu.com
2021-09-28 18:35:11 +02:00
Borislav Petkov 54607282fa EDAC/dmc520: Assign the proper type to dimm->edac_mode
dimm->edac_mode contains values of type enum edac_type - not the
corresponding capability flags. Fix that.

Fixes: 1088750d78 ("EDAC: Add EDAC driver for DMC520")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210916085258.7544-1-bp@alien8.de
2021-09-16 11:00:12 +02:00
Sai Krishna Potthuri 5297cfa6bd EDAC/synopsys: Fix wrong value type assignment for edac_mode
dimm->edac_mode contains values of type enum edac_type - not the
corresponding capability flags. Fix that.

Issue caught by Coverity check "enumerated type mixed with another
type."

 [ bp: Rewrite commit message, add tags. ]

Fixes: ae9b56e399 ("EDAC, synps: Add EDAC support for zynq ddr ecc controller")
Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com
2021-09-16 10:21:35 +02:00
Len Baker fca6116564 EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()
strcpy() performs no bounds checking on the destination buffer. This
could result in linear overflows beyond the end of the buffer, leading
to all kinds of misbehavior. The safe replacement is strscpy().
[1][2]

However, to simplify and clarify the code, to concatenate labels use the
scnprintf() function. This way it is not necessary to check the return
value of strscpy() (-E2BIG if the parameter count is 0 or the src was
truncated) since scnprintf() always returns the number of chars written
into the buffer. This function always returns a nul-terminated string
even if it needs to be truncated.

While at it, fix all other broken string generation code that wrongly
interprets snprintf()'s return code or just uses sprintf(), implement
that using scnprintf() here too. Drop breaks in loops around
scnprintf() as it is safe now to loop.

Moreover, the check is not needed: for the case when the buffer is
exhausted, len never gets zero because scnprintf() takes the full buffer
length as input parameter, but excludes the trailing '\0' in its return
code and thus, 1 is the minimum len.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
[2] https://github.com/KSPP/linux/issues/88

 [ rric: Replace snprintf() with scnprintf(), rework sprintf() user,
   drop breaks in loops around scnprintf(), introduce 'end' pointer to
   reduce pointer arithmetic, use prefix pattern for e->location,
   adjust subject and description ]

Co-developed-by: Joe Perches <joe@perches.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Len Baker <len.baker@gmx.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210903150539.7282-1-len.baker@gmx.com
2021-09-15 13:52:58 +02:00
Linus Torvalds 7d6e3fa87e Updates to the interrupt core and driver subsystems:
Core changes:
 
    - The usual set of small fixes and improvements all over the place, but nothing
      outstanding
 
 MSI changes:
 
    - Further consolidation of the PCI/MSI interrupt chip code
 
    - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts
      of platform devices in the same way as PCI exposes them.
 
 Driver changes:
 
    - Support for ARM GICv3 EPPI partitions
 
    - Treewide conversion to generic_handle_domain_irq() for all chained
      interrupt controllers
 
    - Conversion to bitmap_zalloc() throughout the irq chip drivers
 
    - The usual set of small fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmEsnpsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoS+/EACQdpRkzl3IDIYqThxVZ8KQzp2rKKVn
 qisAQiWg/6koNJx/yYy62KNAUyKjCIObNtRnWi7OAOx6OvNtQTD2WOLAwkh3Pgw1
 8ePYYl55k+yCs8VoITsZM9jYeO+Tk878pU2A6R943zR+g6G7bskGJrxEyZ9TbzIe
 qKfusNKnRY9/jMQaRALUAAtA9VIVR867GqORX5X8hKz8yE2rqlpb4y+1CFba5BTV
 Vlxw7cIXvXBn7BKAom5diRqEGDNJEbX+56jJ7yDZshgLo7m11D7QLw72kmb6TNVC
 g7PchvFi4afpc1ifEAAp0tk4RiSIAQ91nS3n0+jLcLbodOjIkl14eY02ZCJGAP29
 uslyzUbmy1wgejG6CA63JtZ4MYdrf/OSMGuoN78qnOKYcIsWFzOvlJmBWWNW34qW
 LCaUF9QdJ/slXu6B4vIx30GfN9q4myml8bFUobE5q9mBRrEk4R0B7iyBvPu1xKYr
 ZEan67prI5VEu+afJGpp4r294m4HNVkMLfl3nYmE5+y4MoLeMNKDY3IPTvI9iP4G
 kaFgoPvQo23WnuclNYpJ+CaA4aRASlB2nTY+oAXIYfehbey9EW5vq4/EK864ek6w
 oyUTepxxNhE81tG2jpQbf2tR4COsEHy986clxqPP4AvsZXcbypCw8O2FcflpQbHO
 5DLEAfTmp7cziQ==
 =qyll
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates to the interrupt core and driver subsystems:

  Core changes:

   - The usual set of small fixes and improvements all over the place,
     but nothing stands out

  MSI changes:

   - Further consolidation of the PCI/MSI interrupt chip code

   - Make MSI sysfs code independent of PCI/MSI and expose the MSI
     interrupts of platform devices in the same way as PCI exposes them.

  Driver changes:

   - Support for ARM GICv3 EPPI partitions

   - Treewide conversion to generic_handle_domain_irq() for all chained
     interrupt controllers

   - Conversion to bitmap_zalloc() throughout the irq chip drivers

   - The usual set of small fixes and improvements"

* tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  platform-msi: Add ABI to show msi_irqs of platform devices
  genirq/msi: Move MSI sysfs handling from PCI to MSI core
  genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
  irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy
  irqdomain: Export irq_domain_disconnect_hierarchy()
  irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
  irqchip/apple-aic: Fix irq_disable from within irq handlers
  pinctrl/rockchip: drop the gpio related codes
  gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type
  gpio/rockchip: support next version gpio controller
  gpio/rockchip: use struct rockchip_gpio_regs for gpio controller
  gpio/rockchip: add driver for rockchip gpio
  dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank
  pinctrl/rockchip: add pinctrl device to gpio bank struct
  pinctrl/rockchip: separate struct rockchip_pin_bank to a head file
  pinctrl/rockchip: always enable clock for gpio controller
  genirq: Fix kernel doc indentation
  EDAC/altera: Convert to generic_handle_domain_irq()
  powerpc: Bulk conversion to generic_handle_domain_irq()
  nios2: Bulk conversion to generic_handle_domain_irq()
  ...
2021-08-30 14:38:37 -07:00
Linus Torvalds 05b5fdb2a8 - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it
to the Intel SKx drivers
 
 - Print additional useful per-channel error information on i10nm, like on SKL
 
 - Check whether the AMD decoder is loaded on a guest and if so, don't
 
 - The usual round of fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmEso2QACgkQEsHwGGHe
 VUqZRRAAqraSdImfO4a7dV4lHErI43GDx9WZIKhrfNyvEv32pDc7E6dwUqJ+NHa1
 ON469MjmV4N3X5Deq+Ecaij6giieBKazfqbbZA9pSbNtX4+ta+TEDTCmO7IX38BC
 pESwHlWinJbpy8SfopXQh9wgx0pWOHfPysStHS0pmG5mH+o9mIDqsn/20Q0o/kIY
 Cn+PZ1mJ2f7iJXvjQv7XsGMM/HotLFkJ5RNUoVjZqteOOdTcHRWytuL/r7QY0SpX
 iWAcmkkW3gcW1qypvo/n53ghxvSydZpqWJhI25QBJ79CGOCVocYfitkIdIYAYOtt
 xH82GFuC5EtDBTsQXQa2gzbraTwnxPCr6BXKY819a7+xsw/WX2x7MctZc6vCH4R+
 UoDRQoUAqgNG2D89lgtQHspy3h3gSSWbGRysLdFCohJlDv3mCU4hk7yyvoCtatqR
 ki725nkJn0XT1HZAQc+o3p6E+PCmVYd2Km8TtNJ2ZS8z+97rjsZQAV5eeZKLnUC7
 lVX57lZ9MciSlrBNe1k9b1er1Ir3/rLs1X33K+yqE8IwwCX1hEw1yPNEf6jSXQwK
 qce95INCamyl1Z8y0uhFZfgl8aEN5ZbaL/lBYz1Gk5R7IQCSVd9Amq5fqxOUdPb/
 PcJk24Ng+h0oPKiBBVHrsDhFiDnZKAv7JGqEPFNmOi3dWZPrNrI=
 =JYRz
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:
 "The usual EDAC stuff which managed to trickle in for 5.15:

   - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for
     it to the Intel SKx drivers

   - Print additional useful per-channel error information on i10nm,
     like on SKL

   - Don't load the AMD EDAC decoder in virtual images

   - The usual round of fixes and cleanups"

* tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/i10nm: Retrieve and print retry_rd_err_log registers
  EDAC/i10nm: Fix NVDIMM detection
  EDAC/skx_common: Set the memory type correctly for HBM memory
  EDAC/altera: Skip defining unused structures for specific configs
  EDAC/mce_amd: Do not load edac_mce_amd module on guests
  EDAC/mc: Add new HBM2 memory type
  EDAC/amd64: Use DEVICE_ATTR helper macros
2021-08-30 13:17:29 -07:00
Youquan Song cf4e6d52f5 EDAC/i10nm: Retrieve and print retry_rd_err_log registers
Retrieve and print retry_rd_err_log registers like the earlier change:
commit e80634a75a ("EDAC, skx: Retrieve and print retry_rd_err_log registers")

This is a little trickier than on Skylake because of potential
interference with BIOS use of the same registers. The default
behavior is to ignore these registers.

A module parameter retry_rd_err_log(default=0) controls the mode of operation:
- 0=off  : Default.
- 1=bios : Linux doesn't reset any control bits, but just reports values.
           This is "no harm" mode, but it may miss reporting some data.
- 2=linux: Linux tries to take control and resets mode bits,
           clears valid/UC bits after reading. This should be
           more reliable (especially if BIOS interference is reduced
           by disabling eMCA reporting mode in BIOS setup).

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com
2021-08-23 10:35:36 -07:00
Qiuxu Zhuo 2294a7299f EDAC/i10nm: Fix NVDIMM detection
MCDDRCFG is a per-channel register and uses bit{0,1} to indicate
the NVDIMM presence on DIMM slot{0,1}. Current i10nm_edac driver
wrongly uses MCDDRCFG as per-DIMM register and fails to detect
the NVDIMM.

Fix it by reading MCDDRCFG as per-channel register and using its
bit{0,1} to check whether the NVDIMM is populated on DIMM slot{0,1}.

Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Reported-by: Fan Du <fan.du@intel.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210818175701.1611513-2-tony.luck@intel.com
2021-08-23 10:35:36 -07:00
Qiuxu Zhuo fd07a4a0d3 EDAC/skx_common: Set the memory type correctly for HBM memory
Set the memory type to MEM_HBM2 if it's managed by the HBM2
memory controller.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210720163009.GA1417532@agluck-desk2.amr.corp.intel.com
2021-08-23 10:32:32 -07:00
Krzysztof Kozlowski 7d07deb3b8 EDAC/altera: Skip defining unused structures for specific configs
The Altera EDAC driver has several features conditionally built
depending on Kconfig options. The edac_device_prv_data structures
are conditionally used in of_device_id tables. They reference other
functions and structures which can be defined as __maybe_unused.

Silence build warnings like:

  drivers/edac/altera_edac.c:643:37: warning:
      ‘altr_edac_device_inject_fops’ defined but not used [-Wunused-const-variable=]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://lkml.kernel.org/r/20210601092704.203555-1-krzysztof.kozlowski@canonical.com
2021-08-16 20:21:46 +02:00
Marc Zyngier eecb06813d EDAC/altera: Convert to generic_handle_domain_irq()
Replace generic_handle_irq(irq_linear_revmap()) with a single call to
generic_handle_domain_irq().

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-12 11:39:41 +01:00
Smita Koralahalli 767f4b620e EDAC/mce_amd: Do not load edac_mce_amd module on guests
Hypervisors likely do not expose the SMCA feature to the guest and
loading this module leads to false warnings. This module should not be
loaded in guests to begin with, but people tend to do so, especially
when testing kernels in VMs. And then they complain about those false
warnings.

Do the practical thing and do not load this module when running as a
guest to avoid all that complaining.

 [ bp: Rewrite commit message. ]

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
2021-08-09 12:35:43 +02:00
Naveen Krishna Chatradhi e1ca90b7cc EDAC/mc: Add new HBM2 memory type
Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]'
for HBM2 (High Bandwidth Memory Gen 2) new memory type.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210630152828.162659-4-nchatrad@amd.com
2021-07-20 09:20:49 -07:00
Randy Dunlap a1c9ca5f65 EDAC/igen6: fix core dependency AGAIN
My previous patch had a typo/thinko which prevents this driver
from being enabled: change X64_64 to X86_64.

Fixes: 0a9ece9ba1 ("EDAC/igen6: fix core dependency")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac@vger.kernel.org
Cc: bowsingbetee <bowsingbetee@protonmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-15 11:59:59 -07:00
Dwaipayan Ray d19faf0e49 EDAC/amd64: Use DEVICE_ATTR helper macros
Instead of "open coding" DEVICE_ATTR, use the corresponding
helper macros DEVICE_ATTR_{RW,RO,WO} in amd64_edac.c

Some function names needed to be changed to match the device
conventions <foo>_show and <foo>_store, but the functionality
itself is unchanged.

The devices using EDAC_DCT_ATTR_SHOW() are left unchanged.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210713065130.2151-1-dwaipayanray1@gmail.com
2021-07-13 09:59:57 -07:00
Linus Torvalds 71bd934101 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
2021-07-02 12:08:10 -07:00
Andy Shevchenko f39650de68 kernel.h: split out panic and oops helpers
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
  Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:04 -07:00
Linus Torvalds 4b5e35ce07 Various fixes and support for new CPUS
- Clean up error messages from thunderx_edac
 - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload
 - Use %pR to print resources in aspeed_edac
 - Add Yazen Ghannam as MAINTAINER for AMD edac drivers
 - Fix Ice Lake and Sapphire Rapids drivers to report correct
   "near" or "far" device for errors in 2LM configurations
 - Add support of on package high bandwidth memory in Sapphire Rapids
 - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for
   ICL-NNPI, Tiger Lake and Alder Lake)
 - Don't even try to load Intel EDAC drivers when running as a guest
 - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQQW3WBGcnu5yJnSXn0kTJLX0iGMLAUCYNujCRQcdG9ueS5sdWNr
 QGludGVsLmNvbQAKCRAkTJLX0iGMLCazAPsHoGc9ymStw0hL06Lw71Va/3VyqiUZ
 ha1OTWXCcV42UgD/QlwhKkHC3UkEI1dTEAI6McXcC88s8+yXQ8fKAPeQyQw=
 =iTWK
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Tony Luck:
 "Various fixes and support for new CPUs:

   - Clean up error messages from thunderx_edac

   - Add MODULE_DEVICE_TABLE to ti_edac so it will autoload

   - Use %pR to print resources in aspeed_edac

   - Add Yazen Ghannam as MAINTAINER for AMD edac drivers

   - Fix Ice Lake and Sapphire Rapids drivers to report correct "near"
     or "far" device for errors in 2LM configurations

   - Add support of on package high bandwidth memory in Sapphire Rapids

   - New CPU support for three CPUs supporting in-band ECC (IOT SKUs for
     ICL-NNPI, Tiger Lake and Alder Lake)

   - Don't even try to load Intel EDAC drivers when running as a guest

   - Fix Kconfig dependency on X86_MCE_INTEL for EDAC_IGEN6"

* tag 'edac_updates_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/igen6: fix core dependency
  EDAC/Intel: Do not load EDAC driver when running as a guest
  EDAC/igen6: Add Intel Alder Lake SoC support
  EDAC/igen6: Add Intel Tiger Lake SoC support
  EDAC/igen6: Add Intel ICL-NNPI SoC support
  EDAC/i10nm: Add support for high bandwidth memory
  EDAC/i10nm: Add detection of memory levels for ICX/SPR servers
  EDAC/skx_common: Add new ADXL components for 2-level memory
  MAINTAINERS: Make Yazen Ghannam maintainer for EDAC-AMD64
  EDAC/aspeed: Use proper format string for printing resource
  EDAC/ti: Add missing MODULE_DEVICE_TABLE
  EDAC/thunderx: Remove irrelevant variable from error messages
2021-06-30 11:27:49 -07:00
Randy Dunlap 0a9ece9ba1 EDAC/igen6: fix core dependency
igen6_edac needs mce_register()/unregister() functions,
so it should depend on X86_MCE (or X86_MCE_INTEL).

That change prevents these build errors:

ld: drivers/edac/igen6_edac.o: in function `igen6_remove':
igen6_edac.c:(.text+0x494): undefined reference to `mce_unregister_decode_chain'
ld: drivers/edac/igen6_edac.o: in function `igen6_probe':
igen6_edac.c:(.text+0xf5b): undefined reference to `mce_register_decode_chain'

Fixes: 10590a9d4f ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210619160203.2026-1-rdunlap@infradead.org
2021-06-20 14:04:48 -07:00
Luck, Tony f0a029fff4 EDAC/Intel: Do not load EDAC driver when running as a guest
There's little to no point in loading an EDAC driver running in a guest:
1) The CPU model reported by CPUID may not represent actual h/w
2) The hypervisor likely does not pass in access to memory controller devices
3) Hypervisors generally do not pass corrected error details to guests

Add a check in each of the Intel EDAC drivers for X86_FEATURE_HYPERVISOR
and simply return -ENODEV in the init routine.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210615174419.GA1087688@agluck-desk2.amr.corp.intel.com
2021-06-17 18:23:14 -07:00
Qiuxu Zhuo ad774bd5a8 EDAC/igen6: Add Intel Alder Lake SoC support
Alder Lake SoC shares the same memory controller and In-Band ECC
(IBECC) IP with Tiger Lake SoC. Like Tiger Lake, it also has two
memory controllers each associated one IBECC instance. The minor
differences include the MMIO offset of each memory controller and
the type of memory error address logged in the IBECC.

So add Alder Lake compute die IDs, adjust the MMIO offset for each
memory controller and handle the type of memory error address logged
in the IBECC for Alder Lake EDAC support.

Tested-by: Vrukesh V Panse <vrukesh.v.panse@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-7-tony.luck@intel.com
2021-06-17 18:20:01 -07:00
Qiuxu Zhuo 0b7338b27e EDAC/igen6: Add Intel Tiger Lake SoC support
Tiger Lake SoC shares the same memory controller and In-Band ECC
(IBECC) IP with Elkhart Lake SoC. The main differences are that Tiger
Lake has two memory controllers each associated with one IBECC and
uses Machine Check for the memory error notification.

So add Tiger Lake compute die IDs, MCE decoding chain registration,
and memory slice decoding for Tiger Lake EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-6-tony.luck@intel.com
2021-06-17 18:19:53 -07:00
Qiuxu Zhuo 4e591c0568 EDAC/igen6: Add Intel ICL-NNPI SoC support
The Ice Lake Neural Network Processor for Deep Learning Inference
(ICL-NNPI) SoC shares the same memory controller and In-Band ECC with
Elkhart Lake SoC. Add the ICL-NNPI compute die IDs for EDAC support.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-5-tony.luck@intel.com
2021-06-17 18:19:46 -07:00
Qiuxu Zhuo c945088384 EDAC/i10nm: Add support for high bandwidth memory
A future Xeon processor will include in-package HBM (high bandwidth
memory). The in-package HBM memory controller shares the same
architecture with the regular DDR memory controller.

Add the HBM memory controller devices for EDAC support.

Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-4-tony.luck@intel.com
2021-06-17 18:19:39 -07:00
Qiuxu Zhuo 4bd4d32e9a EDAC/i10nm: Add detection of memory levels for ICX/SPR servers
Current i10nm_edac driver is only for system configured in 1-level
memory. If the system is configured in 2-level memory, the driver
doesn't report the 1st level memory DIMM for the error address, even
if the error occurs in the 1st level memory.

Both Ice Lake servers and Sapphire Rapids servers can be configured
in 2-level memory. Add detection of memory levels to i10nm_edac for
the two kinds of servers so that the driver can report the 2nd level
memory DIMM or the 1st level memory DIMM according to error source.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-3-tony.luck@intel.com
2021-06-17 18:19:30 -07:00
Qiuxu Zhuo 2f4348e5a8 EDAC/skx_common: Add new ADXL components for 2-level memory
Some Intel servers may configure memory in 2 levels, using
fast "near" memory (e.g. DDR) as a cache for larger, slower,
"far" memory (e.g. 3D X-point).

In these configurations the BIOS ADXL address translation for
an address in a 2-level memory range will provide details of
both the "near" and far components.

Current exported ADXL components are only for 1-level memory
system or for 2nd level memory of 2-level memory system. So
add new ADXL components for 1st level memory of 2-level memory
system to fully support 2-level memory system and the detection
of memory error source(1st level memory or 2nd level memory).

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-2-tony.luck@intel.com
2021-06-17 18:19:22 -07:00