Commit Graph

335 Commits

Author SHA1 Message Date
Borislav Petkov (AMD) ce8ac91130 Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updates
Combine all queued EDAC changes for submission into v6.4:

* ras/edac-drivers:
  EDAC/i10nm: Add Intel Sierra Forest server support
  EDAC/skx: Fix overflows on the DRAM row address mapping arrays

* ras/edac-amd64: (27 commits)
  EDAC/amd64: Fix indentation in umc_determine_edac_cap()
  EDAC/amd64: Add get_err_info() to pvt->ops
  EDAC/amd64: Split dump_misc_regs() into dct/umc functions
  EDAC/amd64: Split init_csrows() into dct/umc functions
  EDAC/amd64: Split determine_edac_cap() into dct/umc functions
  EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
  EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
  EDAC/amd64: Split ecc_enabled() into dct/umc functions
  EDAC/amd64: Split read_mc_regs() into dct/umc functions
  EDAC/amd64: Split determine_memory_type() into dct/umc functions
  EDAC/amd64: Split read_base_mask() into dct/umc functions
  EDAC/amd64: Split prep_chip_selects() into dct/umc functions
  EDAC/amd64: Rework hw_info_{get,put}
  EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
  EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
  EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
  EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
  EDAC/amd64: Rename debug_display_dimm_sizes()

* ras/edac-misc:
  EDAC/altera: Remove MODULE_LICENSE in non-module
  EDAC: Sanitize MODULE_AUTHOR strings
  EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
  EDAC/i5100: Fix typo in comment
  EDAC/altera: Remove redundant error logging

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-04-24 09:14:30 +02:00
Yang Li 49aba1c589 EDAC/amd64: Fix indentation in umc_determine_edac_cap()
Use consistent indentation to improve the readability and fix:

  drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting

Fixes: f6a4b4a1aa ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com
2023-04-04 17:22:55 +02:00
Borislav Petkov (AMD) 371b27f2f3 EDAC: Sanitize MODULE_AUTHOR strings
Fixup the remaining MODULE_AUTHOR strings to not contain newlines.
Shorten and unbreak others.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de
2023-03-28 15:43:30 +02:00
Muralidhara M K b3ece3a6a2 EDAC/amd64: Add get_err_info() to pvt->ops
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.

Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.

Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K f6f36382d6 EDAC/amd64: Split dump_misc_regs() into dct/umc functions
Add a function pointer to pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K 6fb8b5fb9e EDAC/amd64: Split init_csrows() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Muralidhara M K f6a4b4a1aa EDAC/amd64: Split determine_edac_cap() into dct/umc functions
Call them from their respective setup_mci_misc_attrs() paths.

No functional change is intended.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com
2023-03-24 13:03:21 +01:00
Yazen Ghannam 9369239e8d EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
...to match the "umc_" prefix convention.

No functional change is intended.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 0a42a37f65 EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
The init_one_instance() path is shared between legacy and modern
systems. So add the new functions to a function pointer in pvt->ops.

No functional change is intended.

  [ Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K eb2bcdfc37 EDAC/amd64: Split ecc_enabled() into dct/umc functions
Call them using a function pointer in pvt->ops. The "ECC enabled"
check is done outside of the hardware information gathering done in
hw_info_get(). So a high-level function pointer is needed to separate
the legacy and modern paths.

No functional change is intended.

  [Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 32ecdf8688 EDAC/amd64: Split read_mc_regs() into dct/umc functions
Call them from their respective hw_info_get() paths.

ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path. Do not save TOP_MEM* values on modern
controllers because they're not needed there (read: they were used only
for debugging, if anything).

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 78ec161a91 EDAC/amd64: Split determine_memory_type() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call them after all other hardware registers have been saved, since the
memory type for a device will be determined based on the saved
information.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K b29dad9bf3 EDAC/amd64: Split read_base_mask() into dct/umc functions
Call them from their respective hw_info_get() paths.

Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.

  [Yazen: Rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K 637f60ef2c EDAC/amd64: Split prep_chip_selects() into dct/umc functions
Call them from their respective hw_info_get() function. Avoid the
need for family/model-based function pointers.

Add the calls before reading hardware registers from the memory
controllers, since the number of chip select bases and masks needs to be
known first.

  [ Yazen: rebased/reworked patch and reworded commit message. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Yazen Ghannam 9a97a7f4d7 EDAC/amd64: Rework hw_info_{get,put}
The bulk of system-specific information is gathered at init time with
hw_info_get(). This function calls a number of helper functions, and
many of these helper functions are split between a modern UMC/DF path
and a legacy DCT path.

Split hw_info_get() into legacy and modern versions. This creates two
separate code paths early on, and legacy and modern helper functions can
be called directly in the appropriate code path.

Also, simplify hw_info_put() and share it between legacy and modern
systems. NULL pointer checks are done in pci_dev_put() and kfree(), so
they can be called unconditionally.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com
2023-03-24 13:03:20 +01:00
Muralidhara M K ed623d55ee EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU and GPU types. Therefore, a global family type shared across all
AMD nodes is no longer appropriate.

Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.

Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
GPU cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.

In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.

  [ Yazen: rebased/reworked patch and reworded commit message. ]
  [  bp: Fix rev8 or later check. ]

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam 5a1adb375d EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
The ECC symbol size was needed on legacy system to lookup the ECC syndrome.
This is not needed on modern systems because the ECC syndrome is explicitly
provided in the MCA information.

Remove the ECC symbol size discovery code for modern UMC-based systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam a2e59ab8e9 EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
The same function is used to calculate chip select size for all Zen-based
family/models. Therefore, a family/model function pointer is not necessary.

Drop the dbam_to_cs() function pointer for Family 17h and later systems.
Also, move the Family 17h function to avoid a forward declaration. Rename
it to indicate that the UMC Address Mask is used rather than the legacy
DBAM value.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam c0984666fd EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
Split get_csrow_nr_pages() into a legacy and modern versions in preparation
for further legacy/modern refactoring.

Also, rename f17_get_cs_mode() to match the new convention.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com
2023-03-24 13:03:19 +01:00
Yazen Ghannam 00e4feb8c0 EDAC/amd64: Rename debug_display_dimm_sizes()
Use the "dct" and "umc" prefixes for legacy and modern versions
respectively.

Also, move the "dct" version to avoid a forward declaration, and fixup
some checkpatch warnings in the process.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com
2023-03-24 12:54:47 +01:00
Yazen Ghannam 28980db947 EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive
Reportedly, clang cannot do interprocedural analysis:

  https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org

and see that those arguments won't be used uninitialized.

So, yeah, the code's fine even without this. Normally, such a "fix"
won't be applied but that warning gets automatically enabled in -Wall
builds and when CONFIG_WERROR is set in allmodconfig builds, the build
fails.

So shut it up with a minimal fix as this code will see more
reorganization very soon.

  [ bp: Write commit message. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X
2023-02-14 17:56:14 +01:00
Yazen Ghannam c4605bde33 EDAC/amd64: Remove early_channel_count()
The early_channel_count() function seems to have been useful in the past
for knowing how many EDAC mci structures to populate. However, this is no
longer needed as the maximum channel count for a system is used instead.

Remove the early_channel_count() helper functions and related code. Use the
size of the channel layer when iterating over channel structures.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com
2023-02-09 14:43:39 +01:00
Yazen Ghannam cf981562e6 EDAC/amd64: Remove PCI Function 0
PCI Function 0 is used on Family 17h and later only to read the "dhar"
value. This value is printed and provided through a module-specific
debug sysfs file. The value is not used for any Family 17h and later
code, and it does not have any apparent debug value on these systems.

Remove "dhar", Function 0 PCI IDs, and all related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com
2023-02-09 14:41:56 +01:00
Yazen Ghannam 6229235f7c EDAC/amd64: Remove PCI Function 6
PCI Function 6 is used on Family 17h and later to access scrub
registers. With scrub access removed, this function has no other use.

Remove all Function 6 PCI IDs and related code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com
2023-02-09 11:50:52 +01:00
Yazen Ghannam 6e241bc93c EDAC/amd64: Remove scrub rate control for Family 17h and later
The scrub registers on AMD Family 17h and later may be inaccessible to
the OS. Furthermore, hardware designers recommend that the scrubbing
feature is managed by the firmware.

Remove support for the sdram_scrub_rate interface for AMD Family 17h
systems and later by not setting the scrub function pointers. The EDAC MC
core will then not expose the scrub files in sysfs.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-3-yazen.ghannam@amd.com
2023-02-09 11:25:21 +01:00
Yazen Ghannam fdce765a13 EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
EDAC PCI control is used to detect/report legacy PCI errors like
"Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error
Reporting (AER), and legacy PCI errors should not be reported.

Remove EDAC PCI control setup on AMD Family 17h and later systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-2-yazen.ghannam@amd.com
2023-02-09 11:14:53 +01:00
Jia He 315bada690 EDAC: Check for GHES preference in the chipset-specific EDAC drivers
Call ghes_get_devices() to check whether ghes_edac should be used on the
platform where it is preferred over the corresponding chipset-specific
EDAC driver.

Unlike the existing edac_get_owner() check, the ghes_get_devices() check
works independent to the module_init ordering.

  [ bp: Massage. ]

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-6-justin.he@arm.com
2022-10-21 22:09:54 +02:00
Muralidhara M K e1907d3751 x86/amd_nb: Unexport amd_cache_northbridges()
amd_cache_northbridges() is exported by amd_nb.c and is called by
amd64-agp.c and amd64_edac.c modules at module_init() time so that NB
descriptors are properly cached before those drivers can use them.

However, the init_amd_nbs() initcall already does call
amd_cache_northbridges() unconditionally and thus makes sure the NB
descriptors are enumerated.

That initcall is a fs_initcall type which is on the 5th group (starting
from 0) of initcalls that gets run in increasing numerical order by the
init code.

The module_init() call is turned into an __initcall() in the MODULE=n
case and those are device-level initcalls, i.e., group 6.

Therefore, the northbridges caching is already finished by the time
module initialization starts and thus the correct initialization order
is retained.

Unexport amd_cache_northbridges(), update dependent modules to
call amd_nb_num() instead. While at it, simplify the checks in
amd_cache_northbridges().

  [ bp: Heavily massage and *actually* explain why the change is ok. ]

Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324122729.221765-1-nchatrad@amd.com
2022-04-05 19:22:27 +02:00
Yazen Ghannam 2151c84ece EDAC/amd64: Add new register offset support and related changes
Introduce a "family flags" bitmask that can be used to indicate any
special behavior needed on a per-family basis.

Add a flag to indicate a system uses the new register offsets introduced
with Family 19h Model 10h.

Use this flag to account for register offset changes, a new bitfield
indicating DDR5 use on a memory controller, and to set the proper number
of chip select masks.

Rework f17_addr_mask_to_cs_size() to properly handle the change in chip
select masks. And update code comments to reflect the updated Chip
Select, DIMM, and Mask relationships.

[uninitialized variable warning]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20220202144307.2678405-3-yazen.ghannam@amd.com
2022-02-23 22:01:33 +01:00
Yazen Ghannam 75aeaaf23d EDAC/amd64: Set memory type per DIMM
Current AMD systems allow mixing of DIMM types within a system. However,
DIMMs within a channel, i.e. managed by a single Unified Memory
Controller (UMC), must be of the same type.

Handle this possible configuration by checking and setting the memory
type for each individual "UMC" structure.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20220202144307.2678405-2-yazen.ghannam@amd.com
2022-02-23 21:53:45 +01:00
Linus Torvalds ff8be96420 - Add support for version 3 of the Synopsys DDR controller to synopsys_edac
- Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
   family 0x19 CPUs to amd64_edac
 
 - The usual set of fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHb/PwACgkQEsHwGGHe
 VUqhbhAAo0mRNnBF3CJn1zlXRgmqrvV1IPnJQNp+z5iaXY1vr0qRMgO4OcgsJrxF
 nxrx/fdAYlQQO4vz1iq4t4j+eazOQyM/JZ0DKi4e+Dw2mC0axdCx8a0pyl1g2de6
 oQ5GkplRKUFn+3bTJpHIE5QnCOD7S85Mrp1F3Soa6jD9i+HwQIqAoltNMcCP7Yei
 ibhWUBX2H/oYcHARecIkP/YEyzSEHhcX6LRjNILW5haZQ6GziQUFzKUUwpUS3hsz
 9i6hXnHXEPhOq8JyoyWWhvVDywFK9z8lh57G7DFfZIhAk1FjuLDP2iI270D/LkYF
 shq6+M8ST9yqwOMV3Iaoa8VZFf/fjTyV0E0L2p2+faxaJ66rqdzbagLIZQv6hDNe
 N1/LD72/Io4et1kEbbaHm5jpxzSJ0jQwu1o+rY1/NmKsWhzE6V4X0GnDTZzZwP9b
 CbFJAWdCD+fi3WQjzv8HLVepjIsV+R3VVTOLq2oodn/mtoK0DRU/vTeCCwRS9ntF
 IyF55L/jSqy9CtP119KBnItGo4b84UrJDozXizGtc6Zt3chz7ljSpa2gJrKF+fCR
 Yhyr9Pt+vYQBpnIMDu1BPcoE58pwZYKoOSO00COUHHsLn+u8qhetGmQzYwWHCq7J
 hz8HHZnlTdFseZTp5tavk3B4md5HiPu+A7GevK3YH3lBv/vEmDM=
 =85ZE
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add support for version 3 of the Synopsys DDR controller to
   synopsys_edac

 - Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD
   family 0x19 CPUs to amd64_edac

 - The usual set of fixes and cleanups

* tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Add support for family 19h, models 50h-5fh
  EDAC/sb_edac: Remove redundant initialization of variable rc
  RAS/CEC: Remove a repeated 'an' in a comment
  EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
  EDAC: Add RDDR5 and LRDDR5 memory types
  EDAC/sifive: Fix non-kernel-doc comment
  dt-bindings: memory: Add entry for version 3.80a
  EDAC/synopsys: Enable the driver on Intel's N5X platform
  EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR
  EDAC/synopsys: Use the quirk for version instead of ddr version
2022-01-10 11:45:23 -08:00
Marc Bevand 0b8bf9cb14 EDAC/amd64: Add support for family 19h, models 50h-5fh
Add the new family 19h models 50h-5fh PCI IDs (device 18h functions 0
and 6) to support Ryzen 5000 APUs ("Cezanne").

Signed-off-by: Marc Bevand <m@zorinaq.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20211221233112.556927-1-m@zorinaq.com
2021-12-24 10:53:41 +01:00
Yazen Ghannam e2be5955a8 EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh
Add a new family type for AMD Family 19h Models 10h to 1Fh. Use this new
family type for Models A0h to AFh also.

Increase the maximum number of controllers from 8 to 12.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211208174356.1997855-3-yazen.ghannam@amd.com
2021-12-10 12:54:33 +01:00
Yazen Ghannam 70aeb807cf EDAC/amd64: Add context struct
Define an address translation context struct. This will hold values that
will be passed between multiple functions.

Save return address, Node ID, and the Instance ID number to start.
Currently, the UMC number is used as the Instance ID, but future DF
versions may use another value.

Also include a "tmp" field to use when reading registers. This is to
avoid having to define a temporary variable in multiple functions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com
2021-11-15 12:54:16 +01:00
Yazen Ghannam 448c3d6085 EDAC/amd64: Allow for DF Indirect Broadcast reads
The DF Indirect Access method allows for "Broadcast" accesses in which
case no specific instance is targeted. Add support using a reserved
instance ID of 0xFF to indicate a broadcast access. Set the FICAA
register appropriately.

Define helpers functions for instance and broadcast reads and use them
where appropriate.

Drop the "amd_" prefix since these functions are all static.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com
2021-11-15 12:48:55 +01:00
Yazen Ghannam b3218ae477 x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDAC
df_indirect_read() is used only for address translation. Move it to EDAC
along with the translation code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com
2021-11-15 12:44:47 +01:00
Yazen Ghannam 0b746e8c1e x86/MCE/AMD, EDAC/amd64: Move address translation to AMD64 EDAC
The address translation code used for current AMD systems is
non-architectural. So move it to EDAC.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211028175728.121452-2-yazen.ghannam@amd.com
2021-11-15 12:36:32 +01:00
Yazen Ghannam 9f4873fb6a EDAC/amd64: Handle three rank interleaving mode
AMD Rome systems and later support interleaving between three identical
ranks within a channel.

Check for this mode by counting the number of enabled chip selects and
comparing their masks. If there are exactly three enabled chip selects
and their masks are identical, then three rank interleaving is enabled.

The size of a rank is determined from its mask value. However, three
rank interleaving doesn't follow the method of swapping an interleave
bit with the most significant bit. Rather, the interleave bit is flipped
and the most significant bit remains the same. There is only a single
interleave bit in this case.

Account for this when determining the chip select size by keeping the
most significant bit at its original value and ignoring any zero bits.
This will return a full bitmask in [MSB:1].

Fixes: e53a3b267f ("EDAC/amd64: Find Chip Select memory size using Address Mask")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com
2021-10-07 12:30:53 +02:00
Dwaipayan Ray d19faf0e49 EDAC/amd64: Use DEVICE_ATTR helper macros
Instead of "open coding" DEVICE_ATTR, use the corresponding
helper macros DEVICE_ATTR_{RW,RO,WO} in amd64_edac.c

Some function names needed to be changed to match the device
conventions <foo>_show and <foo>_store, but the functionality
itself is unchanged.

The devices using EDAC_DCT_ATTR_SHOW() are left unchanged.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210713065130.2151-1-dwaipayanray1@gmail.com
2021-07-13 09:59:57 -07:00
Brijesh Singh 059e5c321a x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
2021-05-10 07:51:38 +02:00
Borislav Petkov 4cbcb73b1c EDAC/amd64: Issue probing messages only on properly detected hardware
amd64_edac was converted to CPU family autoprobing (from PCI device
IDs) to not have to add a new PCI device ID each time a new platform is
shipped but to support the whole family out-of-the-box.

However, this caused a lot of noise in dmesg even when the machine
doesn't have ECC DIMMs or ECC has been disabled in the BIOS:

  EDAC MC: Ver: 3.0.0
  EDAC amd64: F17h detected (node 0).
  EDAC amd64: Node 0: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 1).
  EDAC amd64: Node 1: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 2).
  EDAC amd64: Node 2: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 3).
  EDAC amd64: Node 3: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 4).
  EDAC amd64: Node 4: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 5).
  EDAC amd64: Node 5: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 6).
  EDAC amd64: Node 6: DRAM ECC disabled.
  EDAC amd64: F17h detected (node 7).
  EDAC amd64: Node 7: DRAM ECC disabled.

or even

$ grep EDAC dmesg.log | sed 's/\[.*\] //' | sort | uniq -c
    128 EDAC amd64: F17h detected (node 0).
    128 EDAC amd64: Node 0: DRAM ECC disabled.
      1 EDAC MC: Ver: 3.0.0

on a big machine. Yap, that's once per CPU for 128 of them.

So move the init messages after all probing has succeeded to avoid
unnecessary spew in dmesg.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210119164141.17417-1-bp@alien8.de
2021-01-22 11:13:47 +01:00
Borislav Petkov 1865bc71a8 EDAC/amd64: Limit error injection functionality to supported hw
Families up to and including 0x16 allow access to the injection
hardware. Starting with family 0x17, access to those registers is
blocked by security policy.

Limit that only on the families which support it.

Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201222180013.GD13463@zn.tnic
2020-12-28 19:36:37 +01:00
Borislav Petkov 61810096de EDAC/amd64: Merge error injection sysfs facilities
Merge them into the main driver and put them inside an EDAC_DEBUG
ifdeffery to simplify the driver and have all debugging/injection stuff
behind a debug build-time switch.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20201215110517.5215-2-bp@alien8.de
2020-12-28 19:36:25 +01:00
Borislav Petkov 2a28ceef00 EDAC/amd64: Merge sysfs debugging attributes setup code
There's no need for them to be in a separate file so merge them into the
main driver compilation unit like the other EDAC drivers do.

Drop now-unneeded function export, make the function static and shorten
static function names.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20201215110517.5215-1-bp@alien8.de
2020-12-28 19:36:17 +01:00
Yazen Ghannam 6a4afe3878 EDAC/amd64: Tone down messages about missing PCI IDs
Give these messages a debug severity as they are really only useful to
the module developers.

Also, drop the "(broken BIOS?)" phrase, since this can cause churn for
BIOS folks. The PCI IDs needed by the module, at least on modern systems,
are fixed in hardware.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215170131.8496-1-Yazen.Ghannam@amd.com
2020-12-28 19:18:03 +01:00
Borislav Petkov 6c13d7ff81 EDAC/amd64: Do not load on family 0x15, model 0x13
Those were only laptops and are very very unlikely to have ECC memory.
Currently, when the driver attempts to load, it issues:

  EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)

because the PCI device is the wrong one (it uses the F15h default one).

So do not load the driver on them as that is pointless.

Reported-by: Don Curtis <bugrprt21882@online.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Don Curtis <bugrprt21882@online.de>
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1179763
Link: https://lkml.kernel.org/r/20201218160622.20146-1-bp@alien8.de
2020-12-28 12:18:11 +01:00
Linus Torvalds 0d712978dc - Save the AMD's physical die ID into cpuinfo_x86.cpu_die_id and convert all
code to use it (Yazen Ghannam)
 
 - Remove a dead and unused TSEG region remapping workaround on AMD (Arvind Sankar)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl/XVlYACgkQEsHwGGHe
 VUpxTA/9F0KsgSyTh66uX+aX5qkQ3WTBVgxbXGFrn5qPvwcALXabU8qObDWTSdwS
 1YbiWDjKNBJX+dggWe/fcQgUZxu5DFkM4IKEW1V7MLJEcdfylcqCyc1YNpEI4ySn
 ebw2Sy4/5iXGAvhz802/WoUU/o3A2uZwe0RFyodHGxof5027HkZhRHeYB27Htw+l
 z0IsmiYOoPl/4mNuVgr/qieIFSw1SUE9kwjU8RvM6xVWmXWXpM68JHa9s+/51pFt
 6BaOz485OyzWUCtSx3/++GEkU2d53bWYOuQ1zTLEiuaBfYC5n5T/kAcT4WJNK6Tf
 tX7yrzmWm9ecykIxfkgMrhG57G38y2GMJcEg+dFQHeXC062fdHDg+oY6Ql2EkAm5
 t5RIQ/cyOmQCLns31rHI/kwQ3RMKc/lfnL/z8lrlfWsC5o755yFJKttbfLJugbTo
 3BO1fbs4xgQcgi0KoqXOUETrQtsOLtr9FJwvcArB94XXqcIPClE8Ir7n8T7FCuLr
 9litSXIdn46EHwD6hD5QIk7y+Rxwk/jxZFys3eh90jcWDDZTaG2lz3if33RbZ1go
 XBrS5X3HsMODGZlaMeUjrbFIz3e0Zyoo+RO/TX48w8nzivC6xSNxSNFgIZ1XTF5E
 SLMGa6lEQ9mLiqRfgFjynNwSYOSlGv3euMkZaVPS3hnNmn+vZbI=
 =RsCs
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:
 "Only AMD-specific changes this time:

   - Save the AMD physical die ID into cpuinfo_x86.cpu_die_id and
     convert all code to use it (Yazen Ghannam)

   - Remove a dead and unused TSEG region remapping workaround on AMD
     (Arvind Sankar)"

* tag 'x86_cpu_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove dead code for TSEG region remapping
  x86/topology: Set cpu_die_id only if DIE_TYPE found
  EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
  x86/CPU/AMD: Remove amd_get_nb_id()
  x86/CPU/AMD: Save AMD NodeId as cpu_die_id
2020-12-14 13:21:33 -08:00
Borislav Petkov 706657b1fe EDAC/amd64: Fix PCI component registration
In order to setup its PCI component, the driver needs any node private
instance in order to get a reference to the PCI device and hand that
into edac_pci_create_generic_ctl(). For convenience, it uses the 0th
memory controller descriptor under the assumption that if any, the 0th
will be always present.

However, this assumption goes wrong when the 0th node doesn't have
memory and the driver doesn't initialize an instance for it:

  EDAC amd64: F17h detected (node 0).
  ...
  EDAC amd64: Node 0: No DIMMs detected.

But looking up node instances is not really needed - all one needs is
the pointer to the proper device which gets discovered during instance
init.

So stash that pointer into a variable and use it when setting up the
EDAC PCI component.

Clear that variable when the driver needs to unwind due to some
instances failing init to avoid any registration imbalance.

Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de
2020-11-27 11:11:16 +01:00
Yazen Ghannam db970bd231 x86/CPU/AMD: Remove amd_get_nb_id()
The Last Level Cache ID is returned by amd_get_nb_id(). In practice,
this value is the same as the AMD NodeId for callers of this function.
The NodeId is saved in struct cpuinfo_x86.cpu_die_id.

Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and
remove the function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
2020-11-19 11:43:17 +01:00
Tom Rix f09056c1de EDAC/amd64: Remove unneeded breaks
A break is not needed if it is preceded by a return.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Robert Richter <rric@kernel.org>
Link: https://lkml.kernel.org/r/20201019193524.13391-1-trix@redhat.com
2020-10-26 12:07:50 +01:00