Commit Graph

144 Commits

Author SHA1 Message Date
Li Yang 9368aa1882 APEI: GHES: correctly return NULL for ghes_get_devices()
Since 315bada690 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor specific EDAC driver will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present.  Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.

Fixes: 9057a3f7ac ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:31:48 +02:00
Miaohe Lin 4348270137 ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request()
ghes_estatus_pool_size_request() is unused now, so remove it.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-05 19:13:14 +02:00
Steven Rostedt (Google) 292a089d78 treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown".  After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed.  It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&ptr->timer);
    +       timer_shutdown(&ptr->timer);
    |
    -       del_timer_sync(&ptr->timer);
    +       timer_shutdown_sync(&ptr->timer);
    )
      ... when strict
          when != ptr->timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . > /tmp/t.patch
    $ patch -p1 < /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-25 13:38:09 -08:00
Linus Torvalds 7adcadb984 - Make ghes_edac a simple module like the rest of the EDAC drivers and
drop this forced built-in only configuration by disentangling it from
 GHES. Work by Jia He.
 
 - The usual small cleanups and improvements all over EDAC land
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXPvoACgkQEsHwGGHe
 VUq98xAAmhz4u9e9pXG0Ixkx25ZtnZ+YxANeQ53Hsa2gWicbcoFgL2E30gi97c1y
 X9W361B2Q5dYq+J/YRUnEOXlI/KMWLxzNykSvVipUFNfxXZH+PijEAArz2V35/uE
 6ZISRLUYVYEtHEoUXbTogeyBmBUnIaJfYheZCluDQlWPggsDESP1qmE+FTg25OBs
 rDl5y+zUZYPxrWustNodVThPyhdMwGyYAUS6qYKCoNs9SNkAjGnrXoPc9j/U+cV+
 qMY2dNS3uKnCujKEssQhcHucyWgCEDvmEKWMH4ItryV2UBBjpNRoM6HDe7XFKwVJ
 riOKX8VDrpdSdlV1jbCx9KB47BUwFygOYsFdW7gIDJ1hb8usN4nSYQNDIlZKEIQG
 cHNpv2XGT+pCSvyc4Iv2Fgyvnp25XensSQwQAtk5Y4/lJL1yrgcPjMOkPmRS+mmH
 BclDWNbL+gwqkyWxgfoivDBOetLgwJYTr2ewBr6QbBtwLB8rL4BxXIdomcoFPuxi
 jAxixZnTbS+Xq5S7uYK4r6KbaHGcJtwolXMGjx13IHmPfvYtTTQzfRcrBlAtQ/pV
 BDLoygmDVlkhSVx6bi5V5QZ06rcWYR4cRsBQ54FnBGMr730ZljgFONOHFtUab28T
 C+YUOaeLEYEYI0cIkkyoSuiz6avB6YvQAiyEPM0EdHZrQFwhBBw=
 =DFp8
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Make ghes_edac a simple module like the rest of the EDAC drivers and
   drop the forced built-in only configuration by disentangling it from
   GHES (Jia He)

 - The usual small cleanups and improvements all over EDAC land

* tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()
  EDAC/i5400: Fix typo in comment: vaious -> various
  EDAC/mc_sysfs: Increase legacy channel support to 12
  MAINTAINERS: Make Mauro EDAC reviewer
  MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac
  EDAC/igen6: Return the correct error type when not the MC owner
  apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
  EDAC: Check for GHES preference in the chipset-specific EDAC drivers
  EDAC/ghes: Make ghes_edac a proper module
  EDAC/ghes: Prepare to make ghes_edac a proper module
  EDAC/ghes: Add a notifier for reporting memory errors
  efi/cper: Export several helpers for ghes_edac to use
  EDAC/i5000: Mark as BROKEN
2022-12-12 14:47:31 -08:00
Ard Biesheuvel dd3fa54b2e apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:

  /*
   * GHES error status reporting throttle, to report more kinds of
   * errors, instead of just most frequently occurred errors.
   */

New elements are added to the cache this way:

  if (!ghes_estatus_cached(estatus)) {
          if (ghes_print_estatus(NULL, ghes->generic, estatus))
                  ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.

And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.

  [ bp:
    - Flesh out and summarize what was discussed on the thread now
      that that cache contraption is understood;
    - Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 18:50:41 +02:00
Uwe Kleine-König 36006ccb6b ACPI: APEI: Drop unsetting driver data on remove
Since commit 0998d06310 ("device-core: Ensure drvdata = NULL when no
driver is bound") the driver core cares for cleaning driver data, so
don't do it in the driver, too.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 18:38:56 +02:00
Ard Biesheuvel 0d2aa70b8f apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:

  /*
   * GHES error status reporting throttle, to report more kinds of
   * errors, instead of just most frequently occurred errors.
   */

New elements are added to the cache this way:

  if (!ghes_estatus_cached(estatus)) {
          if (ghes_print_estatus(NULL, ghes->generic, estatus))
                  ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.

And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.

  [ bp:
    - Flesh out and summarize what was discussed on the thread now
      that that cache contraption is understood;
    - Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
2022-10-24 17:22:25 +02:00
Jia He 802e7f1dfe EDAC/ghes: Make ghes_edac a proper module
Commit

  dc4e8c07e9 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")

introduced a bug leading to ghes_edac_register() to be invoked before
edac_init(). Because at that time the bus "edac" hadn't been even
registered, this created sysfs nodes as /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

Fix this by turning ghes_edac into a proper module.

The list of GHES devices returned is not protected from being modified
concurrently but it is pretty static as it gets created only during GHES
init and latter is not a module so...

  [ bp: Massage. ]

Fixes: dc4e8c07e9 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
2022-10-21 21:59:19 +02:00
Jia He 9057a3f7ac EDAC/ghes: Prepare to make ghes_edac a proper module
To make ghes_edac a proper module, prepare to decouple its dependencies
from GHES.

Move the ghes_edac.force_load parameter to ghes.c in order to
properly control whether ghes_edac should be force-loaded: In
ghes_edac_register() it is too late to set the module flag.

Introduce a helper ghes_get_devices(), which returns the list of GHES
devices which got probed when the platform-check passes on the system.

The previous force_load check is not needed in ghes_edac_unregister()
since it will be checked in the module's init function of ghes_edac
later.

  [ bp: Massage. ]

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
2022-10-21 19:32:38 +02:00
Jia He 8e40612f61 EDAC/ghes: Add a notifier for reporting memory errors
In order to make it a proper module and disentangle it from facilities,
add a notifier for reporting memory errors. Use an atomic notifier
because calls sites like ghes_proc_in_irq() run in interrupt context.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
2022-10-20 13:25:53 +02:00
Ashish Kalra 43d2748394 ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Change num_ghes from int to unsigned int, preventing an overflow
and causing subsequent vmalloc() to fail.

The overflow happens in ghes_estatus_pool_init() when calculating
len during execution of the statement below as both multiplication
operands here are signed int:

len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);

The following call trace is observed because of this bug:

[    9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
[    9.317131] Call Trace:
[    9.317134]  <TASK>
[    9.317137]  dump_stack_lvl+0x49/0x5f
[    9.317145]  dump_stack+0x10/0x12
[    9.317146]  warn_alloc.cold+0x7b/0xdf
[    9.317150]  ? __device_attach+0x16a/0x1b0
[    9.317155]  __vmalloc_node_range+0x702/0x740
[    9.317160]  ? device_add+0x17f/0x920
[    9.317164]  ? dev_set_name+0x53/0x70
[    9.317166]  ? platform_device_add+0xf9/0x240
[    9.317168]  __vmalloc_node+0x49/0x50
[    9.317170]  ? ghes_estatus_pool_init+0x43/0xa0
[    9.317176]  vmalloc+0x21/0x30
[    9.317177]  ghes_estatus_pool_init+0x43/0xa0
[    9.317179]  acpi_hest_init+0x129/0x19c
[    9.317185]  acpi_init+0x434/0x4a4
[    9.317188]  ? acpi_sleep_proc_init+0x2a/0x2a
[    9.317190]  do_one_initcall+0x48/0x200
[    9.317195]  kernel_init_freeable+0x221/0x284
[    9.317200]  ? rest_init+0xe0/0xe0
[    9.317204]  kernel_init+0x1a/0x130
[    9.317205]  ret_from_fork+0x22/0x30
[    9.317208]  </TASK>

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-13 20:40:09 +02:00
Shuai Xue 415fed694f ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
If an error is detected as a result of user-space process accessing a
corrupt memory location, the CPU may take an abort. Then the platform
firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA,
NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI like notifications, commit 7f17b4a121 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keep track of whether
memory_failure() work was queued, and make task_work pending to flush out
the queue so that the work is processed before return to user-space.

The code use init_mm to check whether the error occurs in user space:

    if (current->mm != &init_mm)

The condition is always true, becase _nobody_ ever has "init_mm" as a real
VM any more.

In addition to abort, errors can also be signaled as asynchronous
exceptions, such as interrupt and SError. In such case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the work ghes_kick_task_work deferred to task_work will never
be processed because entry_handler returns to call ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node alloced from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
After around 200 allocations in our platform, the ghes_estatus_pool will
run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
result, the event failed to be processed.

    sdei: event 805 on CPU 113 failed with error: -2

Finally, a lot of unhandled events may cause platform firmware to exceed
some threshold and reboot.

The condition should generally just do

    if (current->mm)

as described in active_mm.rst documentation.

Then if an asynchronous error is detected when a kernel thread is running,
(e.g. when detected by a background scrubber), do not add task_work to it
as the original patch intends to do.

Fixes: 7f17b4a121 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-04 16:04:41 +02:00
Shuai Xue 27e932a314 ACPI: APEI: rename ghes_init() with an "acpi_" prefix
ghes_init() sticks out in acpi_init() because it is the only functions
without an "acpi_" prefix.

Rename ghes_init with an "acpi_" prefix, then all looks fine.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-03 20:25:23 +01:00
Shuai Xue dc4e8c07e9 ACPI: APEI: explicit init of HEST and GHES in apci_init()
From commit e147133a42 ("ACPI / APEI: Make hest.c manage the estatus
memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
the estatus memory pool. On the other hand, ghes_init() relies on
sdei_init() to detect the SDEI version and (un)register events. The
dependencies are as follows:

    ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init()
    ghes_init() => sdei_init()

HEST is not PCI-specific and initcall ordering is implicit and not
well-defined within a level.

Based on above, remove acpi_hest_init() from acpi_pci_root_init() and
convert ghes_init() and sdei_init() from initcalls to explicit calls in the
following order:

    acpi_hest_init()
    ghes_init()
        sdei_init()

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-03 20:24:22 +01:00
Tony Luck 3ad6fd77a2 x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
2021-11-15 11:13:16 -08:00
Xiaofei Tan ccb5ecdc2d ACPI: APEI: fix synchronous external aborts in user-mode
Before commit 8fcc4ae6fa ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), do_sea() would unconditionally
signal the affected task from the arch code. Since that change,
the GHES driver sends the signals.

This exposes a problem as errors the GHES driver doesn't understand
or doesn't handle effectively are silently ignored. It will cause
the errors get taken again, and circulate endlessly. User-space task
get stuck in this loop.

Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records.

Do memory failure handling for ARM Processor Error Section just like
for Memory Error Section.

Fixes: 8fcc4ae6fa ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-17 14:05:28 +02:00
Linus Torvalds 4a22709e21 arch-cleanup-2020-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
 VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
 mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
 I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
 ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
 br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
 w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
 h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
 zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
 8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
 cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
 J9ILnGAAeQ==
 =aVWo
 -----END PGP SIGNATURE-----

Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block

Pull arch task_work cleanups from Jens Axboe:
 "Two cleanups that don't fit other categories:

   - Finally get the task_work_add() cleanup done properly, so we don't
     have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
     all callers, and also fixes up the documentation for
     task_work_add().

   - While working on some TIF related changes for 5.11, this
     TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
     duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
  task_work: cleanup notification modes
  tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
2020-10-23 10:06:38 -07:00
Jens Axboe 91989c7078 task_work: cleanup notification modes
A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

Clean this up properly, and define a proper enum for the notification
mode. Now we have:

- TWA_NONE. This is 0, same as before the original change, meaning no
  notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
  that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
  notification.

Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.

Fixes: e91b481623 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 15:05:30 -06:00
Shiju Jose 9aa9cf3ee9 ACPI / APEI: Add a notifier chain for unknown (vendor) CPER records
CPER records describing a firmware-first error are identified by GUID.
The ghes driver currently logs, but ignores any unknown CPER records.
This prevents describing errors that can't be represented by a standard
entry, that would otherwise allow a driver to recover from an error.
The UEFI spec calls these 'Non-standard Section Body' (N.2.3 of
version 2.8).

Add a notifier chain for these non-standard/vendor-records. Callers
must identify their type of records by GUID.

Record data is copied to memory from the ghes_estatus_pool to allow
us to keep it until after the notifier has run.

Co-developed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200903123456.1823-2-shiju.jose@huawei.com
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
2020-09-16 10:30:42 +01:00
Linus Torvalds 118d6e9829 ACPI updates for 5.8-rc1
- Update the ACPICA code in the kernel to upstream revision
    20200430:
 
    * Move acpi_gbl_next_cmd_num definition (Erik Kaneda).
 
    * Ignore AE_ALREADY_EXISTS status in the disassembler when parsing
      create operators (Erik Kaneda).
 
    * Add status checks to the dispatcher (Erik Kaneda).
 
    * Fix required parameters for _NIG and _NIH (Erik Kaneda).
 
    * Make acpi_protocol_lengths static (Yue Haibing).
 
  - Fix ACPI table reference counting errors in several places, mostly
    in error code paths (Hanjun Guo).
 
  - Extend the Generic Event Device (GED) driver to support _Exx and
    _Lxx handler methods (Ard Biesheuvel).
 
  - Add new acpi_evaluate_reg() helper and modify the ACPI PCI hotplug
    code to use it (Hans de Goede).
 
  - Add new DPTF battery participant driver and make the DPFT power
    participant driver create more sysfs device attributes (Srinivas
    Pandruvada).
 
  - Improve the handling of memory failures in APEI (James Morse).
 
  - Add new blacklist entry for Acer TravelMate 5735Z to the backlight
    driver (Paul Menzel).
 
  - Add i2c address for thermal control to the PMIC driver (Mauro
    Carvalho Chehab).
 
  - Allow the ACPI processor idle driver to work on platforms with
    only one ACPI C-state present (Zhang Rui).
 
  - Fix kobject reference count leaks in error code paths in two
    places (Qiushi Wu).
 
  - Delete unused proc filename macros and make some symbols static
    (Pascal Terjan, Zheng Zengkai, Zou Wei).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl7VHb8SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxVboQAIjYda2RhQANIlIvoEa+Qd2/FBd3HXgU
 Mv0LZ6y1xxxEZYeKne7zja1hzt5WetuZ1hZHGfg8YkXyrLqZGxfCIFbbhSA90BGG
 PGzFerGmOBNzB3I9SN6iQY7vSqoFHvQEV1PVh24d+aHWZqj2lnaRRq+GT54qbRLX
 /U3Hy5glFl8A/DCBP4cpoEjDr4IJHY68DathkDK2Ep2ybXV6B401uuqx8Su/OBd/
 MQmJTYI1UK/RYBXfdzS9TIZahnkxBbU1cnLFy08Ve2mawl5YsHPEbvm77a0yX2M6
 sOAerpgyzYNivAuOLpNIwhUZjpOY66nQuKAQaEl2cfRUkqt4nbmq7yDoH3d2MJLC
 /Ccz955rV2YyD1DtyV+PyT+HB+/EVwH/+UCZ+gsSbdHvOiwdFU6VaTc2eI1qq8K9
 4m5eEZFrAMPlvTzj/xVxr2Hfw1lbm23J5B5n7sM5HzYbT6MUWRQpvfV4zM3jTGz0
 rQd8JmcHVvZk/MV1mGrYHrN5TnGTLWpbS4Yv1lAQa6FP0N0NxzVud7KRfLKnCnJ1
 vh5yzW2fCYmVulJpuqxJDfXSqNV7n40CFrIewSp6nJRQXnWpImqHwwiA8fl51+hC
 fBL72Ey08EHGFnnNQqbebvNglsodRWJddBy43ppnMHtuLBA/2GVKYf2GihPbpEBq
 NHtX+Rd3vlWW
 =xH3i
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA code in the kernel to upstream revision
  20200430, fix several reference counting errors related to ACPI
  tables, add _Exx / _Lxx support to the GED driver, add a new
  acpi_evaluate_reg() helper, add new DPTF battery participant driver
  and extend the DPFT power participant driver, improve the handling of
  memory failures in the APEI code, add a blacklist entry to the
  backlight driver, update the PMIC driver and the processor idle
  driver, fix two kobject reference count leaks, and make a few janitory
  changes.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20200430:

      - Move acpi_gbl_next_cmd_num definition (Erik Kaneda).

      - Ignore AE_ALREADY_EXISTS status in the disassembler when parsing
        create operators (Erik Kaneda).

      - Add status checks to the dispatcher (Erik Kaneda).

      - Fix required parameters for _NIG and _NIH (Erik Kaneda).

      - Make acpi_protocol_lengths static (Yue Haibing).

   - Fix ACPI table reference counting errors in several places, mostly
     in error code paths (Hanjun Guo).

   - Extend the Generic Event Device (GED) driver to support _Exx and
     _Lxx handler methods (Ard Biesheuvel).

   - Add new acpi_evaluate_reg() helper and modify the ACPI PCI hotplug
     code to use it (Hans de Goede).

   - Add new DPTF battery participant driver and make the DPFT power
     participant driver create more sysfs device attributes (Srinivas
     Pandruvada).

   - Improve the handling of memory failures in APEI (James Morse).

   - Add new blacklist entry for Acer TravelMate 5735Z to the backlight
     driver (Paul Menzel).

   - Add i2c address for thermal control to the PMIC driver (Mauro
     Carvalho Chehab).

   - Allow the ACPI processor idle driver to work on platforms with only
     one ACPI C-state present (Zhang Rui).

   - Fix kobject reference count leaks in error code paths in two places
     (Qiushi Wu).

   - Delete unused proc filename macros and make some symbols static
     (Pascal Terjan, Zheng Zengkai, Zou Wei)"

* tag 'acpi-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
  ACPI: CPPC: Fix reference count leak in acpi_cppc_processor_probe()
  ACPI: sysfs: Fix reference count leak in acpi_sysfs_add_hotplug_profile()
  ACPI: GED: use correct trigger type field in _Exx / _Lxx handling
  ACPI: DPTF: Add battery participant driver
  ACPI: DPTF: Additional sysfs attributes for power participant driver
  ACPI: video: Use native backlight on Acer TravelMate 5735Z
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
  ACPI: APEI: Kick the memory_failure() queue for synchronous errors
  mm/memory-failure: Add memory_failure_queue_kick()
  ACPI / PMIC: Add i2c address for thermal control
  ACPI: GED: add support for _Exx / _Lxx handler methods
  ACPI: Delete unused proc filename macros
  ACPI: hotplug: PCI: Use the new acpi_evaluate_reg() helper
  ACPI: utils: Add acpi_evaluate_reg() helper
  ACPI: debug: Make two functions static
  ACPI: sleep: Put the FACS table after using it
  ACPI: scan: Put SPCR and STAO table after using it
  ACPI: EC: Put the ACPI table after using it
  ACPI: APEI: Put the HEST table for error path
  ACPI: APEI: Put the error record serialization table for error path
  ...
2020-06-02 13:25:52 -07:00
Joerg Roedel 73f693c3a7 mm: remove vmalloc_sync_(un)mappings()
These functions are not needed anymore because the vmalloc and ioremap
mappings are now synchronized when they are created or torn down.

Remove all callers and function definitions.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:12 -07:00
James Morse 7f17b4a121 ACPI: APEI: Kick the memory_failure() queue for synchronous errors
memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component, (e.g. the memory controller), and notified via an IRQ.
In this case the work is queued as not all of memory_failure()s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware first
system it is replayed using NOTIFY_SEA.

This notification has NMI like properties, (it can interrupt
IRQ-masked code), so the memory_failure() work is queued. If we
return to user-space before the queued memory_failure() work is
processed, we will take the fault again. This loop may cause platform
firmware to exceed some threshold and reboot when Linux could have
recovered from this error.

For NMIlike notifications keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of
the ghes_estatus_node, and free()ing it back to the pool is deferred.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-19 19:51:11 +02:00
Joerg Roedel 763802b53a x86/mm: split vmalloc_sync_all()
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Bhaskar Upadhaya cea79e7e2f apei/ghes: Do not delay GHES polling
Currently, the ghes_poll_func() timer callback is registered with the
TIMER_DEFERRABLE flag. Thus, it is run when the CPU eventually wakes
up together with a subsequent non-deferrable timer and not at the precisely
configured polling interval.

For polling mode, the polling interval configured by firmware should not
be exceeded according to the ACPI spec 6.3, Table 18-394. The definition
of the polling interval is:

"Indicates the poll interval in milliseconds OSPM should use to
 periodically check the error source for the presence of an error
 condition."

If this interval is extended due to the timer callback deferring, error
records can get lost. Which we are observing on our ThunderX2 platforms.

Therefore, remove the TIMER_DEFERRABLE flag so that the timer callback
executes at the precise interval.

Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com>
[ bp: Subject & changelog ]
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-13 11:49:55 +01:00
Linus Torvalds 436b2a8039 Printk changes for 5.5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl3bpjoACgkQUqAMR0iA
 lPJJDA/+IJT4YCRp2TwV2jvIs0QzvXZrzEsxgCLibLE85mYTJgoQBD3W1bH2eyjp
 T/9U0Zh5PGr/84cHd4qiMxzo+5Olz930weG59NcO4RJBSr671aRYs5tJqwaQAZDR
 wlwaob5S28vUmjPxKulvxv6V3FdI79ZE9xrCOCSTQvz4iCLsGOu+Dn/qtF64pImX
 M/EXzPMBrByiQ8RTM4Ege8JoBqiCZPDG9GR3KPXIXQwEeQgIoeYxwRYakxSmSzz8
 W8NduFCbWavg/yHhghHikMiyOZeQzAt+V9k9WjOBTle3TGJegRhvjgI7508q3tXe
 jQTMGATBOPkIgFaZz7eEn/iBa3jZUIIOzDY93RYBmd26aBvwKLOma/Vkg5oGYl0u
 ZK+CMe+/xXl7brQxQ6JNsQhbSTjT+746LvLJlCvPbbPK9R0HeKNhsdKpGY3ugnmz
 VAnOFIAvWUHO7qx+J+EnOo5iiPpcwXZj4AjrwVrs/x5zVhzwQ+4DSU6rbNn0O1Ak
 ELrBqCQkQzh5kqK93jgMHeWQ9EOUp1Lj6PJhTeVnOx2x8tCOi6iTQFFrfdUPlZ6K
 2DajgrFhti4LvwVsohZlzZuKRm5EuwReLRSOn7PU5qoSm5rcouqMkdlYG/viwyhf
 mTVzEfrfemrIQOqWmzPrWEXlMj2mq8oJm4JkC+jJ/+HsfK4UU8I=
 =QCEy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow to print symbolic error names via new %pe modifier.

 - Use pr_warn() instead of the remaining pr_warning() calls. Fix
   formatting of the related lines.

 - Add VSPRINTF entry to MAINTAINERS.

* tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits)
  checkpatch: don't warn about new vsprintf pointer extension '%pe'
  MAINTAINERS: Add VSPRINTF
  tools lib api: Renaming pr_warning to pr_warn
  ASoC: samsung: Use pr_warn instead of pr_warning
  lib: cpu_rmap: Use pr_warn instead of pr_warning
  trace: Use pr_warn instead of pr_warning
  dma-debug: Use pr_warn instead of pr_warning
  vgacon: Use pr_warn instead of pr_warning
  fs: afs: Use pr_warn instead of pr_warning
  sh/intc: Use pr_warn instead of pr_warning
  scsi: Use pr_warn instead of pr_warning
  platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning
  platform/x86: asus-laptop: Use pr_warn instead of pr_warning
  platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning
  oprofile: Use pr_warn instead of pr_warning
  of: Use pr_warn instead of pr_warning
  macintosh: Use pr_warn instead of pr_warning
  idsn: Use pr_warn instead of pr_warning
  ide: Use pr_warn instead of pr_warning
  crypto: n2: Use pr_warn instead of pr_warning
  ...
2019-11-25 19:40:40 -08:00
Kefeng Wang 933ca4e323 acpi: Use pr_warn instead of pr_warning
As said in commit f2c2cbcc35 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent <prefix>_warn style. Let's do it.

Link: http://lkml.kernel.org/r/20191018031850.48498-8-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: two more indentation fixes]
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-10-18 15:00:19 +02:00
Liguang Zhang 6abc762227 ACPI / APEI: Release resources if gen_pool_add() fails
Destroy ghes_estatus_pool and release memory allocated via vmalloc() on
errors in ghes_estatus_pool_init() in order to avoid memory leaks.

 [ bp: do the labels properly and with descriptive names and massage. ]

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1563173924-47479-1-git-send-email-zhangliguang@linux.alibaba.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-20 23:52:11 +02:00
Andy Shevchenko bb100b6476 ACPI / APEI: Get rid of NULL_UUID_LE constant
This is a missed part of the commit 5b53696a30
("ACPI / APEI: Switch to use new generic UUID API"), i.e.
replacing old definition with a global constant variable.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-05 11:49:55 +02:00
Liguang Zhang 371b86897d ACPI / APEI: Remove needless __ghes_check_estatus() calls
Function __ghes_check_estatus() is always called after
__ghes_peek_estatus(), but it is already called in __ghes_peek_estatus().
So we should remove some needless __ghes_check_estatus() calls.

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-05 01:23:11 +02:00
Thomas Gleixner 1802d0beec treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 655 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:41 -07:00
James Morse f9f05395f3 ACPI / APEI: Add support for the SDEI GHES Notification type
If the GHES notification type is SDEI, register the provided event
using the SDEI-GHES helper.

SDEI may be one of two types of event, normal and critical. Critical
events can interrupt normal events, so these must have separate
fixmap slots and locks in case both event types are in use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-11 11:07:49 +01:00
James Morse b972d2eaf0 ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications
Now that ghes notification helpers provide the fixmap slots and
take the lock themselves, multiple NMI-like notifications can
be used on arm64.

These should be named after their notification method as they can't
all be called 'NMI'. x86's NOTIFY_NMI already is, change the SEA
fixmap entry to be called FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Because all of ghes.c builds on both architectures, provide a
constant for each fixmap entry that the architecture will never
use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:46 +01:00
James Morse d9f608dc15 ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry()
Each struct ghes has an worst-case sized buffer for storing the
estatus. If an error is being processed by ghes_proc() in process
context this buffer will be in use. If the error source then triggers
an NMI-like notification, the same buffer will be used by
in_nmi_queue_one_entry() to stage the estatus data, before
__process_error() copys it into a queued estatus entry.

Merge __process_error()s work into in_nmi_queue_one_entry() so that
the queued estatus entry is used from the beginning. Use the new
ghes_peek_estatus() to know how much memory to allocate from
the ghes_estatus_pool before reading the records.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

Change since v6:
 * Added a comment explaining the 'ack-error, then goto no_work'.
 * Added missing esatus-clearing, which is necessary after reading the GAS,
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:46 +01:00
James Morse e00a6e3392 ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length
ghes_read_estatus() reads the record address, then the record's
header, then performs some sanity checks before reading the
records into the provided estatus buffer.

To provide this estatus buffer the caller must know the size of the
records in advance, or always provide a worst-case sized buffer as
happens today for the non-NMI notifications.

Add a function to peek at the record's header to find the size. This
will let the NMI path allocate the right amount of memory before reading
the records, instead of using the worst-case size, and having to copy
the records.

Split ghes_read_estatus() to create __ghes_peek_estatus() which
returns the address and size of the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>

Changes since v7:
 * Grammar
 * concistent argument ordering

Changes since v6:
 * Additional buf_addr = 0 error handling
 * Moved checking out of peek-estatus
 * Reworded an error message so we can tell them apart
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse f2a681b916 ACPI / APEI: Make GHES estatus header validation more user friendly
ghes_read_estatus() checks various lengths in the top-level header to
ensure the CPER records to be read aren't obviously corrupt.

Take the opportunity to make this more user-friendly, printing a
(ratelimited) message about the nature of the header format error.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: James Morse <james.morse@arm.com>
[ rjw: Add missing 'static' ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse f2a7e059aa ACPI / APEI: Pass ghes and estatus separately to avoid a later copy
The NMI-like notifications scribble over ghes->estatus, before
copying it somewhere else. If this interrupts the ghes_probe() code
calling ghes_proc() on each struct ghes, the data is corrupted.

All the NMI-like notifications should use a queued estatus entry
from the beginning, instead of the ghes version, then copying it.
To do this, break up any use of "ghes->estatus" so that all
functions take the estatus as an argument.

This patch just moves these ghes->estatus dereferences into separate
arguments, no change in behaviour. struct ghes becomes unused in
ghes_clear_estatus() as it only wanted ghes->estatus, which we now
pass directly. This is removed.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse b484079b9f ACPI / APEI: Let the notification helper specify the fixmap slot
ghes_copy_tofrom_phys() uses a different fixmap slot depending on in_nmi().
This doesn't work when there are multiple NMI-like notifications, that
could interrupt each other.

As with the locking, move the chosen fixmap_idx to the notification helper.
This only matters for NMI-like notifications, anything calling
ghes_proc() can use the IRQ fixmap slot as its already holding an irqsave
spinlock.

This lets us collapse the ghes_ioremap_pfn_*() helpers.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 3b880cbe4d ACPI / APEI: Move locking to the notification helper
ghes_copy_tofrom_phys() takes different locks depending on in_nmi().
This doesn't work if there are multiple NMI-like notifications, that
can interrupt each other.

Now that NOTIFY_SEA is always called in the same context, move the
lock-taking to the notification helper. The helper will always know
which lock to take. This avoids ghes_copy_tofrom_phys() taking a guess
based on in_nmi().

This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All
the other notifications use ghes_proc(), and are called in process
or IRQ context. Move the spin_lock_irqsave() around their ghes_proc()
calls.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 255097c82d ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Remove Kconfig's ability to turn ACPI_APEI_SEA off if ACPI_APEI_GHES
is selected. This roughly matches the x86 NOTIFY_NMI behaviour, and means
each architecture has at least one user of the estatus-queue, meaning it
doesn't need guarding with ifdef.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 9c9d080513 ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI
The estatus-queue code is currently hidden by the NOTIFY_NMI #ifdefs.
Once NOTIFY_SEA starts using the estatus-queue we can stop hiding
it as each architecture has a user that can't be turned off.

Split the existing CONFIG_HAVE_ACPI_APEI_NMI block in two, and move
the SEA code into the gap.

Move the code around ... and changes the stale comment describing
why the status queue is necessary: printk() is no longer the issue,
its the helpers like memory_failure_queue() that aren't nmi safe.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 06ddeadc8d ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors
During ghes_proc() we use ghes_ack_error() to tell an external agent
we are done with these records and it can re-use the memory.

rc may hold an error returned by ghes_read_estatus(), ENOENT causes
us to skip ghes_ack_error() (as there is nothing to ack), but rc may
also by EIO, which gets supressed.

ghes_clear_estatus() is where we mark the records as processed for
non GHESv2 error sources, and already spots the ENOENT case as
buf_paddr is set to 0 by ghes_read_estatus().

Move the ghes_ack_error() call in here to avoid extra logic with
the return code in ghes_proc().

This enables GHESv2 acking for NMI-like error sources. This is safe
as the buffer is pre-mapped by map_gen_v2() before the GHES is added
to any NMI handler lists.

This same pre-mapping step means we can't receive an error from
apei_read()/write() here as apei_check_gar() succeeded when it
was mapped, and the mapping was cached, so the address can't be
rejected at runtime. Remove the error-returns as this is now
called from a function with no return.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse ee2eb3d4ee ACPI / APEI: Generalise the estatus queue's notify code
Refactor the estatus queue's pool notification routine from
NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

Add rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

in_nmi_queue_one_entry() is separate from the rcu-list walker for a
later caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
[ rjw: Drop unnecessary err variable in two places ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 5cc6c68287 ACPI / APEI: Don't update struct ghes' flags in read/clear estatus
ghes_read_estatus() sets a flag in struct ghes if the buffer of
CPER records needs to be cleared once the records have been
processed. This flag value is a problem if a struct ghes can be
processed concurrently, as happens at probe time if an NMI arrives
for the same error source. The NMI clears the flag, meaning the
interrupted handler may never do the ghes_estatus_clear() work.

The GHES_TO_CLEAR flags is only set at the same time as
buffer_paddr, which is now owned by the caller and passed to
ghes_clear_estatus(). Use this value as the flag.

A non-zero buf_paddr returned by ghes_read_estatus() means
ghes_clear_estatus() should clear this address. ghes_read_estatus()
already checks for a read of error_status_address being zero,
so CPER records cannot be written here.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 7d49f2c75a ACPI / APEI: Remove spurious GHES_TO_CLEAR check
ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going
on to __process_error(). This is pointless as ghes_read_estatus()
will always set this flag if it returns success, which was checked
earlier in the loop. Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse eeb2555779 ACPI / APEI: Don't store CPER records physical address in struct ghes
When CPER records are found the address of the records is stashed
in the struct ghes. Once the records have been processed, this
address is overwritten with zero so that it won't be processed
again without being re-populated by firmware.

This goes wrong if a struct ghes can be processed concurrently,
as can happen at probe time when an NMI occurs. If the NMI arrives
on another CPU, the probing CPU may call ghes_clear_estatus() on the
records before the handler had finished with them.
Even on the same CPU, once the interrupted handler is resumed, it
will call ghes_clear_estatus() on the NMIs records, this memory may
have already been re-used by firmware.

Avoid this stashing by letting the caller hold the address. A
later patch will do away with the use of ghes->flags in the
read/clear code too.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse fb7be08f1a ACPI / APEI: Make estatus pool allocation a static size
Adding new NMI-like notifications duplicates the calls that grow
and shrink the estatus pool. This is all pretty pointless, as the
size is capped to 64K. Allocate this for each ghes and drop
the code that grows and shrinks the pool.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse e147133a42 ACPI / APEI: Make hest.c manage the estatus memory pool
ghes.c has a memory pool it uses for the estatus cache and the estatus
queue. The cache is initialised when registering the platform driver.
For the queue, an NMI-like notification has to grow/shrink the pool
as it is registered and unregistered.

This is all pretty noisy when adding new NMI-like notifications, it
would be better to replace this with a static pool size based on the
number of users.

As a precursor, move the call that creates the pool from ghes_init(),
into hest.c. Later this will take the number of ghes entries and
consolidate the queue allocations.
Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put
this.

The pool is now initialised as part of ACPI's subsys_initcall():
(acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init())
Before this patch it happened later as a GHES specific device_initcall().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:45 +01:00
James Morse 0ac234be1a ACPI / APEI: Switch estatus pool to use vmalloc memory
The ghes code is careful to parse and round firmware's advertised
memory requirements for CPER records, up to a maximum of 64K.
However when ghes_estatus_pool_expand() does its work, it splits
the requested size into PAGE_SIZE granules.

This means if firmware generates 5K of CPER records, and correctly
describes this in the table, __process_error() will silently fail as it
is unable to allocate more than PAGE_SIZE.

Switch the estatus pool to vmalloc() memory. On x86 vmalloc() memory
may fault and be fixed up by vmalloc_fault(). To prevent this call
vmalloc_sync_all() before an NMI handler could discover the memory.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:44 +01:00
James Morse 93066e9aef ACPI / APEI: Remove silent flag from ghes_read_estatus()
Subsequent patches will split up ghes_read_estatus(), at which
point passing around the 'silent' flag gets annoying. This is to
suppress prink() messages, which prior to commit 42a0bb3f71
("printk/nmi: generic solution for safe printk in NMI"), were
unsafe in NMI context.

This is no longer necessary, remove the flag. printk() messages
are batched in a per-cpu buffer and printed via irq-work, or a call
back from panic().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:44 +01:00
James Morse 78b0b690f6 ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before panic()ing,
our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since commit 42a0bb3f71
("printk/nmi: generic solution for safe printk in NMI"), it has been
safe to call printk() from this context. Messages are batched in a
per-cpu buffer and printed via irq-work, or a call back from panic().

Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-07 23:10:44 +01:00