commit 88d724e2301a69c1ab805cd74fc27aa36ae529e0 upstream.
When a CPU goes offline, the interrupts affine to that CPU are
re-configured.
Managed interrupts undergo either migration to other CPUs or shutdown if
all CPUs listed in the affinity are offline. The migration of managed
interrupts is guaranteed on x86 because there are interrupt vectors
reserved.
Regular interrupts are migrated to a still online CPU in the affinity mask
or if there is no online CPU to any online CPU.
This works as long as the still online CPUs in the affinity mask have
interrupt vectors available, but in case that none of those CPUs has a
vector available the migration fails and the device interrupt becomes
stale.
This is not any different from the case where the affinity mask does not
contain any online CPU, but there is no fallback operation for this.
Instead of giving up, retry the migration attempt with the online CPU mask
if the interrupt is not managed, as managed interrupts cannot be affected
by this problem.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423073413.79625-1-dongli.zhang@oracle.com
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a60dd06af674d3bb76b40da5d722e4a0ecefe650 upstream.
irq_restore_affinity_of_irq() restarts managed interrupts unconditionally
when the first CPU in the affinity mask comes online. That's correct during
normal hotplug operations, but not when resuming from S3 because the
drivers are not resumed yet and interrupt delivery is not expected by them.
Skip the startup of suspended interrupts and let resume_device_irqs() deal
with restoring them. This ensures that irqs are not delivered to drivers
during the noirq phase of resuming from S3, after non-boot CPUs are brought
back online.
Signed-off-by: David Stevens <stevensd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240424090341.72236-1-stevensd@chromium.org
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit edbbaae42a56f9a2b39c52ef2504dfb3fb0a7858 upstream.
Currently, whenever a caller is providing an affinity hint for an
interrupt, the allocation code uses it to calculate the node and copies the
cpumask into irq_desc::affinity.
If the affinity for the interrupt is not marked 'managed' then the startup
of the interrupt ignores irq_desc::affinity and uses the system default
affinity mask.
Prevent this by setting the IRQD_AFFINITY_SET flag for the interrupt in the
allocator, which causes irq_setup_affinity() to use irq_desc::affinity on
interrupt startup if the mask contains an online CPU.
[ tglx: Massaged changelog ]
Fixes: 45ddcecbfa ("genirq: Use affinity hint in irqdesc allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20240806072044.837827-1-shayd@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ce3e98184b625d2870991880bf9586ded7ea7f9 upstream.
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617ee ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68cbd415dd4b9c5b9df69f0f091879e56bf5907a upstream.
A proper task_work_cancel() API that actually cancels a callback and not
*any* callback pointing to a given function is going to be needed for
perf events event freeing. Do the appropriate rename to prepare for
that.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240621091601.18227-2-frederic@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b84a8aba806261d2f759ccedf4a2a6a80a5e55ba upstream.
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:
CPU0 CPU1
desc = mt_find()
delayed_free_desc(desc)
irq_desc_get_irq(desc)
The use-after-free is reported by KASAN:
Call trace:
irq_get_next_irq+0x58/0x84
show_stat+0x638/0x824
seq_read_iter+0x158/0x4ec
proc_reg_read_iter+0x94/0x12c
vfs_read+0x1e0/0x2c8
Freed by task 4471:
slab_free_freelist_hook+0x174/0x1e0
__kmem_cache_free+0xa4/0x1dc
kfree+0x64/0x128
irq_kobj_release+0x28/0x3c
kobject_put+0xcc/0x1e0
delayed_free_desc+0x14/0x2c
rcu_do_batch+0x214/0x720
Guard the access with a RCU read lock section.
Fixes: 721255b982 ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32 upstream.
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b4 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6309727ef27162deabd5c095c11af24970fba5a2 ]
Add a kthread_stop_put() helper that stops a thread and puts its task
struct. Use it to replace the various instances of kthread_stop()
followed by put_task_struct().
Remove the kthread_stop_put() macro in usbip that is similar but doesn't
return the result of kthread_stop().
[agruenba@redhat.com: fix kerneldoc comment]
Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com
[akpm@linux-foundation.org: document kthread_stop_put()'s argument]
Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: bb9025f4432f ("dma-mapping: benchmark: fix up kthread-related error handling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Upstream: no
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Ming Wang <wangming01@loongson.cn>
commit b184c8c2889ceef0a137c7d0567ef9fe3d92276e upstream.
For a CONFIG_SPARSE_IRQ=n kernel, early_irq_init() is supposed to
initialize all interrupt descriptors.
It does except for irq_desc::resend_node, which ia only initialized for the
first descriptor.
Use the indexed decriptor and not the base pointer to address that.
Fixes: bc06a9e087 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122085716.2999875-5-dawei.li@shingroup.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e7afb2eb7b2a7c81e9f608cbdf74a07606fd1b5 upstream.
irq_remove_generic_chip() calculates the Linux interrupt number for removing the
handler and interrupt chip based on gc::irq_base as a linear function of
the bit positions of set bits in the @msk argument.
When the generic chip is present in an irq domain, i.e. created with a call
to irq_alloc_domain_generic_chips(), gc::irq_base contains not the base
Linux interrupt number. It contains the base hardware interrupt for this
chip. It is set to 0 for the first chip in the domain, 0 + N for the next
chip, where $N is the number of hardware interrupts per chip.
That means the Linux interrupt number cannot be calculated based on
gc::irq_base for irqdomain based chips without a domain map lookup, which
is currently missing.
Rework the code to take the irqdomain case into account and calculate the
Linux interrupt number by a irqdomain lookup of the domain specific
hardware interrupt number.
[ tglx: Massage changelog. Reshuffle the logic and add a proper comment. ]
Fixes: cfefd21e69 ("genirq: Add chip suspend and resume callbacks")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231024150335.322282-1-herve.codina@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b56ebe7c896dc78b5865ec2c4b1dae3c93537517 upstream.
commit ef8dd01538 ("genirq/msi: Make interrupt allocation less
convoluted"), reworked the code so that the x86 specific quirk for affinity
setting of non-maskable PCI/MSI interrupts is not longer activated if
necessary.
This could be solved by restoring the original logic in the core MSI code,
but after a deeper analysis it turned out that the quirk flag is not
required at all.
The quirk is only required when the PCI/MSI device cannot mask the MSI
interrupts, which in turn also prevents reservation mode from being enabled
for the affected interrupt.
This allows ot remove the NOMASK quirk bit completely as msi_set_affinity()
can instead check whether reservation mode is enabled for the interrupt,
which gives exactly the same answer.
Even in the momentary non-existing case that the reservation mode would be
not set for a maskable MSI interrupt this would not cause any harm as it
just would cause msi_set_affinity() to go needlessly through the
functionaly equivalent slow path, which works perfectly fine with maskable
interrupts as well.
Rework msi_set_affinity() to query the reservation mode and remove all
NOMASK quirk logic from the core code.
[ tglx: Massaged changelog ]
Fixes: ef8dd01538 ("genirq/msi: Make interrupt allocation less convoluted")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a0b0bad10587ae2948a7c36ca4ffc206007fbcf3 ]
When a CPU is about to be offlined, x86 validates that all active
interrupts which are targeted to this CPU can be migrated to the remaining
online CPUs. If not, the offline operation is aborted.
The validation uses irq_matrix_allocated() to retrieve the number of
vectors which are allocated on the outgoing CPU. The returned number of
allocated vectors includes also vectors which are associated to managed
interrupts.
That's overaccounting because managed interrupts are:
- not migrated when the affinity mask of the interrupt targets only
the outgoing CPU
- migrated to another CPU, but in that case the vector is already
pre-allocated on the potential target CPUs and must not be taken into
account.
As a consequence the check whether the remaining online CPUs have enough
capacity for migrating the allocated vectors from the outgoing CPU might
fail incorrectly.
Let irq_matrix_allocated() return only the number of allocated non-managed
interrupts to make this validation check correct.
[ tglx: Amend changelog and fixup kernel-doc comment ]
Fixes: 2f75d9e1c9 ("genirq: Implement bitmap matrix allocator")
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231020072522.557846-1-yu.c.chen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Core:
- Prevent a deadlock of nested interrupt threads vs.
synchronize_hard()
- Removal of a stale extern declaration
Drivers:
- The first new driver since v6.2 for Amlogic-C3 SoCs
- The usual small fixes, cleanups and improvements all over
the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmTsjR0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofmLEAC5anouyAUbGjl/cL//+2GkvWB2YgO2
+D7q8tx3Tt5US/vpiYaDvFs+at+2lAyB6M/KUkvYSCne9bm80+YqaL+73iM6YQKH
yNDrAnLR1FA4+fHIvvhmk23U1uUjWgSTL7iKufgNWf8I0aYsWLTIX3N6m0606ZLE
eUNIf7w+aZRr/axHdadRQpib6l1fvfA3C72urPRBnZDA56ZDAgE9tS0kfk9D+3sW
BgXRp4knvHBf6I4RdA10hHDTa1RuX9xkDeAC1a/ljWpbCEgEDPJ+5JI+TD+fU/d5
TCVGa7GwqJc2srRFwy76/t0jQrG7DnwW56SsMomjS+vjIu4exNFwXJ6LqZSJacwa
Z3HB0Py3awQWPfHdFqdF9LHyum+a58RHX96RenlL8Q/42qe5K6RmAIfcAaiy2OpL
xAGy9+nplMWh+qde9q1o30WPr08GhhDEXrdHZdAAODjBeoUDGmFooH5NHAFjw2+Q
ba15/f7Nl8KIl854OUJv4cftNEv5klpueLR/YUviivoO55vydRae/k/CSPhvt7TN
VIQ+vgiaiOCEwAAx2kP7Au0ADeEMCYiEqH9KWBp33dvjNZMt2DbAGLDWagcy8N9y
R8ms4c5e7Z2MvN9Z6YDihQ1XvkQsdX/dWwJq3weH3c/tP1MBFFHZYdeQhIVKTIKR
4zFKi4jrlmn0vQ==
=jiUK
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Boring updates for the interrupt subsystem:
Core:
- Prevent a deadlock of nested interrupt threads vs.
synchronize_hard()
- Removal of a stale extern declaration
Drivers:
- The first new driver since v6.2 for Amlogic-C3 SoCs
- The usual small fixes, cleanups and improvements all over the
place"
* tag 'irq-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Add support for Amlogic-C3 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic-C3 SoCs
irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()
irqchip/ls-scfg-msi: Use devm_platform_get_and_ioremap_resource()
irqchip: Explicitly include correct DT includes
irqchip/orion: Use of_address_count() helper
irqchip/irq-pruss-intc: Do not check for 0 return after calling platform_get_irq()
irqchip/imx-mu-msi: Do not check for 0 return after calling platform_get_irq()
irqchipr/i8259: Mark i8259_of_init() static
irqchip/mips-gic: Mark gic_irq_domain_free() static
irqchip/xtensa-pic: Include header for xtensa_pic_init_legacy()
irqchip/loongson-eiointc: Fix return value checking of eiointc_index
genirq: Remove unused extern declaration
genirq: Prevent nested thread vs synchronize_hardirq() deadlock
The switch to using hlist for managing software resend of interrupts
broke resend in at least two ways:
First, unconditionally adding interrupt descriptors to the resend list can
corrupt the list when the descriptor in question has already been
added. This causes the resend tasklet to loop indefinitely with interrupts
disabled as was recently reported with the Lenovo ThinkPad X13s after
threaded NAPI was disabled in the ath11k WiFi driver.
This bug is easily fixed by restoring the old semantics of irq_sw_resend()
so that it can be called also for descriptors that have already been marked
for resend.
Second, the offending commit also broke software resend of nested
interrupts by simply discarding the code that made sure that such
interrupts are retriggered using the parent interrupt.
Add back the corresponding code that adds the parent descriptor to the
resend list.
Fixes: bc06a9e087 ("genirq: Use hlist for managing resend handlers")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/lkml/20230809073432.4193-1-johan+linaro@kernel.org/
Link: https://lore.kernel.org/r/20230826154004.1417-1-johan+linaro@kernel.org
There is a possibility of deadlock if synchronize_hardirq() is called
when the nested threaded interrupt is active. The following scenario
was observed on a uniprocessor PREEMPT_NONE system:
Thread 1 Thread 2
handle_nested_thread()
Set INPROGRESS
Call ->thread_fn()
thread_fn goes to sleep
free_irq()
__synchronize_hardirq()
Busy-loop forever waiting for INPROGRESS
to be cleared
The INPROGRESS flag is only supposed to be used for hard interrupt
handlers. Remove the incorrect usage in the nested threaded interrupt
case and instead re-use the threads_active / wait_for_threads mechanism
to wait for nested threaded interrupts to complete.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613-genirq-nested-v3-1-ae58221143eb@axis.com
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmSWHlEPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDgDcP/AyuNjjt9ybZ8LT5hGYRBKTfO3bsssiV/ZIE
NRbfwK6nWTurA3tN9XypYouB3HfKvTvDYHzeo9Yi2y51GI20o7noSfVNHHld5dAE
WTBYcfc6n+1KqGNA1tPVeE403jd50TnmOmFaXhN1Vn/KQVQKs+LmjoVdS7HXKmwu
PLk1tbFSGh5+vgLQTlhze5Wz7nU59lJ5WfCJ8negYls/5l3FseGgxEuffWKytElJ
I6PbnRyd9y7R/KYmkobFjS98cjeNznAxJEm/2InVpcb3IwUSEyJUHgNwbs6KSeYf
4BdVyxSAQlFPgxO17UKQxblYu/HuFY/NWghkg7dn6ZgVubj52rtrl9txUId57HGC
pYbSVm0bMevqlgHDYHold+Q4I6DqltyHKgiA9UfU9IQj4MV/KXbDjtng/ih4ZkHe
rw+BnckyrhZ5K5p3nDlzS/TNQT1baKqNFgx/O5HPV27NT8ARynqoFxNUzA0z37v3
x5dj34kiIW2kAGJDOP9+9jj2uF4yX0MKdNxdMs8PM9hYAsO0AqItNsXEQoUuzJF+
u1hhGZ1H6HYmY2WphXD57fnIR/7Hm6Iz00XC17+1jUP+wsefhAbreOyJIbbcpENF
IIxrcIxAPl3PQPsj8n98/9fKQ6u/QfrvWUH+6aoFQH/Np7RTFWmDh8bqucjuXsGS
8gxR3REr
=quHC
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org
* irq/misc-6.5:
: .
: Misc cleanups:
:
: - Add a number of missing prototypes
: - Mark global symbol as static where needed
: - Drop some now useless non-DT code paths
: - Add a missing interrupt mapping to the STM32 irqchip
: - Silence another STM32 warning when building with W=1
: - Fix the jcore-aic driver that actually never worked...
: .
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
Signed-off-by: Marc Zyngier <maz@kernel.org>
irq_domain_debugfs_init() is defined in irqdomain.c, but the
declaration is in a header that is not included here:
kernel/irq/irqdomain.c:1965:13: error: no previous prototype for 'irq_domain_debugfs_init' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230516200432.554240-1-arnd@kernel.org
There is a class of interrupt controllers out there that, once they
have signalled a given interrupt number, will still signal incoming
instances of the *same* interrupt despite the original interrupt
not having been EOIed yet.
As long as the new interrupt reaches the *same* CPU, nothing bad
happens, as that CPU still has its interrupts globally disabled,
and we will only take the new interrupt once the interrupt has
been EOIed.
However, things become more "interesting" if an affinity change comes
in while the interrupt is being handled. More specifically, while
the per-irq lock is being dropped. This results in the affinity change
taking place immediately. At this point, there is nothing that prevents
the interrupt from firing on the new target CPU. We end-up with the
interrupt running concurrently on two CPUs, which isn't a good thing.
And that's where things become worse: the new CPU notices that the
interrupt handling is in progress (irq_may_run() return false), and
*drops the interrupt on the floor*.
The whole race looks like this:
CPU 0 | CPU 1
-----------------------------|-----------------------------
interrupt start |
handle_fasteoi_irq | set_affinity(CPU 1)
handler |
... | interrupt start
... | handle_fasteoi_irq -> early out
handle_fasteoi_irq return | interrupt end
interrupt end |
If the interrupt was an edge, too bad. The interrupt is lost, and
the system will eventually die one way or another. Not great.
A way to avoid this situation is to detect this problem at the point
we handle the interrupt on the new target. Instead of dropping the
interrupt, use the resend mechanism to force it to be replayed.
Also, in order to limit the impact of this workaround to the pathetic
architectures that require it, gate it behind a new irq flag aptly
named IRQD_RESEND_WHEN_IN_PROGRESS.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: KarimAllah Raslan <karahmed@amazon.com>
Cc: Yipeng Zou <zouyipeng@huawei.com>
Cc: Zhang Jianhua <chris.zjh@huawei.com>
[maz: reworded commit mesage]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com
Adding a bit more info about what the flags are used for may help future
code readers.
Signed-off-by: James Gowans <jgowans@amazon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230608120021.3273400-2-jgowans@amazon.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj
vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP
dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE=
=K860
-----END PGP SIGNATURE-----
Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a double free fix in the Xen pvcalls backend driver
- a fix for a regression causing the MSI related sysfs entries to not
being created in Xen PV guests
- a fix in the Xen blkfront driver for handling insane input data
better
* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pci/xen: populate MSI sysfs entries
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
xen/blkfront: Only check REQ_FUA for writes
Commit bf5e758f02 ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.
Commit 6c796996ee ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.
Fix this by making use of the fallback functions for sysfs population.
Fixes: bf5e758f02 ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
The current implementation uses a static bitmap for interrupt descriptor
allocation and a radix tree to pointer store the pointer for lookup.
However, the size of the bitmap is constrained by the build time macro
MAX_SPARSE_IRQS, which may not be sufficient to support high-end servers,
particularly those with GICv4.1 hardware, which require a large interrupt
space to cover LPIs and vSGIs.
Replace the bitmap and the radix tree with a maple tree, which not only
stores pointers for lookup, but also provides a mechanism to find free
ranges. That removes the build time hardcoded upper limit.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-4-sdonthineni@nvidia.com
Move the open coded sparse bitmap handling into helper functions as
a preparatory step for converting the sparse interrupt management
to a maple tree.
No functional change.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-3-sdonthineni@nvidia.com
The current implementation utilizes a bitmap for managing interrupt resend
handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
However, this method may not efficiently utilize memory during runtime,
particularly when IRQ_BITMAP_BITS is large.
Address this issue by using an hlist to manage interrupt resend handlers
instead of relying on a static bitmap memory allocation. Additionally, a
new function, clear_irq_resend(), is introduced and called from
irq_shutdown to ensure a graceful teardown of the interrupt.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-2-sdonthineni@nvidia.com
For interrupts with secondary threads, the affinity is applied when the
thread is created but if the interrupts affinity is changed later only
the primary thread is updated.
Update the secondary thread's affinity as well to keep all the interrupts
activity on the assigned CPUs.
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230406180857.588682-1-john@metanate.com
- Prevent possible NULL pointer derefences in irq_data_get_affinity_mask()
and irq_domain_create_hierarchy().
- Take the per device MSI lock before invoking code which relies
on it being hold.
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted to
use core infrastructure and results in a fals positive warning.
- Remove dead code in the MSI subsystem.
- Clarify the documentation for pci_msix_free_irq().
- More kobj_type constification.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmQEVToTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodc9EACg5HBOGsh5OV8pwnuEDqfThK/dOZ5k
LJ8xQjGAx29JeNWu4gkkHaSDEGjZhwLiZlB6qUeH+LPQ1NgmAlLL3T2NxEOOWa6y
z/xQv+1Ceu6XxazpCSFRWR/6w4Nyup92jhlsUIkmmsWkVvKH/pV6Uo+3ta0WagWg
heb3vqts6J0AOJaMepF8azYGbwAPSIElNLI1UtiEuQYEKU55N8jLK20VJTL6lzJ2
FyRg/0ghNWDAaBdnv4cZCQ/MzoG5UkoU3f2cqhdSce5mqnq2fKRfgBjzllNgaRgA
zxOxIR88QaKTMHIr+WKD1dyWxDQlotFbBOkmVW39XAa13rn42s4GIeW88VCjJGww
RAm52SbC48cCIyNQlh4A6Vhb4vjPx2DndWbWnnVj5fWlUevdAPRxSlm4BjfxFxe+
LbuZCRRL1jjlC0fXmhVXTTxeE1/K7jarAZwRV7Nxhr3g0gT+Zv1jyaaW9rWuHq5U
3pS+xBl89LA/VYp9tv6jDfJlocmRwgrFbGX4UlfikqtObdTFqcH0FtmqisE61fZS
n0194BMWNDfPSibSpDohf/CDPoHZ6pNxeuqkVDiisUJHPpIYOt8+lH+8//DgBL7a
oi9zS0JazPIn2VM6NB4f/WXOYmS9GZq5+loiYEWb52AYtodKUKmOoWG0SUyy6XFr
E7yJzemsUwrJVg==
=jWot
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"A set of updates for the interrupt susbsystem:
- Prevent possible NULL pointer derefences in
irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
- Take the per device MSI lock before invoking code which relies on
it being hold
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted
to use core infrastructure and results in a fals positive warning
- Remove dead code in the MSI subsystem
- Clarify the documentation for pci_msix_free_irq()
- More kobj_type constification"
* tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
genirq/msi: Drop dead domain name assignment
irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
genirq/irqdesc: Make kobj_type structures constant
PCI/MSI: Clarify usage of pci_msix_free_irq()
genirq/msi: Take the per-device MSI lock before validating the control structure
genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
Miquel reported a warning in the MSI core which is triggered when
interrupts are freed via platform_msi_device_domain_free().
This code got reworked to use core functions for freeing the MSI
descriptors, but nothing took care to clear the msi_desc->irq entry, which
then triggers the warning in msi_free_msi_desc() which uses desc->irq to
validate that the descriptor has been torn down. The same issue exists in
msi_domain_populate_irqs().
Up to the point that msi_free_msi_descs() grew a warning for this case,
this went un-noticed.
Provide the counterpart of msi_domain_populate_irqs() and invoke it in
platform_msi_device_domain_free() before freeing the interrupts and MSI
descriptors and also in the error path of msi_domain_populate_irqs().
Fixes: 2f2940d168 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Reported-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87mt4wkwnv.ffs@tglx
Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/TgzQAKCRCFwuHvBreF
Ya3AAP4/WxTJIbDvtTyH3Fae3NxTdO8j8gsUvU1vrRYG83zdnAEAxd1yii7GEO8D
crkeq9D4FUiPAkFnJ64Exw2FHb060Qg=
=RABK
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Some polishing and small fixes for iommufd:
- Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt
subsystem
- Use GFP_KERNEL_ACCOUNT inside the iommu_domains
- Support VFIO_NOIOMMU mode with iommufd
- Various typos
- A list corruption bug if HWPTs are used for attach"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not add the same hwpt to the ioas->hwpt_list twice
iommufd: Make sure to zero vfio_iommu_type1_info before copying to user
vfio: Support VFIO_NOIOMMU with iommufd
iommufd: Add three missing structures in ucmd_buffer
selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
iommu: Remove IOMMU_CAP_INTR_REMAP
irq/s390: Add arch_is_isolated_msi() for s390
iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI
genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI
genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code
iommufd: Convert to msi_device_has_isolated_msi()
vfio/type1: Convert to iommu_group_has_isolated_msi()
iommu: Add iommu_group_has_isolated_msi()
genirq/msi: Add msi_device_has_isolated_msi()
Since commit d59f6617ee ("genirq: Allow fwnode to carry name
information only") an IRQ domain is always given a name during
allocation (e.g. used for the debugfs entry).
Drop the unused fallback name assignment when creating MSI domains.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230224130509.27814-1-johan+linaro@kernel.org
The recent switch to per-domain locking caused a NULL dereference in
irq_domain_create_hierarchy(), as Xen code is calling
msi_create_irq_domain() with a NULL parent pointer.
Fix that by testing parent to be set before dereferencing it. For a
non-existing parent the irqdomain's root will stay to point to
itself.
Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230223083800.31347-1-jgross@suse.com
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code.
This also contains a first usecase in the block multi-queue code which
Jens asked to take along with the librarization.
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing.
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context.
- Move the IPI multiplexing code from the Apple AIC driver into the
core. so it can be reused by RISCV.
Drivers:
- Plug OF node refcounting leaks in various drivers.
- Correctly mark level triggered interrupts in the Broadcom L2 drivers.
- The usual small fixes and improvements.
- No new drivers for the record!
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzUSkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoY3DEAC9E4yLO7VxxTrs/KrAVCgL3SnHVXQU
nE42uFbQwpCILuNmnqP3uvTHLCsXZkbuBaZEbxLBxC2iyU6+31N1Is+e6cClGMjK
kX6U9g9EqiRCdX3fgJiEU16fCgE8D1AEg+7XKLjeasQhCfKQGGtCtE9/Gmg/Ji92
gcEY/bjvm1hcoNo9dh/vR4k0k63fb13716RLScozUkS/XYVlu+LrrG349gD2WEA9
lh1twDkXvZTWkiYKWAkLorxcNyKhcnJxJw8zEIGVF5b6pCCudK8gXjBbMD5abC7W
xano6B8F455eSKNsi2TWyW47ZHUkC60sqCNDgI2MBTsI7D72UpAJoDfe0VjbMoaH
RQJnrGsUQbviBUen+LEet7nWZBQJRKZHOVtYEjA8ndB3PJUXKKcLeODdw11odyjR
bgZk+0wnowMArIaoLfeItF2oSpfSzLVxh2i8Aeus5tBesvhVCOi4LABRBKGCWvMj
cpSlMhZ4znMnr5j5lOGpcAjKFlWVh1HmF70Y2deGZi5xC8EXFL/VsB7rH5LEEEuF
7I8CO8M1mXeOTJoCchCbuAYgZyuk1DIhKUyOiYQZblaPNGcVGvCIN31SFBRT9h/8
e0VwSvVL756GhotUp/LjgTdG7MoKspWqRG00+q84SsDalsKGXMW7zmHc+1NgGN/C
Yxio1Jlly9Rwyw==
=+pu3
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Move the interrupt affinity spreading mechanism into lib/group_cpus
so it can be used for similar spreading requirements, e.g. in the
block multi-queue code
This also contains a first usecase in the block multi-queue code
which Jens asked to take along with the librarization
- Improve irqdomain locking to close a number race conditions which
can be observed with massive parallel device driver probing
- Enforce and document the semantics of disable_irq() which cannot be
invoked safely from non-sleepable context
- Move the IPI multiplexing code from the Apple AIC driver into the
core, so it can be reused by RISCV
Drivers:
- Plug OF node refcounting leaks in various drivers
- Correctly mark level triggered interrupts in the Broadcom L2
drivers
- The usual small fixes and improvements
- No new drivers for the record!"
* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
...
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions which prevents
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230217-kobj_type-irq-v1-1-fedfacaf8cdb@weissschuh.net
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, in msi_domain_populate_irqs() the lock is taken right after having
called msi_ctrl_valid(), which is just a tad too late.
Take the lock before invoking msi_ctrl_valid().
Fixes: 40742716f2 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20230220190101.314446-1-maz@kernel.org
If ipi_send_{mask|single}() is called with an invalid interrupt number, all
the local variables there will be NULL. ipi_send_verify() which is invoked
from these functions does verify its 'data' parameter, resulting in a
kernel oops in irq_data_get_affinity_mask() as the passed NULL pointer gets
dereferenced.
Add a missing NULL pointer check in ipi_send_verify()...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Fixes: 3b8e29a82d ("genirq: Implement ipi_send_mask/single()")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/b541232d-c2b6-1fe9-79b4-a7129459e4d0@omp.ru
- New and improved irqdomain locking, closing a number of races that
became apparent now that we are able to probe drivers in parallel
- A bunch of OF node refcounting bugs have been fixed
- We now have a new IPI mux, lifted from the Apple AIC code and
made common. It is expected that riscv will eventually benefit
from it
- Two small fixes for the Broadcom L2 drivers
- Various cleanups and minor bug fixes
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmPw4OgPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDYVgP/iVFxCPs+DCWUYvyTC8rvNzOj51COHUV/7yD
mY5BTIjH3yTQPDhQmFvITCAjKaMYc3eDLml/nF4tTCU0MFig+KsRsWNIEFXtSsI0
wO+S19QhHzj5odUok5IDC+cNTXScp2HV+vFoOhhf0zDzXqwVxRr7lO5i+n37ELMp
Mm9g2+EeUt43xTQxzbmNn5Kkpq9PMEnQFU2UkvJleg+KCgzSYThcR8/KUDKySZpk
TP+mcR5PevcqGhLt7vYS2lGh8Ye1warzp54C7Je8P8Txg3BM8xBynT1d3fgrlKfm
AOAPVW3PV6bPhgVYXZJopH3ykfmYM4ZiIvhRcgLyf6tbZAU6Twpiq823TAOVHyPI
SRcW8dehuvgq1VJIpRGZOSB2qIvFrqLhl0B1CtT04gFWJW9bSa2n5Y1h4Gcqy29o
SLJiKscx2KqvPmQqarLUUnuOZ5hhIrtYhkhhJuuwqZqzS1Kkz/mSB1MkPQEGxJi1
MpoTfbQ/0KTYXCqqgs/GBnDJ0mYrcvtBoGP7bjnVYnXpANP2bs+ZpQVPVq+17uuQ
k0gjxe8iENqXjW6JMlFX5K3dxG5ygXjfECMWsCJ+JdCtJdaIL8I46X/u7wHU2mfY
bohhb7xS2+HIPxz6w8aRu3IQG00mMv06vCYPBbPh+W0dUtocdM3U2kpe5gPYm1iz
kWx3WLaM
=ONcj
-----END PGP SIGNATURE-----
Merge tag 'irqchip-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- New and improved irqdomain locking, closing a number of races that
became apparent now that we are able to probe drivers in parallel
- A bunch of OF node refcounting bugs have been fixed
- We now have a new IPI mux, lifted from the Apple AIC code and
made common. It is expected that riscv will eventually benefit
from it
- Two small fixes for the Broadcom L2 drivers
- Various cleanups and minor bug fixes
Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
* irq/irqdomain-locking:
: .
: irqdomain locking overhaul courtesy of Johan Hovold.
:
: From the cover letter:
:
: "Parallel probing (e.g. due to asynchronous probing) of devices that
: share interrupts can currently result in two mappings for the same
: hardware interrupt to be created.
:
: This series fixes this mapping race and reworks the irqdomain locking so
: that in the end the global irq_domain_mutex is only used for managing
: the likewise global irq_domain_list, while domain operations (e.g. IRQ
: allocations) use per-domain (hierarchy) locking."
: .
irqdomain: Switch to per-domain locking
irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
irqchip/gic-v2m: Use irq_domain_create_hierarchy()
irqchip/alpine-msi: Use irq_domain_add_hierarchy()
x86/uv: Use irq_domain_create_hierarchy()
x86/ioapic: Use irq_domain_create_hierarchy()
irqdomain: Clean up irq_domain_push/pop_irq()
irqdomain: Drop leftover brackets
irqdomain: Drop dead domain-name assignment
irqdomain: Drop revmap mutex
irqdomain: Fix domain registration race
irqdomain: Fix mapping-creation race
irqdomain: Refactor __irq_domain_alloc_irqs()
irqdomain: Look for existing mapping only once
irqdomain: Drop bogus fwspec-mapping error handling
irqdomain: Fix disassociation race
irqdomain: Fix association race
Signed-off-by: Marc Zyngier <maz@kernel.org>
The IRQ domain structures are currently protected by the global
irq_domain_mutex. Switch to using more fine-grained per-domain locking,
which can speed up parallel probing by reducing lock contention.
On a recent arm64 laptop, the total time spent waiting for the locks
during boot drops from 160 to 40 ms on average, while the maximum
aggregate wait time drops from 550 to 90 ms over ten runs for example.
Note that the domain lock of the root domain (innermost domain) must be
used for hierarchical domains. For non-hierarchical domains (as for root
domains), the new root pointer is set to the domain itself so that
&domain->root->mutex always points to the right lock.
Also note that hierarchical domains should be constructed using
irq_domain_create_hierarchy() (or irq_domain_add_hierarchy()) to avoid
having racing allocations access a not fully initialised domain. As a
safeguard, the lockdep assertion in irq_domain_set_mapping() will catch
any offenders that also fail to set the root domain pointer.
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-21-johan+linaro@kernel.org
The irq_domain_push_irq() interface is used to add a new (outmost) level
to a hierarchical domain after IRQs have been allocated.
Possibly due to differing mental images of hierarchical domains, the
names used for the irq_data variables make these functions much harder
to understand than what they need to be.
Rename the struct irq_data pointer to the data embedded in the
descriptor as simply 'irq_data' and refer to the data allocated by this
interface as 'parent_irq_data' so that the names reflect how
hierarchical domains are implemented.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-12-johan+linaro@kernel.org
Drop some unnecessary brackets that were left in place when the
corresponding code was updated.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230213104302.17307-11-johan+linaro@kernel.org