When mapping a LPI, the ITS driver picks the first possible
affinity, which is in most cases CPU0, assuming that if
that's not suitable, someone will come and set the affinity
to something more interesting.
It apparently isn't the case, and people complain of poor
performance when many interrupts are glued to the same CPU.
So let's place the interrupts by finding the "least loaded"
CPU (that is, the one that has the fewer LPIs mapped to it).
So called 'managed' interrupts are an interesting case where
the affinity is actually dictated by the kernel itself, and
we should honor this.
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1575642904-58295-1-git-send-email-john.garry@huawei.com
Link: https://lore.kernel.org/r/20200515165752.121296-3-maz@kernel.org
(cherry picked from commit 840775cad1)
Signed-off-by: Alex Shi <alexsshi@tencent.com>
In order to improve the distribution of LPIs among CPUs, let start by
tracking the number of LPIs assigned to CPUs, both for managed and
non-managed interrupts (as separate counters).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20200515165752.121296-2-maz@kernel.org
(cherry picked from commit aca60b181c)
Signed-off-by: Alex Shi <alexsshi@tencent.com>
Gitee limit the repo's size to 3GB, to reduce the size of the code,
sync codes to ock 5.4.119-20.0009.21 in one commit.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Sync kernel codes to the same with 590eaf1fec ("Init Repo base on
linux 5.4.32 long term, and add base tlinux kernel interfaces."), which
is from tk4, and it is the base of tk4.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
On a system without Single VMOVP support (say GITS_TYPER.VMOVP == 0),
we will map vPEs only on ITSs that will actually control interrupts
for the given VM. And when moving a vPE, the VMOVP command will be
issued only for those ITSs.
But when issuing VMOVPs we seemed fail to present the exact ITSList
to ITSs who are actually included in the synchronization operation.
The its_list_map we're currently using includes all ITSs in the system,
even though some of them don't have the corresponding vPE mapping at all.
Introduce get_its_list() to get the per-VM its_list_map, to indicate
which ITSs have vPE mappings for the given VM, and use this map as
the expected ITSList when building VMOVP. This is hopefully a performance
gain not to do some synchronization with those unsuspecting ITSs.
And initialize the whole command descriptor to zero at beginning, since
the seq_num and its_list should be RES0 when GITS_TYPER.VMOVP == 1.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1571802386-2680-1-git-send-email-yuzenghui@huawei.com
When allocating a range of LPIs for a Multi-MSI capable device,
this allocation extended to the closest power of 2.
But on the release path, the interrupts are released one by
one. This results in not releasing the "extra" range, leaking
the its_device. Trying to reprobe the device will then fail.
Fix it by releasing the LPIs the same way we allocate them.
Fixes: 8208d1708b ("irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size")
Reported-by: Jiaxing Luo <luojiaxing@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/f5e948aa-e32f-3f74-ae30-31fee06c2a74@huawei.com
We try to find a free LPI region in device's lpi_map and allocate them
(set them to 1) when we want to allocate LPIs for this device. This is
what bitmap_find_free_region() has done for us. The following set_bit
is redundant and a bit confusing (since we only set_bit against the first
allocated LPI idx). Remove it, and make the set_bit explicit by comment.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Do not expose the ITS' VA (it appears in debugfs). Instead, record
the PA, which at least can be used to precisely identify the associated
irqchip and domain.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
- Fix a couple of UAF on error paths (RZA1, GICv3 ITS)
- Fix iMX GPCv2 trigger setting
- Add missing of_node_put on error path in MBIGEN
- Add another bunch of /* fall-through */ to silence warnings
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl1B0ysPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDEq0P/19TlvVhBzRBQ7xrZsh7RYGYddgHZeYxEwcJ
iZaZnEwYPqrQXS7UIOyMmpdhDOE89sntgCfQWWhPqZdVMAPPAanucbcIA9vNxy+U
OI9ue90Eeo5MAm7T2qp3MgEROQ21npYeTNGPHM6Rhh3twbsLX5S7rQCOZBeuY1zT
l9FGOUsUgYfUgdEGAv4wZLdIakfTbwAlk+9QaryEn7leu4s/vzHIdyMmJ/RDlt/i
YPWFD4XfDV8WSE0CE88oy1T4ppD8e71C6LjS+NfJsY5brUGPDvcbllCm3tFudy/9
D/4kM6yjpNIGZK23p5ux8dBTZB/9+z/PadFJc6J78cFH+IMf2Clv40GrZ50cG8MK
GIDBgwGmooUU+vsmTTPeqtEn30CIyVGb1AkoZnTR/vRltsg0zaLhuqIHnhLTo5av
dEeTv1sBOcPR3b4NkX4WAHd5UJrF/a3/fSdW9h95rF/xlP3fto5lhRmtI9UOqlra
1X+GJXRWoEbMvLP75PlSOQkDiG8zKko9/5JX/da5Q0Qztl/gZPFaGexL/c36DLoJ
kKuHut/1TXu3xfoGPotYm8mainSHkEF1OThCOWX6sjS4m9JMYZMejjg9l1pNXfKw
lZVKKygYOwOeBubsYqmVOEliFai2EggSWmq77csu5FP84M/8JNRX+69yQLO0Cwea
ukYEvrvA
=oXHS
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
A small bunch of fixes from the irqchip department:
- Fix a couple of UAF on error paths (RZA1, GICv3 ITS)
- Fix iMX GPCv2 trigger setting
- Add missing of_node_put on error path in MBIGEN
- Add another bunch of /* fall-through */ to silence warnings
In its_vpe_init, when its_alloc_vpe_table fails, we should free
vpt_page allocated just before, instead of vpe->vpt_page.
Let's fix it.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull irq fixes from Ingo Molnar:
"Diverse irqchip driver fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Fix command queue pointer comparison bug
irqchip/mips-gic: Use the correct local interrupt map registers
irqchip/ti-sci-inta: Fix kernel crash if irq_create_fwspec_mapping fail
irqchip/irq-csky-mpintc: Support auto irq deliver to all cpus
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we run several VMs with PCI passthrough and GICv4 enabled, not
pinning vCPUs, we will occasionally see below warnings in dmesg:
ITS queue timeout (65440 65504 480)
ITS cmd its_build_vmovp_cmd failed
The reason for the above issue is that in BUILD_SINGLE_CMD_FUNC:
1. Post the write command.
2. Release the lock.
3. Start to read GITS_CREADR to get the reader pointer.
4. Compare the reader pointer to the target pointer.
5. If reader pointer does not reach the target, sleep 1us and continue
to try.
If we have several processors running the above concurrently, other
CPUs will post write commands while the 1st CPU is waiting the
completion. So we may have below issue:
phase 1:
---rd_idx-----from_idx-----to_idx--0---------
wait 1us:
phase 2:
--------------from_idx-----to_idx--0-rd_idx--
That is the rd_idx may fly ahead of to_idx, and if in case to_idx is
near the wrap point, rd_idx will wrap around. So the below condition
will not be met even after 1s:
if (from_idx < to_idx && rd_idx >= to_idx)
There is another theoretical issue. For a slow and busy ITS, the
initial rd_idx may fall behind from_idx a lot, just as below:
---rd_idx---0--from_idx-----to_idx-----------
This will cause the wait function exit too early.
Actually, it does not make much sense to use from_idx to judge if
to_idx is wrapped, but we need a initial rd_idx when lock is still
acquired, and it can be used to judge whether to_idx is wrapped and
the current rd_idx is wrapped.
We switch to a method of calculating the delta of two adjacent reads
and accumulating it to get the sum, so that we can get the real rd_idx
from the wrapped value even when the queue is almost full.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Heyi Guo <guoheyi@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull IRQ chip updates from Ingo Molnar:
"A late irqchips update:
- New TI INTR/INTA set of drivers
- Rewrite of the stm32mp1-exti driver as a platform driver
- Update the IOMMU MSI mapping API to be RT friendly
- A number of cleanups and other low impact fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
arm64: arch_k3: Enable interrupt controller drivers
irqchip/ti-sci-inta: Add msi domain support
soc: ti: Add MSI domain bus support for Interrupt Aggregator
irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver
dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings
irqchip/ti-sci-intr: Add support for Interrupt Router driver
dt-bindings: irqchip: Introduce TISCI Interrupt router bindings
gpio: thunderx: Use the default parent apis for {request,release}_resources
genirq: Introduce irq_chip_{request,release}_resource_parent() apis
firmware: ti_sci: Add helper apis to manage resources
firmware: ti_sci: Add RM mapping table for am654
firmware: ti_sci: Add support for IRQ management
firmware: ti_sci: Add support for RM core ops
...
its_irq_compose_msi_msg() may be called from non-preemptible context.
However, on RT, iommu_dma_map_msi_msg requires to be called from a
preemptible context.
A recent change split iommu_dma_map_msi_msg() in two new functions:
one that should be called in preemptible context, the other does
not have any requirement.
The GICv3 ITS driver is reworked to avoid executing preemptible code in
non-preemptible context. This can be achieved by preparing the MSI
mapping when allocating the MSI interrupt.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Using list_add + list_sort to insert an element and keeping the list
sorted is a somewhat blunt instrument; one can find the right place to
insert in fewer lines of code than the cmp callback uses. Moreover,
walking the entire list afterwards to merge adjacent ranges is
overkill, since we know that only the just-inserted element may be
merged with its neighbours.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There's no reason to ask kmalloc() to zero the allocation, since all
the fields get initialized immediately afterwards. Except that there's
also not any reason to initialize the ->entry member, since the
element gets added to the lpi_range_list immediately.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There's no reason to do the allocation of the new lpi_range inside the
lpi_range_lock. One could change the code to avoid the allocation
altogether in case the freed range can be merged with one or two
existing ranges (in which case the allocation would naturally be done
under the lock), but it's probably not worth complicating the code for
that.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The word 'entirely' has been misspelt in a comment in its_msi_prepare().
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI header, though, so less common
layouts with different field sizes required custom parsers to go through
their subtable entry list.
Create the infrastructure for adding different table types so parsing
the entries array may be more reused for all ACPI system tables and
the common code doesn't need to be duplicated.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lpi_range_list is supposed to be sorted in ascending order of
->base_id (at least if the range merging is to work), but the current
comparison function returns a positive value if rb->base_id >
ra->base_id, which means that list_sort() will put A after B in that
case - and vice versa, of course.
Fixes: 880cb3cddd (irqchip/gic-v3-its: Refactor LPI allocator)
Cc: stable@vger.kernel.org (v4.19+)
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pull irq updates from Thomas Gleixner:
"The interrupt departement delivers this time:
- New infrastructure to manage NMIs on platforms which have a sane
NMI delivery, i.e. identifiable NMI vectors instead of a single
lump.
- Simplification of the interrupt affinity management so drivers
don't have to implement ugly loops around the PCI/MSI enablement.
- Speedup for interrupt statistics in /proc/stat
- Provide a function to retrieve the default irq domain
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- The usual small fixes, improvements and cleanups all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irqchip/imx-irqsteer: Add multi output interrupts support
irqchip/imx-irqsteer: Change to use reg_num instead of irq_group
dt-bindings: irq: imx-irqsteer: Add multi output interrupts support
dt-binding: irq: imx-irqsteer: Use irq number instead of group number
irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
irqdomain: Allow the default irq domain to be retrieved
irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
irqchip/sifive-plic: Differentiate between PLIC handler and context
irqchip/sifive-plic: Add warning in plic_init() if handler already present
irqchip/sifive-plic: Pre-compute context hart base and enable base
PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
genirq/affinity: Remove the leftovers of the original set support
nvme-pci: Simplify interrupt allocation
genirq/affinity: Add new callback for (re)calculating interrupt sets
genirq/affinity: Store interrupt sets size in struct irq_affinity
genirq/affinity: Code consolidation
irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
irqchip/i8259: Fix shutdown order by moving syscore_ops registration
dt-bindings: interrupt-controller: loongson ls1x intc
...
- Core pseudo-NMI handling code
- Allow the default irq domain to be retrieved
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- A handful of other fixes (i8259, GICv3, PLIC)
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlxwGtgVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD+2YP/2m9cVU3Z9ak8+HdSblq2Sw8QPfd
RshYS+DzppLUzhzj2w2jnz9eP2fWEqBwrQmvtOI8Fo+id0PvdE3ngaP4hPMJDyuU
Ou02TV6YwE4jknoO02RXOdeBJArccc1WR5++YZjp1gGUABFUPCHwKLoZgysurapV
sZQ1Ten3wlsrZKKNTdWfYFWB36d7J3eqFYeGy3sll1wQ6XUbHmUJPPrSfXMqDYzY
giDD/DH8IIhfnRs+T2TxGzKtTDMnJRYJYQK2bNgtNAW+wEY2BtCLSHj8//3bK0R9
Jek9xg1NLpbQE+T8f2ZUd6BjbVxmDd3mGPvshXKyHFESl4fvC9yrddC86dBzHwrN
VJmaES974PBuMtE2xPZGInh77EcelVC7OPeXsnjVMrUZo0s7tFY/TWA+rqCOLmgC
A+0jagCDx1nTTYGXsqoyrHThoQoYZRX6AnXFeDJb9OLo3cV7x4w/FPORstM0PbAc
butyZulVg1YQ+Y+oJK/UvIkdFL7FFqB/kgZK/lrL0InvbQMj4CBt3bsWY5OxgInF
E02tgzEnrx1nHGi1XPnCTOs7DnKeaPR/h/u3PjoT7FeiZLClyiGDw7V/NuF+buLB
w7Pqpn835CnkXC27MycTjPo23eZv690M4vcHL4vrhN+iuGp+2hZdXUiR15mZnH6m
g0N8anZbL1iol0Gm
=M6YA
-----END PGP SIGNATURE-----
Merge tag 'irqchip-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier
- Core pseudo-NMI handling code
- Allow the default irq domain to be retrieved
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- A handful of other fixes (i8259, GICv3, PLIC)
The NUMA node information is visible to ITS driver but not being used
other than handling hardware errata. ITS/GICR hardware accesses to the
local NUMA node is usually quicker than the remote NUMA node. How slow
the remote NUMA accesses are depends on the implementation details.
This patch allocates memory for ITS management tables and command
queue from the corresponding NUMA node using the appropriate NUMA
aware functions. This change improves the performance of the ITS
tables read latency on systems where it has more than one ITS block,
and with the slower inter node accesses.
Apache Web server benchmarking using ab tool on a HiSilicon D06
board with multiple numa mem nodes shows Time per request and
Transfer rate improvements of ~3.6% with this patch.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In current logic, its_parse_indirect_baser() will be invoked twice
when allocating Device tables. Add a *break* to omit the unnecessary
and annoying (might be ...) invoking.
Fixes: 32bd44dc19 ("irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size")
Cc: stable@vger.kernel.org
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In the unlikely event that we cannot find any available LPI in the
system, we should gracefully return an error instead of carrying
on with no LPI allocated at all.
Fixes: 38dd7c494c ("irqchip/gic-v3-its: Drop chunk allocation compatibility")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On systems or VMs where multiple devices share a single DevID
(because they sit behind a PCI bridge, or because the HW is
broken in funky ways), we reuse the save its_device structure
in order to reflect this.
It turns out that there is a distinct lack of locking when looking
up the its_device, and two device being probed concurrently can result
in double allocations. That's obviously not nice.
A solution for this is to have a per-ITS mutex that serializes device
allocation.
A similar issue exists on the freeing side, which can run concurrently
with the allocation. On top of now taking the appropriate lock, we
also make sure that a shared device is never freed, as we have no way
to currently track the life cycle of such object.
Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
Tested-by: Zheng Xiang <zhengxiang9@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
1. In current implementation, every VLPI will temporarily be mapped to
the first CPU in system (normally CPU0) and then moved to the real
scheduled CPU later.
2. So there is a time window and a VLPI may be sent to CPU0 instead of
the real scheduled vCPU, in a multi-CPU virtual machine.
3. However, CPU0 may have not been scheduled as a virtual CPU after
system boots up, so the value of its GICR_VPROPBASER is unknown at
that moment.
4. If the INTID of VLPI is larger than 2^(GICR_VPROPBASER.IDbits+1),
while IDbits is also in unknown state, GIC will behave as if the VLPI
is out of range and simply drop it, which results in interrupt missing
in Guest.
As no code will clear GICR_VPROPBASER at runtime, we can safely
initialize the IDbits field at boot time for each CPU to get rid of
this issue.
We also clear Valid bit of GICR_VPENDBASER in case any ancient
programming gets left in and causes memory corrupting. A new function
its_clear_vpend_valid() is added to reuse the code in
its_vpe_deschedule().
Fixes: e643d80340 ("irqchip/gic-v3-its: Add VPE scheduling")
Signed-off-by: Heyi Guo <guoheyi@huawei.com>
Signed-off-by: Heyi Guo <heyi.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The way we allocate events works fine in most cases, except
when multiple PCI devices share an ITS-visible DevID, and that
one of them is trying to use MultiMSI allocation.
In that case, our allocation is not guaranteed to be zero-based
anymore, and we have to make sure we allocate it on a boundary
that is compatible with the PCI Multi-MSI constraints.
Fix this by allocating the full region upfront instead of iterating
over the number of MSIs. MSI-X are always allocated one by one,
so this shouldn't change anything on that front.
Fixes: b48ac83d6b ("irqchip: GICv3: ITS: MSI support")
Cc: stable@vger.kernel.org
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
LPIs use the same priority value as other GIC interrupts.
Make the GIC default priority definition visible to ITS implementation
and use this same definition for LPI priorities.
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If the LPI tables have been reserved with the EFI reservation
mechanism, we assume that these tables are safe to use even
when we find the redistributors to have LPIs enabled at
boot time, meaning that kexec can now work with GICv3.
You're welcome.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Upon enabling a redistributor, let's register the allocated tables
with the EFI table that tracks the memory reservations.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If booting with LPIs enabled, all the redistributors must have the
exact same property table. No ifs, no buts.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If using a kdump kernel, and that we cannot disable LPIs to install
our own tables, let's switch to using the already allocated tables.
This means that we'll change some of the initial kernel's memory,
but at least we'll be able to have LPIs in this secondary kernel.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to cope with kexec and GICv3, let's try and spot when
we're booting with LPIs already enabled, and the tables already
programmed into the redistributors.
This code is currently guarded by a predicate that is always false,
meaning this is not functionnal just yet.
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're currently only tracking the page allocated to contain the
property table by its struct page. In the future, it is going to
be convenient to track both PA and VA for that page instead. Let's
do that.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Pending tables for the redistributors are currently allocated
one at a time as each CPU boots. This is causing some grief
for Linux/RT (allocation from within a CPU hotplug notifier is
frown upon).
Let's move this allocation to take place at init time, when we
only have a single CPU. It means we're allocating memory for CPUs
that are not online yet, but most system will boot all of their
CPUs anyway, so that's not completely wasted.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we're going to reuse some pre-allocated memory for the property
table, split out the zeroing of that table into a separate function
for later use.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
LPI_PENDING_SZ is always used in conjunction with a max(), which doesn't
make much sense, since we're guaranteed that LPI_PENDING_SZ is already
aligned to 64K. Let's remove it.
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Tested-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit fe8e93504c ("irqchip/gic-v3-its: Use full range of LPIs"), removes
the cap for lpi_id_bits, which causes the following warning to trigger on a
QDF2400 server:
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:4066 __alloc_pages_nodemask
...
Call trace:
__alloc_pages_nodemask+0x2d8/0x1188
alloc_pages_current+0x8c/0xd8
its_allocate_prop_table+0x5c/0xb8
its_init+0x220/0x3c0
gic_init_bases+0x250/0x380
gic_acpi_init+0x16c/0x2a4
In its_alloc_lpi_tables(), lpi_id_bits is 24 in QDF2400. The allocation in
allocate_prop_table() tries therefore to allocate 16M (order 12 if
pagesize=4k), which triggers the warning.
As said by MarcL
Capping lpi_id_bits at 16 (which is what we had before) is plenty,
will save a some memory, and gives some margin before we need to push
it up again.
Bring the upper limit of lpi_id_bits back to prevent
Fixes: fe8e93504c ("irqchip/gic-v3-its: Use full range of LPIs")
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Olof Johansson <olof@lixom.net>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1535432006-2304-1-git-send-email-jia.he@hxt-semitech.com
The its_lock lock is held while a new device is added to the list and
during setup while the CPU is booted. Even on -RT the CPU-bootup is
performed with disabled interrupts.
Make its_lock a raw_spin_lock_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A recent extension to the GIC architecture allows a hypervisor to
arbitrarily reduce the number of LPIs available to a guest, no
matter what the GIC says about the valid range of IntIDs.
Let's factor in this information when computing the number of
available LPIs
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of exposing the GIC distributor IntID field in the rdist
structure that is passed to the ITS, let's replace it with a
copy of the whole GICD_TYPER register. We are going to need
some of this information at a later time.
No functionnal change.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
At the moment, the core ITS driver imposes the allocation to be
in chunks of 32. As we want to relax this on a per bus basis, let's
move the the the allocation constraints to each bus.
No functionnal change.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we used to represent the LPI range using a bitmap, we were reducing
the number of LPIs to at most 64k in order to preserve memory.
With our new allocator, there is no such need, as dealing with 2^16
or 2^32 LPIs takes the same amount of memory.
So let's use the number of IntID bits reported by the GIC instead of
an arbitrary limit.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Our current LPI allocator relies on a bitmap, each bit representing
a chunk of 32 LPIs, meaning that each device gets allocated LPIs
in multiple of 32. It served us well so far, but new use cases now
require much more finer grain allocations, down the the individual
LPI.
Given the size of the IntID space (up to 32bit), it isn't practical
to continue using a bitmap, so let's use a different data structure
altogether.
We switch to a list, where each element represent a contiguous range
of LPIs. On allocation, we simply grab the first group big enough to
satisfy the allocation, and substract what we need from it. If the
group becomes empty, we just remove it. On freeing interrupts, we
insert a new group of interrupt in the list, sort it and fuse the
adjacent groups.
This makes freeing interrupt much more expensive than allocating
them (an unusual behaviour), but that's fine as long as we consider
that freeing interrupts is an extremely rare event.
We still allocate interrupts in blocks of 32 for the time being,
but subsequent patches will relax this.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Enabling LPIs was made a lot stricter recently, by checking that they are
disabled before enabling them. By doing so, the CPU hotplug case was missed
altogether, which leaves LPIs enabled on hotplug off (expecting the CPU to
eventually come back), and won't write a different value anyway on hotplug
on.
So skip that check if that particular case is detected
Fixes: 6eb486b66a ("irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lkml.kernel.org/r/20180622095254.5906-8-marc.zyngier@arm.com