Under certain loads, this soft lockup has been observed:
BUG: soft lockup - CPU#2 stuck for 22s! [ip6tables:1016]
Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211 rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vfat fat efivarfs xfs libcrc32c
CPU: 2 PID: 1016 Comm: ip6tables Not tainted 3.13.0-0.rc7.30.sa2.aarch64 #1
task: fffffe03e81d1400 ti: fffffe03f01f8000 task.ti: fffffe03f01f8000
PC is at __cpu_flush_kern_tlb_range+0xc/0x40
LR is at __purge_vmap_area_lazy+0x28c/0x3ac
pc : [<fffffe000009c5cc>] lr : [<fffffe0000182710>] pstate: 80000145
sp : fffffe03f01fbb70
x29: fffffe03f01fbb70 x28: fffffe03f01f8000
x27: fffffe0000b19000 x26: 00000000000000d0
x25: 000000000000001c x24: fffffe03f01fbc50
x23: fffffe03f01fbc58 x22: fffffe03f01fbc10
x21: fffffe0000b2a3f8 x20: 0000000000000802
x19: fffffe0000b2a3c8 x18: 000003fffdf52710
x17: 000003ff9d8bb910 x16: fffffe000050fbfc
x15: 0000000000005735 x14: 000003ff9d7e1a5c
x13: 0000000000000000 x12: 000003ff9d7e1a5c
x11: 0000000000000007 x10: fffffe0000c09af0
x9 : fffffe0000ad1000 x8 : 000000000000005c
x7 : fffffe03e8624000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000
x3 : fffffe0000c09cc8 x2 : 0000000000000000
x1 : 000fffffdfffca80 x0 : 000fffffcd742150
The __cpu_flush_kern_tlb_range() function looks like:
ENTRY(__cpu_flush_kern_tlb_range)
dsb sy
lsr x0, x0, #12
lsr x1, x1, #12
1: tlbi vaae1is, x0
add x0, x0, #1
cmp x0, x1
b.lo 1b
dsb sy
isb
ret
ENDPROC(__cpu_flush_kern_tlb_range)
The above soft lockup shows the PC at tlbi insn with:
x0 = 0x000fffffcd742150
x1 = 0x000fffffdfffca80
So __cpu_flush_kern_tlb_range has 0x128ba930 tlbi flushes left
after it has already been looping for 23 seconds!.
Looking up one frame at __purge_vmap_area_lazy(), there is:
...
list_for_each_entry_rcu(va, &vmap_area_list, list) {
if (va->flags & VM_LAZY_FREE) {
if (va->va_start < *start)
*start = va->va_start;
if (va->va_end > *end)
*end = va->va_end;
nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
list_add_tail(&va->purge_list, &valist);
va->flags |= VM_LAZY_FREEING;
va->flags &= ~VM_LAZY_FREE;
}
}
...
if (nr || force_flush)
flush_tlb_kernel_range(*start, *end);
So if two areas are being freed, the range passed to
flush_tlb_kernel_range() may be as large as the vmalloc
space. For arm64, this is ~240GB for 4k pagesize and ~2TB
for 64kpage size.
This patch works around this problem by adding a loop limit.
If the range is larger than the limit, use flush_tlb_all()
rather than flushing based on individual pages. The limit
chosen is arbitrary as the TLB size is implementation
specific and not accessible in an architected way. The aim
of the arbitrary limit is to avoid soft lockup.
Signed-off-by: Mark Salter <msalter@redhat.com>
[catalin.marinas@arm.com: commit log update]
[catalin.marinas@arm.com: marginal optimisation]
[catalin.marinas@arm.com: changed to MAX_TLB_RANGE and added comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The architecture specification states that both DSB and ISB are required
between page table modifications and subsequent memory accesses using the
corresponding virtual address. When TLB invalidation takes place, the
tlb_flush_* functions already have the necessary barriers. However, there are
other functions like create_mapping() for which this is not the case.
The patch adds the DSB+ISB instructions in the set_pte() function for
valid kernel mappings. The invalid pte case is handled by tlb_flush_*
and the user mappings in general have a corresponding update_mmu_cache()
call containing a DSB. Even when update_mmu_cache() isn't called, the
kernel can still cope with an unlikely spurious page fault by
re-executing the instruction.
In addition, the set_pmd, set_pud() functions gain an ISB for
architecture compliance when block mappings are created.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Leif Lindholm <leif.lindholm@linaro.org>
Acked-by: Steve Capper <steve.capper@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
When calling our low-level barrier macros directly, we can often suffice
with more relaxed behaviour than the default "all accesses, full system"
option.
This patch updates the users of dsb() to specify the option which they
actually require.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The tlb maintainence functions: __cpu_flush_user_tlb_range and
__cpu_flush_kern_tlb_range do not take into consideration the page
granule when looping through the address range, and repeatedly flush
tlb entries for the same page when operating with 64K pages.
This patch re-works the logic s.t. we instead advance the loop by
1 << (PAGE_SHIFT - 12), so avoid repeating ourselves.
Also the routines have been converted from assembler to static inline
functions to aid with legibility and potential compiler optimisations.
The isb() has been removed from flush_tlb_kernel_range(.) as it is
only needed when changing the execute permission of a mapping. If one
needs to set an area of the kernel as execute/non-execute an isb()
must be inserted after the call to flush_tlb_kernel_range.
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bring Transparent HugePage support to ARM. The size of a
transparent huge page depends on the normal page size. A
transparent huge page is always represented as a pmd.
If PAGE_SIZE is 4KB, THPs are 2MB.
If PAGE_SIZE is 64KB, THPs are 512MB.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
This patch adds the TLB maintenance functions. There is no distinction
made between the I and D TLBs. TLB maintenance operations are
automatically broadcast between CPUs in hardware. The inner-shareable
operations are always present, even on UP systems.
NOTE: Large part of this patch to be dropped once Peter Z's generic
mmu_gather patches are merged.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>