Implement the TLB flush routine to evict a sepcific Super TLB entry,
vs. moving to a new ASID on every such flush.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
MMUv4 in HS38x cores supports Super Pages which are basis for Linux THP
support.
Normal and Super pages can co-exist (ofcourse not overlap) in TLB with a
new bit "SZ" in TLB page desciptor to distinguish between them.
Super Page size is configurable in hardware (4K to 16M), but fixed once
RTL builds.
The exact THP size a Linx configuration will support is a function of:
- MMU page size (typical 8K, RTL fixed)
- software page walker address split between PGD:PTE:PFN (typical
11:8:13, but can be changed with 1 line)
So for above default, THP size supported is 8K * 256 = 2M
Default Page Walker is 2 levels, PGD:PTE:PFN, which in THP regime
reduces to 1 level (as PTE is folded into PGD and canonically referred
to as PMD).
Thus thp PMD accessors are implemented in terms of PTE (just like sparc)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
In case of ARCv2 CPU there're could be following configurations
that affect cache handling for data exchanged with peripherals
via DMA:
[1] Only L1 cache exists
[2] Both L1 and L2 exist, but no IO coherency unit
[3] L1, L2 caches and IO coherency unit exist
Current implementation takes care of [1] and [2].
Moreover support of [2] is implemented with run-time check
for SLC existence which is not super optimal.
This patch introduces support of [3] and rework of DMA ops
usage. Instead of doing run-time check every time a particular
DMA op is executed we'll have 3 different implementations of
DMA ops and select appropriate one during init.
As for IOC support for it we need:
[a] Implement empty DMA ops because IOC takes care of cache
coherency with DMAed data
[b] Route dma_alloc_coherent() via dma_alloc_noncoherent()
This is required to make IOC work in first place and also
serves as optimization as LD/ST to coherent buffers can be
srviced from caches w/o going all the way to memory
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
-Added some comments about IOC gains
-Marked dma ops as static,
-Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
alloc_pages_exact() get gfp flags and handle zero'ing already
And while it, fix the case where ioremap fails: return rightaway.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
SLC maintenance ops need to be serialized by software as there is no
inherent buffering / quequing of aux commands. It can silently ignore a
new aux operation if previous one is still ongoing (SLC_CTRL_BUSY)
So gaurd the SLC op using a spin lock
The spin lock doesn't seem to be contended even in heavy workloads such
as iperf. On FPGA @ 75 MHz.
[1] Before this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.110 port 38935 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 48.4 MBytes 40.6 Mbits/sec
============================================================
[2] After this change:
============================================================
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60248 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 47.5 MBytes 39.8 Mbits/sec
# iperf -c 10.42.0.1
------------------------------------------------------------
Client connecting to 10.42.0.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 10.42.0.243 port 60249 connected with 10.42.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 54.9 MBytes 46.0 Mbits/sec
============================================================
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARCv2 is the next generation ISA from Synopsys and basis for the
HS3{4,6,8} families of processors which retain the traditional ARC mantra of
low power and configurability and are now more performant and feature rich.
HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and
SMP (upto 4 cores) among other features.
+ www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor
+ http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications
+ http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps
- Support for ARC SDP (Software Development platform): Main Board + CPU Cards
= AXS101: CPU Card with ARC700 in silicon @ 700 MHz
= AXS103: CPU Card with HS38x in FPGA
- Refactoring of ARCompact port to accomodate new ARCv2 ISA
- Miscll updates/cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVk0g8AAoJEGnX8d3iisJecqsQAI6gvBC4GSNYDrmgGJJK1uLQ
uf6ZXQRLBtyxwa6VMvaNFe91i5XV5WvEXDuNBQX4FdYbp7Fs+Jz5VK79xFtbVEdU
H6mgKcs9HBwQvrHBxl54XxxXfX7kD1kxrlV7cL4b7bXTEX0XyH5ROUj600/YP+B4
8t+XdYcfgFK0HpeFGXVP+Xmv/e+hBbzCpOjOd2ZFqEwymvSpZDc4KZ2yDvV2+Ybn
JNZ421urQOrxR27njvvPvtpeN7uuJKfRYq7IuIR8+Ad72S19EDdw+DZHp2XoUMXA
wgydWrrOaX2Dr2CmXHGA1C4nWEG7+Yo9I1WitjJct0tkOQyDR2OIDGmvKGBd1uoS
QsihtoKBRvns+2gpXBEOmOHmF6ggpHNN0ppIwCp+AK5kX3fmxBtyUekyYmVpg8oQ
xgFIuJgmiAvW7QB7xIO6SFFt18De2ifDRrKWJwVauvfW/PvUIwuUBEcbh0OHAn54
ebUUWu2ZdVNe0XCsZOAQGwYHZRWBk8Bn3bhFpNnOliRiF77e9GsKeGYeIswYFy7I
42Gp35ftEj1pLLFZ1vIsAo72N6ErmHwPOcJkaBYaTbPGPcTEO2aR6b8WOcCjsPxK
DUeUV3H2HV+6V4jw/96lnsaRqsaj4TsJxEAFRR3wT1DLoRudCIDubaXTdvvDie77
RgKn4ZdxgmXD97+deBqc
=KwNo
-----END PGP SIGNATURE-----
Merge tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC architecture updates from Vineet Gupta:
- support for HS38 cores based on ARCv2 ISA
ARCv2 is the next generation ISA from Synopsys and basis for the
HS3{4,6,8} families of processors which retain the traditional ARC mantra of
low power and configurability and are now more performant and feature rich.
HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and
SMP (upto 4 cores) among other features.
+ www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor
+ http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications
+ http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps
- support for ARC SDP (Software Development platform): Main Board + CPU Cards
= AXS101: CPU Card with ARC700 in silicon @ 700 MHz
= AXS103: CPU Card with HS38x in FPGA
- refactoring of ARCompact port to accomodate new ARCv2 ISA
- misc updates/cleanups
* tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (72 commits)
ARC: Fix build failures for ARCompact in linux-next after ARCv2 support
ARCv2: Allow older gcc to cope with new regime of ARCv2/ARCompact support
ARCv2: [vdk] dts files and defconfig for HS38 VDK
ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores
ARC: [axs101] Prepare for AXS103
ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores
ARCv2: All bits in place, allow ARCv2 builds
ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)
ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock
ARC: Reduce bitops lines of code using macros
ARCv2: barriers
arch: conditionally define smp_{mb,rmb,wmb}
ARC: add smp barriers around atomics per Documentation/atomic_ops.txt
ARC: add compiler barrier to LLSC based cmpxchg
ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution
ARCv2: SMP: clocksource: Enable Global Real Time counter
ARCv2: SMP: ARConnect debug/robustness
ARCv2: SMP: Support ARConnect (MCIP) for Inter-Core-Interrupts et al
ARC: make plat_smp_ops weak to allow over-rides
ARCv2: clocksource: Introduce 64bit local RTC counter
...
- CONFIG_ARC_UBOOT_SUPPORT to handle arguments passed in r0, r1, r2
- CONFIG_DEVTMPFS_MOUNT for mouting rootfs since it uses external cpio
for rootfs
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Ruud Derwig <rderwig@synopsys.com>
[vgupta: folded the Main baord DT files for smp/up into one]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
L2 cache on ARCHS processors is called SLC (System Level Cache)
For working DMA (in absence of hardware assisted IO Coherency) we need
to manage SLC explicitly when buffers transition between cpu and
controllers.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Caveats about cache flush on ARCv2 based cores
- dcache is PIPT so paddr is sufficient for cache maintenance ops (no
need to setup PTAG reg
- icache is still VIPT but only aliasing configs need PTAG setup
So basically this is departure from MMU-v3 which always need vaddr in
line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) but paddr in DC_PTAG,
IC_PTAG respectively.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- Remove the ifdef'ery and write distinct versions for each mmu ver even
if there is some code duplication
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
That is because __after_dc_op() already reads it for status check, so it
is better anyways to use that "newer" value.
Also reduces the clutter in callers for passing from/to these routines.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).
In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().
Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.
faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.
This patch is based on a patch from Thomas Gleixner.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 609838cfed ("mm: invoke oom-killer from remaining unconverted page
fault handlers") converted arc to call pagefault_out_of_memory(), so remove
the comment about future conversion.
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
INV cmd for dcache provides 2 modes discard or wback-before-discard.
One is default and other needs to be set, if so desired. This is common
for line-op/entire-cache-op. So refactor them out into a helper
Doesn't affect generated code but paves way for any common
micro-optimization.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* print aliasing or not, VIPT/PIPT etc
* compress param storage using bitfields
* more use of IS_ENABLED to de-uglify code
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
icaches are not snooped hence not cohrent in SMP setups which means
kernel has to do cross core calls to ensure the same.
The leaf routine __ic_line_inv_vaddr() now does cross core calls.
__sync_icache_dcache() is affected due to this:
* local dcache line flushed ahead of remote icache inv requests
* can't disable interrupts anymore, since
__ic_line_inv_vaddr()->on_each_cpu() can deadlock.
| WARNING: CPU: 0 PID: 1 at kernel/smp.c:374
| smp_call_function_many+0x25a/0x2c4()
|
| init_kprobes+0x90/0xc8
| register_kprobe+0x1d6/0x510
| __sync_icache_dcache+0x28/0x80
|
| DISABLE IRQ
|
| __ic_line_inv_vaddr
| on_each_cpu
| smp_call_function_many+0x25a/0x2c4 --> WARN
| __ic_line_inv_vaddr_local
| __dc_line_op
* TODO: Needs to use mask of relevant CPUs to avoid broadcasting
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With commit 9df62f0544 "arch: use ASM_NL instead of ';'" the generic
macros can handle the arch specific newline quirk. Hence we can get rid
of ARC asm macros and use the "C" style macros.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This fixes a subtle issue with cache flush which could potentially cause
random userspace crashes because of stale icache lines.
This error crept in when consolidating the cache flush code
Fixes: bd12976c36 (ARC: cacheflush refactor #3: Unify the {d,i}cache)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 3.13
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
- Add mm_cpumask setting (aggregating only, unlike some other arches)
used to restrict the TLB flush cross-calling
- cross-calling versions of TLB flush routines (thanks to Noam)
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
------------------>8----------------------
arch/arc/mm/tlb.c: In function ‘do_tlb_overlap_fault’:
arch/arc/mm/tlb.c:688:13: warning: array subscript is above array bounds
[-Warray-bounds]
(pd0[n] & PAGE_MASK)) {
^
------------------>8----------------------
While at it, remove the usless last iteration of outer loop when reading
a TLB SET for duplicate entries.
Suggested-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
switch the args (address, pt_regs) to match with all the other "C"
exception handlers.
This removes the awkwardness in EV_ProtV for page fault vs. unaligned
access.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Line op needs vaddr (indexing) and paddr (tag match). For page sized
flushes (V-P const), each line op will need a different index, but the
tag bits wil remain constant, hence paddr can be setup once outside the
loop.
This improves select LMBench numbers for Aliasing dcache where we have
more "preventive" cache flushing.
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.11-rc7- Linux 3.11.0- 80 4.66 8.88 69.7 112. 268. 8.60 28.0 3489 13.K 27.K # Non alias ARC700
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 68.6 98.5 271. 8.58 28.1 4160 15.K 32.K # Aliasing
3.11-rc7- Linux 3.11.0- 80 4.64 8.51 69.8 99.4 270. 8.73 27.5 3880 15.K 31.K # PTAG loop Inv
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With Line length being constant now, we can fold the 2 helpers into 1.
This allows applying any optimizations (forthcoming) to single place.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC dcache supports 3 ops - Inv, Flush, Flush-n-Inv.
The programming model however provides 2 commands FLUSH, INV.
INV will either discard or flush-n-discard (based on DT_CTRL bit)
The leaf helper __dc_line_loop() used to take the AUX register
(corresponding to the 2 commands). Now we push that to within the
helper, paving way for code consolidations to follow.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
task's "active_mm". ARC vmalloc fault handler however was using mm.
A vmalloc fault for non user task context (actually pre-userland, from
init thread's open for /dev/console) caused the handler to deref NULL mm
(for mm->pgd)
The reasons it worked so far is amazing:
1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
In SMP that MMU register is repurposed hence need for mm pointer deref.
2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
pre-userland code path - it was introduced with commit 20bafb3d23
"n_tty: Move buffers into n_tty_data"
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: stable@vger.kernel.org #3.10 and 3.11
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All arches do essentially the same thing now for
early_init_dt_setup_initrd_arch, so it can now be removed.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved. They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.
This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.
Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.
Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM. OOM
handling is restricted to user triggered faults that have no other
option.
Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.
Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.
This patch:
Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.
Now that all fault handlers call into the generic OOM killer (see commit
609838cfed97: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding the
entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSL12LAAoJEEFnBt12D9kB64gP/RBipnYbo3RPanHg+lE/J1V7
KSVFNGKWJHxTg47VVC1YJGIG21jqxAilpdS2MQL5FP7iyd+IzvtHpQiJgp+2G+pq
di06yrdyrYErxRgZgGQi8IpR538ZzOEVLCKJGdb09YelkRzPT5au7CC1MAsX3qco
yba7PHk0/Nc4hZE4aGbgR1DlRmn86ob7mM0KFE/LORaSN2BueMgWcwKhQXYNGyoh
assX4yNhAbUG6Bgw7paBLDGqHh8c5Ei5AppU8yPb+N094jgYHBJryUoDlzzUHD23
qqiEqHhUKT0TpgHNs8KH0WZFugcmjKvYEbzdzadBxqfXnJN4fKSEcdfF3iz4T14j
U6EZks89GoHwA523OghUZkKNOqlsUdWfdKz+8/grQqKisYwDcf3fCxEYk/4weDCQ
b6fFlOv6+AI3btjXp6F511ZKxyT4ZZzkHjp/ZSrhBygyamNZfax0ma0j+ZS9AZql
kPxQS0nOve6NKaP7vXxMmW5sGMnL19ER/Hm31wthGcWI43GVebUdklnzfGaEeSjs
pmP8oiCNemceqVpiPKxcOxiguf/eyIjP1SFXbguASygUmQeTDbbJ8n1FYznCitue
xJgWttKWsEf/aMR3eJtQ3aBmHR3rijAV4E28Wlq8XMkocwvpQm2zMocS2Z5BJ80S
hi1kQVy8+RxNX96tOSp1
=GSWl
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree core updates from Grant Likely:
"Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding
the entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either"
Tim Bird questions whether the boot time cost of the random feeding may
be noticeable. And "add_device_randomness()" is definitely not some
speed deamon of a function.
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
of/platform: add error reporting to of_amba_device_create()
irq/of: Fix comment typo for irq_of_parse_and_map
of: Feed entire flattened device tree into the random pool
of/fdt: Clean up casting in unflattening path
of/fdt: Remove duplicate memory clearing on FDT unflattening
gpio: implement gpio-ranges binding document fix
of: call __of_parse_phandle_with_args from of_parse_phandle
of: introduce of_parse_phandle_with_fixed_args
of: move of_parse_phandle()
of: move documentation of of_parse_phandle_with_args
of: Fix missing memory initialization on FDT unflattening
of: consolidate definition of early_init_dt_alloc_memory_arch()
of: Make of_get_phy_mode() return int i.s.o. const int
include: dt-binding: input: create a DT header defining key codes.
of/platform: Staticize of_platform_device_create_pdata()
of: Specify initrd location using 64-bit
dt: Typo fix
OF: make of_property_for_each_{u32|string}() use parameters if OF is not enabled
This helps remove asid-to-mm reverse map
While mm->context.id contains the ASID assigned to a process, our ASID
allocator also used asid_mm_map[] reverse map. In a new allocation
cycle (mm->ASID >= @asid_cache), the Round Robin ASID allocator used this
to check if new @asid_cache belonged to some mm2 (from prev cycle).
If so, it could locate that mm using the ASID reverse map, and mark that
mm as unallocated ASID, to force it to refresh at the time of switch_mm()
However, for SMP, the reverse map has to be maintained per CPU, so
becomes 2 dimensional, hence got rid of it.
With reverse map gone, it is NOT possible to reach out to current
assignee. So we track the ASID allocation generation/cycle and
on every switch_mm(), check if the current generation of CPU ASID is
same as mm's ASID; If not it is refreshed.
(Based loosely on arch/sh implementation)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ASID allocation changes/1
This patch does 2 things:
(1) get_new_mmu_context() NOW moves mm->ASID to a new value ONLY if it
was from a prev allocation cycle/generation OR if mm had no ASID
allocated (vs. before would unconditionally moving to a new ASID)
Callers desiring unconditional update of ASID, e.g.local_flush_tlb_mm()
(for parent's address space invalidation at fork) need to first force
the parent to an unallocated ASID.
(2) get_new_mmu_context() always sets the MMU PID reg with unchanged/new
ASID value.
The gains are:
- consolidation of all asid alloc logic into get_new_mmu_context()
- avoiding code duplication in switch_mm() for PID reg setting
- Enables future change to fold activate_mm() into switch_mm()
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-Asm code already has values of SW and HW ASID values, so they can be
passed to the printing routine.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This reorganizes the current TLB operations into psuedo-ops to better
pair with MMUv4's native Insert/Delete operations
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With previous commit freeing up PTE bits, reassign them so as to:
- Match the bit to H/w counterpart where possible
(e.g. MMUv2 GLOBAL/PRESENT, this avoids a shift in create_tlb())
- Avoid holes in _PAGE_xxx definitions
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The current ARC VM code has 13 flags in Page Table entry: some software
(accesed/dirty/non-linear-maps) and rest hardware specific. With 8k MMU
page, we need 19 bits for addressing page frame so remaining 13 bits is
just about enough to accomodate the current flags.
In MMUv4 there are 2 additional flags, SZ (normal or super page) and WT
(cache access mode write-thru) - and additionally PFN is 20 bits (vs. 19
before for 8k). Thus these can't be held in current PTE w/o making each
entry 64bit wide.
It seems there is some scope of compressing the current PTE flags (and
freeing up a few bits). Currently PTE contains fully orthogonal distinct
access permissions for kernel and user mode (Kr, Kw, Kx; Ur, Uw, Ux)
which can be folded into one set (R, W, X). The translation of 3 PTE
bits into 6 TLB bits (when programming the MMU) can be done based on
following pre-requites/assumptions:
1. For kernel-mode-only translations (vmalloc: 0x7000_0000 to
0x7FFF_FFFF), PTE additionally has PAGE_GLOBAL flag set (and user
space entries can never be global). Thus such a PTE can translate
to Kr, Kw, Kx (as appropriate) and zero for User mode counterparts.
2. For non global entries, the PTE flags can be used to create mirrored
K and U TLB bits. This is true after commit a950549c67
"ARC: copy_(to|from)_user() to honor usermode-access permissions"
which ensured that user-space translations _MUST_ have same access
permissions for both U/K mode accesses so that copy_{to,from}_user()
play fair with fault based CoW break and such...
There is no such thing as free lunch - the cost is slightly infalted
TLB-Miss Handlers.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* reduce editor lines taken by pt_regs
* ARCompact ISA specific part of TLB Miss handlers clubbed together
* cleanup some comments
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
After the recent cleanups, all the exception handlers now have same
boilerplate prologue code. Move that into common macro.
This reduces readability but helps greatly with sharing / duplicating
entry code with ARCv2 ISA where the handlers are pretty much the same,
just the entry prologue is different (due to hardware assist).
Also while at it, add the missing FAKE_RET_FROM_EXCPN calls in couple of
places to drop down to pure kernel mode (from exception mode) before
jumping off into "C" code.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
On some PAE architectures, the entire range of physical memory could reside
outside the 32-bit limit. These systems need the ability to specify the
initrd location using 64-bit numbers.
This patch globally modifies the early_init_dt_setup_initrd_arch() function to
use 64-bit numbers instead of the current unsigned long.
There has been quite a bit of debate about whether to use u64 or phys_addr_t.
It was concluded to stick to u64 to be consistent with rest of the device
tree code. As summarized by Geert, "The address to load the initrd is decided
by the bootloader/user and set at that point later in time. The dtb should not
be tied to the kernel you are booting"
More details on the discussion can be found here:
https://lkml.org/lkml/2013/6/20/690https://lkml.org/lkml/2012/9/13/544
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
A few remaining architectures directly kill the page faulting task in an
out of memory situation. This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.
Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill. Convert
the remaining architectures over to this hook.
To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...
Prepare for removing num_physpages and simplify mem_init().
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> # for arch/arc
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Concentrate code to modify totalram_pages into the mm core, so the arch
memory initialized code doesn't need to take care of it. With these
changes applied, only following functions from mm core modify global
variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
free_all_bootmem_node(), adjust_managed_page_count().
With this patch applied, it will be much more easier for us to keep
totalram_pages and zone->managed_pages in consistence.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Address more review comments from last round of code review.
1) Enhance free_reserved_area() to support poisoning freed memory with
pattern '0'. This could be used to get rid of poison_init_mem()
on ARM64.
2) A previous patch has disabled memory poison for initmem on s390
by mistake, so restore to the original behavior.
3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change signature of free_reserved_area() according to Russell King's
suggestion to fix following build warnings:
arch/arm/mm/init.c: In function 'mem_init':
arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
^
In file included from include/linux/mman.h:4:0,
from arch/arm/mm/init.c:15:
include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
mm/page_alloc.c: In function 'free_reserved_area':
>> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
In file included from arch/mips/include/asm/page.h:49:0,
from include/linux/mmzone.h:20,
from include/linux/gfp.h:4,
from include/linux/mm.h:8,
from mm/page_alloc.c:18:
arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
mm/page_alloc.c: In function 'free_area_init_nodes':
mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
Also address some minor code review comments.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/arc uses of the __cpuinit macros from
all C files. Currently arc does not have any __CPUINIT used in
assembly files.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
LOAD_FAULT_PTE macro is expected to set r2 with faulting vaddr.
However in case of CONFIG_ARC_DBG_TLB_MISS_COUNT, it was getting
clobbered with statistics collection code.
Fix latter by using a different register.
Note that only I-TLB Miss handler was potentially affected.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* No need to check for READ access in I-TLB Miss handler
* Redundant PAGE_PRESENT update in PTE
Post TLB entry installation, in updating PTE for software accessed/dity
bits, no need to update PAGE_PRESENT since it will already be set.
Infact the entry won't have installed if !PAGE_PRESENT.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With ECR now part of pt_regs
* No need to propagate from lowest asm handlers as arg
* No need to save it in tsk->thread.cause_code
* Avoid bit chopping to access the bit-fields
More code consolidation, cleanup
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This can be ascertained within do_page_fault() since it gets the full
ECR (Exception Cause Register).
Further, for both the callers of do_page_fault(): Prot-V / D-TLB-Miss,
the cause sub-fields in ECR are same for same type of access, making the
code much more simpler.
D-TLB-Miss [LD] 0x00_21_01_00
Prot-V [LD] 0x00_23_01_00
^^
D-TLB-Miss [ST] 0x00_21_02_00
Prot-V [ST] 0x00_23_02_00
^^
D-TLB-Miss [EX] 0x00_21_03_00
Prot-V [EX] 0x00_23_03_00
^^
This helps code consolidation, which is even better when moving code from
assembler to "C".
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Non-congruent SRC page in copy_user_page() is dcache clean in the end -
so record that fact, to avoid a subsequent extraneous flush.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* Number of (i|d)cache ways can be retrieved from BCRs and hence no need
to cross check with with built-in constants
* Use of IS_ENABLED() to check for a Kconfig option
* is_not_cache_aligned() not used anymore
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* Move the various sub-system defines/types into relevant files/functions
(reduces compilation time)
* move CPU specific stuff out of asm/tlb.h into asm/mmu.h
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.
So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in definition of macro addr_not_cache_congruent()
The intention was to have the equivalent of following:
if (a || (1 ? b : 0))
but we ended up with following:
if (a || 1 ? b : 0)
And because precedence of '||' is more that that of '?', gcc was optimizing
away evaluation of <a>
Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
be missed if U/K mappings were congruent.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This manifested as grep failing psuedo-randomly:
-------------->8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
inet 127.0.0.1/8 scope host lo
-------------->8---------------------
ARC700 MMU provides fully orthogonal permission bits per page:
Ur, Uw, Ux, Kr, Kw, Kx
The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.
1. Read access to an anon mapped page in libc .bss: write-protected
zero_page mapped: TLB Entry installed with Ur + K[rwx]
2. grep calls libc:getc() -> buffered read layer calls read(2) with the
internal read buffer in same .bss page.
The read() call is on STDIN which has been redirected to a pipe.
read(2) => sys_read() => pipe_read() => copy_to_user()
3. Since page has Kernel-write permission (despite being user-mode
write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
Exception (page-fault for ARC). core-MM is unaware that kernel
erroneously wrote to the reserved read-only zero-page (BUG #1)
4. Control returns to userspace which now does a write to same .bss page
Since Linux MM is not aware that page has been modified by kernel, it
simply reassigns a new writable zero-init page to mapping, loosing the
prior write by kernel - effectively zero'ing out the libc read buffer
under the hood - hence grep doesn't see right data (BUG #2)
The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.
The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.
Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.
With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Flush and INVALIDATE the dcache page.
This helper is only used for writeback of CODE pages to memory. So
there's no value in keeping the dcache lines around. Infact it is risky
as a writeback on natural eviction under pressure can cause un-needed
writeback with weird issues on aliasing dcache configurations.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
I'm satisified with testing, specially with fuse which has historically given
grief to VIPT arches (ARM/PARISC...)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRjPVlAAoJEGnX8d3iisJeF60P/36JELOrdTOJB8GDn/i22yu4
zxFrroH4EpXCobe4JzrmhXU0vdMKmyhaRsdmQt2pV474LX1ajQeoRi6n7xuiYymr
p0H8/jde7ZDHGxJRP9OIuIIU57N+sVwQvrXvfnz/dc4c92MD9EWEwvoytnCVuKSx
k/YIQ5oDj6hFH13V68eXXs+KBrXyPiYO9UIWYmwRZv/0Bm6P1EpXQSMWzeCP0X/y
FHVEc4i92bIkqpOadUnhCBHGZqjkrhwFL2FCI35/12TgSwjPU1Q6IJCtH7Lg9ygI
oFqut5wN+dhkxlfiyvgTXErmnvhDOHLbMQJYC9svHin8TtfznQhJDAWYeSH/6Mde
/hE/bXV55ucdJ48ZT6JRvxHeJQgTAvbXDIIQYe/wAJEvXidWEo4Y/qRTIXP03Ixm
Ie69d9WSmUlx4FmYxBQodDQFK9slnc+0UfsWcCG+OrT0wwrKBRr3To2WYEFowQer
fpIk6/68+LiQK+IzFAhPtBppSgcQoEBsXAD/MDKsQcRk84W1nKPt6SiT/wCAzOZ7
ezTL1eSlfzWXNbtohdm8xN8frt/4fZLZTaZkqtnMPBi0iIC5DAsJy27YlBQDaRd/
Hzd0y6bJQDcsjjWmZuUkFZVbCyYlE0CIW6sbHC91i51egKReYHy7T6Ef/n+qn+ho
jxohq5/JvG+0Vha0Q2h6
=r4lw
-----END PGP SIGNATURE-----
Merge tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull second set of arc arch updates from Vineet Gupta:
"Aliasing VIPT dcache support for ARC
I'm satisified with testing, specially with fuse which has
historically given grief to VIPT arches (ARM/PARISC...)"
* tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [TB10x] Remove GENERIC_GPIO
ARC: [mm] Aliasing VIPT dcache support 4/4
ARC: [mm] Aliasing VIPT dcache support 3/4
ARC: [mm] Aliasing VIPT dcache support 2/4
ARC: [mm] Aliasing VIPT dcache support 1/4
ARC: [mm] refactor the core (i|d)cache line ops loops
ARC: [mm] serious bug in vaddr based icache flush
* Support for two new platforms based on ARC700
- Abilis TB10x SoC [Chritisian/Pierrick]
- Simulator only System-C Model [Mischa]
* ARC specific MM improvements
- Avoid full TLB flush (ASID increment) on munmap (even single page)
- VIPT Cache Flushing improvements
+ Delayed dcache flush for non-aliasing dcache (big performance boost)
+ icache flush aliasing agnostic (no need to kill all possible aliases)
* Others
- Avoid needless rebuild of DTB files for every kernel build
- Remove builtin cmdline as that is already provided by DeviceTree/bootargs
- Fixing unaligned access emulation corner case
- checkpatch fixes [Sachin]
- Various fixlets [Noam]
- Minor build failures/cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRiydEAAoJEGnX8d3iisJewBkQAJ/cvIjrIuMMdeDo0bokzZN2
bfsG8U+V7S0CKjqLyUD1bMRXo7rTgus8hp/klVORRXoAwSKiWhkj0p6dqJjCGUGc
LqtEPlxaLR1X+pe5wKtpB3j6kTpRwictVhFUqkFACUxGZx1GbYFxdeL8+3oDsepW
lwidBYgoya+Q4puRfmY/sCTtVhJlwTUi6g0lmpaEPWc3T2s83/u1GnlVBYd9B7yA
fCYkUceC2ZnuRMfH00sRsjQ3USyyYptGpY8U1nv9STFJ3fC+EestBPppAAwAzcm0
iiRgo3s314EzfeNRgzjjHrbULnVUJWkrdFXPisWepyPyVjsgnVr84YkO3kiEN6l7
JM1cUCvJUbKZaOb2iUwrlPOqISNjCyY9QZWwqW6PCG3TBpfQsJM2mnXKnPLvV2v2
5Ttba4+ETZ3rn4pPIuTeC6REr0aV/Zl1LKeWuk8PXz4aljWrlOrifBN6QkXWr4HU
4z6X3j2nw4T2LzqwbNYF+xLaDaZZpV8UdvQwOkliCxZ04myx135ImvgOim7Hh9j+
Ow1Jp9mRZha/44qM3jXdZ6Cv55pEOR4oQJsl6OBNUXgb5bnDcFHz1UpZtzoZ2V+s
RPozsfnleNXxDIJlCGK96+PG5qVTRQvklowsz6aTP8r+EEbQnrRlnrhi82cQWqnM
sMxzN320zrt/MWXxec89
=WeDp
-----END PGP SIGNATURE-----
Merge tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC port updates from Vineet Gupta:
"Support for two new platforms based on ARC700:
- Abilis TB10x SoC [Chritisian/Pierrick]
- Simulator only System-C Model [Mischa]
ARC specific MM improvements:
- Avoid full TLB flush (ASID increment) on munmap (even single page)
- VIPT Cache Flushing improvements
+ Delayed dcache flush for non-aliasing dcache (big performance boost)
+ icache flush aliasing agnostic (no need to kill all possible aliases)
Others:
- Avoid needless rebuild of DTB files for every kernel build
- Remove builtin cmdline as that is already provided by DeviceTree/bootargs
- Fixing unaligned access emulation corner case
- checkpatch fixes [Sachin]
- Various fixlets [Noam]
- Minor build failures/cleanups"
* tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (35 commits)
ARC: [mm] Lazy D-cache flush (non aliasing VIPT)
ARC: [mm] micro-optimize page size icache invalidate
ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers
ARC: [mm] consolidate icache/dcache sync code
ARC: [mm] optimise icache flush for kernel mappings
ARC: [mm] optimise icache flush for user mappings
ARC: [mm] optimize needless full mm TLB flush on munmap
ARC: Add support for nSIM OSCI System C model
ARC: [TB10x] Adapt device tree to new compatible string
ARC: [TB10x] Add support for TB10x platform
ARC: [TB10x] Device tree of TB100 and TB101 Development Kits
ARC: Prepare interrupt code for external controllers
ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy
ARC: [cmdline] Don't overwrite u-boot provided bootargs
ARC: [cmdline] Remove CONFIG_CMDLINE
ARC: [plat-arcfpga] defconfig update
ARC: unaligned access emulation broken if callee-reg dest of LD/ST
ARC: unaligned access emulation error handling consolidation
ARC: Debug/crash-printing Improvements
ARC: fix typo with clock speed
...
This is the meat of the series which prevents any dcache alias creation
by always keeping the U and K mapping of a page congruent.
If a mapping already exists, and other tries to access the page, prev
one is flushed to physical page (wback+inv)
Essentially flush_dcache_page()/copy_user_highpage() create K-mapping
of a page, but try to defer flushing, unless U-mapping exist.
When page is actually mapped to userspace, update_mmu_cache() flushes
the K-mapping (in certain cases this can be optimised out)
Additonally flush_cache_mm(), flush_cache_range(), flush_cache_page()
handle the puring of stale userspace mappings on exit/munmap...
flush_anon_page() handles the existing U-mapping for anon page before
kernel reads it via the GUP path.
Note that while not complete, this is enough to boot a simple
dynamically linked Busybox based rootfs
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>