This is the meat of the series which prevents any dcache alias creation
by always keeping the U and K mapping of a page congruent.
If a mapping already exists, and other tries to access the page, prev
one is flushed to physical page (wback+inv)
Essentially flush_dcache_page()/copy_user_highpage() create K-mapping
of a page, but try to defer flushing, unless U-mapping exist.
When page is actually mapped to userspace, update_mmu_cache() flushes
the K-mapping (in certain cases this can be optimised out)
Additonally flush_cache_mm(), flush_cache_range(), flush_cache_page()
handle the puring of stale userspace mappings on exit/munmap...
flush_anon_page() handles the existing U-mapping for anon page before
kernel reads it via the GUP path.
Note that while not complete, this is enough to boot a simple
dynamically linked Busybox based rootfs
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This preps the low level dcache flush helpers to take vaddr argument in
addition to the existing paddr to properly flush the VIPT dcache
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Nothing semantical
* simplify the alignement code by using & operation only
* rename variables clearly as paddr
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
vaddr used to index the cache was clipped from the wrong end, and thus
would potentially fail to flush the correct lines.
The problem was dorment for so long because up until the recent
optimizations it was only used for ptrace break-point only flushes.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
flush_dcache_page( ) is MM hook to ensure that a page has consistent
views between kernel and userspace. Thus it is called when
* kernel writes to a page which at some later point could get mapped to
userspace (so kernel mapping needs to be flushed-n-inv)
* kernel is about to read from a page with possible userspace mappings
(so userspace mappings needs to be made coherent with kernel ones)
However for Non aliasing VIPT dcache, any userspace mapping will always
be congruent to kernel mapping. Thus d-cache need need not be flushed at
all (or delayed indefinitely).
The only reason it does need to be flushed is when mapping code pages.
Since icache doesn't snoop dcache, those dirty dcache lines need to be
written back to memory and icache line invalidated so that icache lines
fetch will get the right data.
Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.
(1) FPGA @ 80 MHZ
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc6-a Linux 3.9.0-r 80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
3.9-rc6-b Linux 3.9.0-r 80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
3.9-rc7-c Linux 3.9.0-r 80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
^^^^ ^^^^ ^^^
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc6-a Linux 3.9.0-r 317.8 204.2 1122.3 375.1 3522.0 4.288 20.7 126.8
3.9-rc6-b Linux 3.9.0-r 298.7 223.0 1141.6 367.8 3531.0 4.866 20.9 126.4
3.9-rc7-c Linux 3.9.0-r 278.4 179.2 862.1 339.3 3705.0 3.223 20.3 126.6
^^^^^ ^^^^^ ^^^^^ ^^^^
(2) Customer Silicon @ 500 MHz (166 MHz mem)
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
abilis-ba Linux 3.9.0-r 497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
abilis-ca Linux 3.9.0-r 497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
^^^^ ^^^^ ^^^
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
start address is already page aligned and size is const PAGE_SIZE,
thus fixups for alignment not needed in generated code.
bloat-o-meter vmlinux-mm5 vmlinux
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32)
function old new delta
__inv_icache_page 82 50 -32
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Now that we have same helper used for all icache invalidates (i.e.
vaddr+paddr based exact line invalidate), consolidate the open coded
calls into one place.
Also rename flush_icache_range_vaddr => __sync_icache_dcache
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This change continues the theme from prev commit - this time icache
handling for kernel's own code modification (vmalloc: loadable modules,
breakpoints for kprobes/kgdb...)
flush_icache_range() calls the CDU icache helper with vaddr to enable
exact line invalidate.
For a true kernel-virtual mapping, the vaddr is actually virtual hence
valid as index into cache. For kprobes breakpoint however, the vaddr arg
is actually paddr - since that's how normal kernel is mapped in ARC
memory map. This implies that CDU will use the same addr for
indexing as for tag match - which is fine since kernel code would only
have that "implicit" mapping and none other.
This should speed up module loading significantly - specially on default
ARC700 icache configurations (32k) which alias.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC icache doesn't snoop dcache thus executable pages need to be made
coherent before mapping into userspace in flush_icache_page().
However ARC700 CDU (hardware cache flush module) requires both vaddr
(index in cache) as well as paddr (tag match) to correctly identify a
line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
the paddr only based flush_icache_page() API couldn't be implemented
efficiently. It had to loop thru all possible alias indexes and perform
the invalidate operation (ofcourse the cache op would only succeed at
the index(es) where tag matches - typically only 1, but the cost of
visiting all the cache-bins needs to paid nevertheless).
Turns out however that the vaddr (along with paddr) is available in
update_mmu_cache() hence better suits ARC icache flush semantics.
With both vaddr+paddr, exactly one flush operation per line is done.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
!CONFIG_ARC_HAS_(I|D)CACHE makes Linux disable caches (assuming they
exist in hardware) - mostly for debugging issues with new peripherals.
However, independent of CONFIG_ARC_HAS_(I|D)CACHE, Linux also needs to
handle, non-existant caches, using the information in Cache BCRs (Build
Configuration Reg)
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC700 has VIPT L1 Caches
* Caches don't snoop and are not coherent
* Given the PAGE_SIZE and Cache associativity, we don't support aliasing
D$ configurations (yet), but do allow aliasing I$ configs
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>