Commit Graph

20 Commits

Author SHA1 Message Date
Chen Jie 615eb603f4 MIPS: csum_partial: Improve instruction parallelism.
Computing sum introduces true data dependency. This patch removes some
true data depdendencies, hence increases instruction level parallelism.

This patch brings up to 50% csum performance gain on Loongson 3a.

One example about how this patch works is in CSUM_BIGCHUNK1:
// ** original **    vs    ** patch applied **
    ADDC(sum, t0)           ADDC(t0, t1)
    ADDC(sum, t1)           ADDC(t2, t3)
    ADDC(sum, t2)           ADDC(sum, t0)
    ADDC(sum, t3)           ADDC(sum, t2)

In the original implementation, each ADDC(sum, ...) depends on the sum
value updated by previous ADDC(as source operand).

With this patch applied, the first two ADDC operations are independent,
hence can be executed simultaneously if possible.

Another example is in the "copy and sum calculating chunk":
// ** original **    vs    ** patch applied **
    STORE(t0, UNIT(0) ...   STORE(t0, UNIT(0) ...
    ADDC(sum, t0)           ADDC(t0, t1)
    STORE(t1, UNIT(1) ...   STORE(t1, UNIT(1) ...
    ADDC(sum, t1)           ADDC(sum, t0)
    STORE(t2, UNIT(2) ...   STORE(t2, UNIT(2) ...
    ADDC(sum, t2)           ADDC(t2, t3)
    STORE(t3, UNIT(3) ...   STORE(t3, UNIT(3) ...
    ADDC(sum, t3)           ADDC(sum, t2)

With this patch applied, ADDC and the **next next** ADDC are independent.

Signed-off-by: chenj <chenj@lemote.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9608/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-04-01 17:22:11 +02:00
Chen Jie 3c09bae43b MIPS: Use WSBH/DSBH/DSHD on Loongson 3A
Signed-off-by: chenj <chenj@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: chenhc@lemote.com
Patchwork: https://patchwork.linux-mips.org/patch/7542/
Patchwork: https://patchwork.linux-mips.org/patch/7550/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-09-22 13:35:46 +02:00
Maciej W. Rozycki 44ba138f55 MIPS: csum_partial.S CPU_DADDI_WORKAROUNDS bug fix
This change reverts most of commit
60724ca59e [MIPS: IP checksums: Remove
unncessary .set pseudos] that introduced warnings with the
CPU_DADDI_WORKAROUNDS option set:

arch/mips/lib/csum_partial.S: Assembler messages:
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:467: Warning: used $3 with ".set at=$3"
[...]
arch/mips/lib/csum_partial.S:577: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:577: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:577: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:601: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:601: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:601: Warning: used $3 with ".set at=$3"
arch/mips/lib/csum_partial.S:601: Warning: used $3 with ".set at=$3"
[and so on, and so on...]

The warnings are benign and good code is produced regardless because no
macros that'd use the assembler's temporary register are involved, however
the `.set noat' directives removed by the commit referred are crucial to
guarantee this is still going to be the case after any changes in the
future.  Therefore they need to be brought back to place which this
change does.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6686/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-05-13 00:29:39 +02:00
Markos Chandras 6f85cebe49 MIPS: lib: csum_partial: Add EVA support
Use EVA specific functions to read and write data to
user address space.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2014-03-26 23:09:17 +01:00
Markos Chandras e89fb56c8b MIPS: lib: csum_partial: Add macro to build csum_partial symbols
In preparation for EVA support, we use a macro to build the
__csum_partial_copy_user main code so it can be shared across
multiple implementations. EVA uses the same code but it replaces
the load/store/prefetch instructions with the EVA specific ones
therefore using a macro avoids unnecessary code duplications.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2014-03-26 23:09:17 +01:00
Markos Chandras 2ab82e6648 MIPS: lib: csum_partial: Merge EXC and load/store macros
Each load/store macro always adds an entry to the __ex_table
using the EXC macro. There are cases where a load instruction may
never fail such as when we are sure the load happens in the kernel
address space. Therefore, we merge these the EXC and LOADX/STOREX
macros into a single one. We also expand the argument list in the EXC
macro to make the macro more flexible. The extra 'type' argument is not
used by this commit, but it will be used when EVA support is added to
memcpy.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2014-03-26 23:09:17 +01:00
Markos Chandras ac85227f76 MIPS: checksum: Split the 'copy_user' symbol
The 'copy_user' symbol can be used to copy from or to
userland so we will use two different symbols for these
operations. This makes no difference in the existing code,
but when the core is operating in EVA mode, different instructions
need to be used to read and write to userland address space.
The old function has also been renamed to 'copy_kernel' to denote
that it is suitable for copy data to and from kernel space.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2014-03-26 23:09:17 +01:00
Gabor Juhos e744109fce MIPS: Use CONFIG_CPU_MIPSR2 in csum_partial.S
The csum_partial implementation contain optimalizations for the MIPS R2
instruction set. This optimization is never enabled however because the
if directive uses the CPU_MIPSR2 constant which is not defined anywhere.

Use the CONFIG_CPU_MIPSR2 constant instead.

Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4971/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-03-12 10:18:18 +01:00
Ralf Baechle 7034228792 MIPS: Whitespace cleanup.
Having received another series of whitespace patches I decided to do this
once and for all rather than dealing with this kind of patches trickling
in forever.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-02-01 10:00:22 +01:00
Ralf Baechle b65a75b8c9 MIPS: IP checksums: Optimize adjust of sum on buffers of odd alignment.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-10-11 16:18:53 +01:00
Ralf Baechle 60724ca59e MIPS: IP checksums: Remove unncessary .set pseudos
They possibly silence meaningful warnings ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-10-11 16:18:53 +01:00
Ralf Baechle d86a8123b1 MIPS: IP checksums: Remove unncessary folding of sum to 16 bit.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-10-11 16:18:53 +01:00
Atsushi Nemoto b80a1b8081 [MIPS] Fix 64-bit IP checksum code
Use unsigned loads to avoid possible misscalculation of IP checksums.  This
bug was instruced in f761106cd728bcf65b7fe161b10221ee00cf7132 (lmo) /
ed99e2bc1d (kernel.org).

[Original fix by Atsushi.  Improved instruction scheduling and fix for
unaligned unsigned load by me -- Ralf]

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-09-21 14:52:56 +02:00
Ralf Baechle c5ec1983e4 [MIPS] Eleminate local symbols from the symbol table.
These symbols appear in oprofile output, stacktraces and similar but only
make the output harder to read.  Many identical symbol names such as
"both_aligned" were also being used in multiple source files making it
impossible to see which file actually was meant.  So let's get rid of them.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-01-29 10:14:59 +00:00
Maciej W. Rozycki 619b6e18fc [MIPS] R4000/R4400 daddiu erratum workaround
This complements the generic R4000/R4400 errata workaround code and adds 
bits for the daddiu problem.  In most places it just modifies handwritten 
assembly code so that the assembler is allowed to use a temporary register 
as daddiu may now be treated as a macro that expands to a sequence of li 
and daddu.  It is the AT register or, where AT is unavailable or used 
explicitly for another purpose, an explicitly-named register is selected, 
using the .set at=<reg> feature added recently to gas.  This feature is 
only used if CONFIG_CPU_DADDI_WORKAROUNDS has been set, so if the 
workaround remains disabled, the required version of binutils stays 
unchanged.

 Similarly, daddiu instructions put in branch delay slots in noreorder 
fragments are now taken out of them and the assembler is allowed to 
reorder them itself as possible (which it does making the whole idea of 
scheduling them into delay slots manually questionable).

 Also in the very few places where such a simple conversion was not 
possible, a handcoded longer sequence is implemented.

 Other than that there are changes to code responsible for building the 
TLB fault and page clear/copy handlers to avoid daddiu as appropriate.  
These are only effective if the erratum is verified to be present at the 
run time.

 Finally there is a trivial update to __delay(), because it uses daddiu in 
a branch delay slot.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-01-29 10:14:55 +00:00
Atsushi Nemoto f860c90bd6 [MIPS] csum_partial and copy in parallel
Implement optimized asm version of csum_partial_copy_nocheck,
csum_partial_copy_from_user and csum_and_copy_to_user which can do
calculate and copy in parallel, based on memcpy.S.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-01-08 21:41:04 +00:00
Atsushi Nemoto ed99e2bc1d [MIPS] Optimize csum_partial for 64bit kernel
Make csum_partial 64-bit powered.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-09 01:03:59 +00:00
Atsushi Nemoto 773ff78838 [MIPS] Optimize flow of csum_partial
Delete dead codes at end of the function and move small_csumcopy
there.  This makes some labels (maybe_end_cruft, small_memcpy,
end_bytes, out) needless and eliminates some branches.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-09 01:03:59 +00:00
Atsushi Nemoto 52ffe760ea [MIPS] Make csum_partial more readable
Use standard o32 register name instead of T0, T1, etc, like memcpy.S.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-09 01:03:59 +00:00
Atsushi Nemoto 0bcdda0f3a [MIPS] Unify csum_partial.S
The 32-bit version and 64-bit version are almost equal.  Unify them.  This
makes further improvements (for example, copying with parallel, supporting
PREFETCH, etc.) easier.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-12-04 22:43:13 +00:00