OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Will Deacon	c6a771d932	arm64: csum: Disable KASAN for do_csum() do_csum() over-reads the source buffer and therefore abuses READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for READ_ONCE_NOCHECK() becoming a macro, and therefore losing its '__no_sanitize_address' annotation, just annotate do_csum() explicitly and fall back to normal loads. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2020-04-15 21:36:41 +01:00
Robin Murphy	e9c7ddbf8b	arm64: csum: Optimise IPv6 header checksum Throwing our __uint128_t idioms at csum_ipv6_magic() makes it about 1.3x-2x faster across a range of microarchitecture/compiler combinations. Not much in absolute terms, but every little helps. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2020-03-09 18:08:25 +00:00
Robin Murphy	c2c24edb1d	arm64: csum: Fix pathological zero-length calls In validating the checksumming results of the new routine, I sadly neglected to test its not-checksumming results. Thus it slipped through that the one case where @buff is already dword-aligned and @len = 0 manages to defeat the tail-masking logic and behave as if @len = 8. For a zero length it doesn't make much sense to deference @buff anyway, so just add an early return (which has essentially zero impact on performance). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2020-01-17 16:05:50 +00:00
Robin Murphy	5777eaed56	arm64: Implement optimised checksum routine Apparently there exist certain workloads which rely heavily on software checksumming, for which the generic do_csum() implementation becomes a significant bottleneck. Therefore let's give arm64 its own optimised version - for ease of maintenance this foregoes assembly or intrisics, and is thus not actually arm64-specific, but does rely heavily on C idioms that translate well to the A64 ISA and the typical load/store capabilities of most ARMv8 CPU cores. The resulting increase in checksum throughput scales nicely with buffer size, tending towards 4x for a small in-order core (Cortex-A53), and up to 6x or more for an aggressive big core (Ampere eMAG). Reported-by: Lingyan Huang <huanglingyan2@huawei.com> Tested-by: Lingyan Huang <huanglingyan2@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2020-01-16 15:23:29 +00:00

4 Commits