After a bunch of benchmarking on the interaction between dmb and pldw,
it turns out that issuing the pldw *after* the dmb instruction can
give modest performance gains (~3% atomic_add_return improvement on a
dual A15).
This patch adds prefetchw invocations to our barriered atomic operations
including cmpxchg, test_and_xxx and futexes.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Uwe reported a build failure when targetting a NOMMU platform with my
recent prefetch changes:
arch/arm/lib/changebit.S: Assembler messages:
arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
not allowed for the current base architecture
This is due to use of the .arch_extension mp directive immediately prior
to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
nothing if !CONFIG_SMP, gas will still choke on the directive.
This patch fixes the issue by only emitting the sequence (including the
directive) if CONFIG_SMP=y.
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The bitops functions (e.g. _test_and_set_bit) on ARM do not have unwind
annotations and therefore the kernel cannot backtrace out of them on a
fatal error (for example, NULL pointer dereference).
This patch annotates the bitops assembly macros with UNWIND annotations
so that we can produce a meaningful backtrace on error. Callers of the
macros are modified to pass their function name as a macro parameter,
enforcing that the macros are used as standalone function implementations.
Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The kernel doesn't officially need to interwork, but using BX
wherever appropriate will help educate people into good assembler
coding habits.
BX is appropriate here because this code is predicated on
__LINUX_ARM_ARCH__ >= 6
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Switch the set/clear/change bitops to use the word-based exclusive
operations, which are only present in a wider range of ARM architectures
than the byte-based exclusive operations.
Tested record:
- Nicolas Pitre: ext3,rw,le
- Sourav Poddar: nfs,le
- Will Deacon: ext3,rw,le
- Tony Lindgren: ext3+nfs,le
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Sourav Poddar <sourav.poddar@ti.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add additional instructions to our assembly bitops functions to ensure
that they only operate on word-aligned pointers. This will be necessary
when we switch these operations to use the word-based exclusive
operations.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Before this patch enabling and disabling irqs in assembler code and by
the hardware wasn't tracked completly.
I had to transpose two instructions in arch/arm/lib/bitops.h because
restore_irqs doesn't preserve the flags with CONFIG_TRACE_IRQFLAGS=y
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Mathieu Desnoyers pointed out that the ARM barriers were lacking:
- cmpxchg, xchg and atomic add return need memory barriers on
architectures which can reorder the relative order in which memory
read/writes can be seen between CPUs, which seems to include recent
ARM architectures. Those barriers are currently missing on ARM.
- test_and_xxx_bit were missing SMP barriers.
So put these barriers in. Provide separate atomic_add/atomic_sub
operations which do not require barriers.
Reported-Reviewed-and-Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
save_and_disable_irqs does not need to use mov + msr (which was
introduced to work around a documentation bug which was propagated
into binutils.) Use msr with an immediate constant, and if we're
building for ARMv6 or later, use the new CPSID instruction.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The 'K' extension adds several new instructions to the ARMv6 ISA
which are primerily useful for SMP.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We sometimes forgot to check whether the exclusive store succeeded.
Ensure that we always check. Also ensure that we always use the
out of line versions, since the inline versions are not SMP safe.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If we found that the bit was already in the desired state, we
would skip performing the operation, and write random data back.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>