arm64: rework EL0 MRS emulation

On CPUs without FEAT_IDST, ID register emulation is slower than it needs
to be, as all threads contend for the same lock to perform the
emulation. This patch reworks the emulation to avoid this unnecessary
contention.

On CPUs with FEAT_IDST (which is mandatory from ARMv8.4 onwards), EL0
accesses to ID registers result in a SYS trap, and emulation of these is
handled with a sys64_hook. These hooks are statically allocated, and no
locking is required to iterate through the hooks and perform the
emulation, allowing emulation to occur in parallel with no contention.

On CPUs without FEAT_IDST, EL0 accesses to ID registers result in an
UNDEFINED exception, and emulation of these accesses is handled with an
undef_hook. When an EL0 MRS instruction is trapped to EL1, the kernel
finds the relevant handler by iterating through all of the undef_hooks,
requiring undef_lock to be held during this lookup.

This locking is only required to safely traverse the list of undef_hooks
(as it can be concurrently modified), and the actual emulation of the
MRS does not require any mutual exclusion. This locking is an
unfortunate bottleneck, especially given that MRS emulation is enabled
unconditionally and is never disabled.

This patch reworks the non-FEAT_IDST MRS emulation logic so that it can
be invoked directly from do_el0_undef(). This removes the bottleneck,
allowing MRS traps to be handled entirely in parallel, and is a stepping
stone to making all of the undef_hooks lock-free.

I've tested this in a 64-vCPU VM on a 64-CPU ThunderX2 host, with a
benchmark which spawns a number of threads which each try to read
ID_AA64ISAR0_EL1 1000000 times. This is vastly more contention than will
ever be seen in realistic usage, but clearly demonstrates the removal of
the bottleneck:

  | Threads || Time (seconds)                       |
  |         || Before           || After            |
  |         || Real   | System  || Real   | System  |
  |---------++--------+---------++--------+---------|
  |       1 ||   0.29 |    0.20 ||   0.24 |    0.12 |
  |       2 ||   0.35 |    0.51 ||   0.23 |    0.27 |
  |       4 ||   1.08 |    3.87 ||   0.24 |    0.56 |
  |       8 ||   4.31 |   33.60 ||   0.24 |    1.11 |
  |      16 ||   9.47 |  149.39 ||   0.23 |    2.15 |
  |      32 ||  19.07 |  605.27 ||   0.24 |    4.38 |
  |      64 ||  65.40 | 3609.09 ||   0.33 |   11.27 |

Aside from the speedup, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221019144123.612388-6-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
This commit is contained in:
Mark Rutland 2022-10-19 15:41:19 +01:00 committed by Will Deacon
parent dbfbd87efa
commit f5962add74
3 changed files with 10 additions and 19 deletions

View File

@ -832,7 +832,8 @@ static inline bool system_supports_tlb_range(void)
cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
}
extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
{

View File

@ -3435,35 +3435,22 @@ int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt)
return rc;
}
static int emulate_mrs(struct pt_regs *regs, u32 insn)
bool try_emulate_mrs(struct pt_regs *regs, u32 insn)
{
u32 sys_reg, rt;
if (compat_user_mode(regs) || !aarch64_insn_is_mrs(insn))
return false;
/*
* sys_reg values are defined as used in mrs/msr instruction.
* shift the imm value to get the encoding.
*/
sys_reg = (u32)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_16, insn) << 5;
rt = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RT, insn);
return do_emulate_mrs(regs, sys_reg, rt);
return do_emulate_mrs(regs, sys_reg, rt) == 0;
}
static struct undef_hook mrs_hook = {
.instr_mask = 0xffff0000,
.instr_val = 0xd5380000,
.pstate_mask = PSR_AA32_MODE_MASK,
.pstate_val = PSR_MODE_EL0t,
.fn = emulate_mrs,
};
static int __init enable_mrs_emulation(void)
{
register_undef_hook(&mrs_hook);
return 0;
}
core_initcall(enable_mrs_emulation);
enum mitigation_state arm64_get_meltdown_state(void)
{
if (__meltdown_safe)

View File

@ -499,6 +499,9 @@ void do_el0_undef(struct pt_regs *regs, unsigned long esr)
if (user_insn_read(regs, &insn))
goto out_err;
if (try_emulate_mrs(regs, insn))
return;
if (call_undef_hook(regs, insn) == 0)
return;