Commit Graph

25805 Commits

Author SHA1 Message Date
Christophe Leroy f2c45962cc powerpc/8xx: Simplify pte_update() with 16k pages
While looking at code generated for code patching, I saw that
pte_clear generated:

 2d8:	38 a0 00 00 	li      r5,0
 2dc:	38 e0 10 00 	li      r7,4096
 2e0:	39 00 20 00 	li      r8,8192
 2e4:	39 40 30 00 	li      r10,12288
 2e8:	90 a9 00 00 	stw     r5,0(r9)
 2ec:	90 e9 00 04 	stw     r7,4(r9)
 2f0:	91 09 00 08 	stw     r8,8(r9)
 2f4:	91 49 00 0c 	stw     r10,12(r9)

With 16k pages, only the first entry is used by the kernel, so no need
to adapt the address of other entries. Only duplicate the first entry
for hardware.

Now it is:

 2cc:	39 40 00 00 	li      r10,0
 2d0:	91 49 00 00 	stw     r10,0(r9)
 2d4:	91 49 00 04 	stw     r10,4(r9)
 2d8:	91 49 00 08 	stw     r10,8(r9)
 2dc:	91 49 00 0c 	stw     r10,12(r9)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
2022-11-24 23:31:48 +11:00
Dmitry Torokhov 4e87bd14e5 powerpc/sgy_cts1000: convert to using gpiod API and facelift
This patch converts the driver to newer gpiod API, and away from
OF-specific legacy gpio API that we want to stop using.

While at it, let's address a few more issues:

- switch to using dev_info()/pr_info() and friends
- cancel work when unbinding the driver

Note that the original code handled halt GPIO polarity incorrectly:
in halt callback, when line polarity is "low" it would set trigger to
"1" and drive halt line high, which is counter to the annotation.
gpiod API will drive such line low. However I do not see any DTSes
in mainline that have a DT node with "sgy,gpio-halt" compatible.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YzNNznewTyCJiGFz@google.com
2022-11-24 23:31:48 +11:00
Dmitry Torokhov 1892e87a3e powerpc/warp: switch to using gpiod API
This switches PIKA Warp away from legacy gpio API and to newer gpiod
API, so that we can eventually deprecate the former.

Because LEDs are normally driven by leds-gpio driver, but the
platform code also wants to access the LEDs during thermal shutdown,
and gpiod API does not allow locating GPIO without requesting it,
the platform code is now responsible for locating GPIOs through device
tree and requesting them. It then constructs platform data for
leds-gpio platform device and registers it. This allows platform
code to retain access to LED GPIO descriptors and use them when needed.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YzKSLcrYmV5kjyeX@google.com
2022-11-24 23:31:47 +11:00
Gustavo A. R. Silva 1c4a4a4c84 powerpc/xmon: Fix -Wswitch-unreachable warning in bpt_cmds
When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable into the case that uses it, which silences the warning:

arch/powerpc/xmon/xmon.c: In function ‘bpt_cmds’:
arch/powerpc/xmon/xmon.c:1529:13: warning: statement will never be executed [-Wswitch-unreachable]
 1529 |         int mode;
      |             ^~~~

Fixes: 09b6c1129f ("powerpc/xmon: Fix compile error with PPC_8xx=y")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YySE6FHiOcbWWR+9@work
2022-11-24 23:31:47 +11:00
Kajol Jain 4ac9d3187c powerpc/kvm: Remove unused references for MMCR3/SIER2/SIER3 registers
Commit 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore
in C") removed the PMU save/restore functions from assembly code and
implemented these functions in C, for power9 and later platforms.

After the code refactoring, Performance Monitoring Unit (PMU) registers
became part of "p9_host_os_sprs" structure and now this structure is
used to save/restore pmu host registers, for power9 and later platfroms.
But we still have old unused registers references. Patch removes unused
host_mmcr references for Monitor Mode Control Register 3 (MMCR3)/
Sampled Instruction Event Register 2 (SIER2)/ SIER3 registers from
"struct kvmppc_host_state".

Fixes: 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916105736.268153-3-disgoel@linux.vnet.ibm.com
2022-11-24 23:31:47 +11:00
Disha Goel 2223552256 powerpc/kvm: Remove unused macros from asm-offset
The kvm code was refactored to convert some of kvm assembly routines to C.
This includes commits which moved code path for the kvm guest entry/exit
for p7/8 from aseembly to C. As part of the code changes, usage of some of
the macros were removed. But definitions still exist in the assembly files.
Commits are listed below:

Commit 2e1ae9cd56 ("KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU")
Commit 9769a7fd79 ("KVM: PPC: Book3S HV: Remove radix guest support from P7/8 path")
Commit fae5c9f366 ("KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path")
Commit 57dc0eed73 ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")

Many of the asm-offset macro definitions were missed to remove. Patch
fixes by removing the unused macros.

Signed-off-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916105736.268153-2-disgoel@linux.vnet.ibm.com
2022-11-24 23:31:47 +11:00
Xiu Jianfeng d87a233717 powerpc/pasemi: Add __init/__exit annotations to module init/exit funcs
Add missing __init/__exit annotations to module init/exit funcs.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220911084344.196353-1-xiujianfeng@huawei.com
2022-11-24 23:31:47 +11:00
Nicholas Piggin b86cf14f24 powerpc: add compile-time support for lbarx, lharx
ISA v2.06 (POWER7 and up) as well as e6500 support lbarx and lharx.
Add a compile option that allows code to use it, and add support in
cmpxchg and xchg 8 and 16 bit values without shifting and masking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220909052312.63916-1-npiggin@gmail.com
2022-11-24 23:31:47 +11:00
Naveen N. Rao 7af82ff90a powerpc/ftrace: Ignore weak functions
Extend commit b39181f7c6 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to
avoid adding weak function") to ppc32 and ppc64 -mprofile-kernel by
defining FTRACE_MCOUNT_MAX_OFFSET.

For ppc64 -mprofile-kernel ABI, we can have two instructions at function
entry for TOC setup followed by 'mflr r0' and 'bl _mcount'. So, the
mcount location is at most the 4th instruction in a function. For ppc32,
mcount location is always the 3rd instruction in a function, preceded by
'mflr r0' and 'stw r0,4(r1)'.

With this patch, and with ppc64le_guest_defconfig and some ftrace/bpf
config items enabled:
  # grep __ftrace_invalid_address available_filter_functions | wc -l
  79

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220809105425.424045-1-naveen.n.rao@linux.vnet.ibm.com
2022-11-24 23:31:46 +11:00
Deming Wang 14b5d59a26 powerpc/pseries: Fix formatting to make code look more beautiful
Operators should be separated by spaces in tce_buildmulti_pSeriesLP()

Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220701094553.1722-1-wangdeming@inspur.com
2022-11-24 23:31:46 +11:00
Deming Wang 932c6dea4f powerpc/xive: remove unused parameter
The parameter xc to xive_cleanup_single_escalation() is unused, so we
can remove it.

Signed-off-by: Deming Wang <wangdeming@inspur.com>
[mpe: Reword change log, unwrap lines < 90 columns]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220413105507.1729-1-wangdeming@inspur.com
2022-11-24 23:12:18 +11:00
Randy Dunlap 4562bffb83 powerpc/mpc52xx_lpbfifo: fix all kernel-doc warnings
Fix multiple kernel-doc warnings in mpc52xx_lpbfifo.c:

arch/powerpc/platforms/52xx/mpc52xx_lpbfifo.c:377: warning: expecting prototype for mpc52xx_lpbfifo_bcom_poll(). Prototype was for mpc52xx_lpbfifo_poll() instead

mpc52xx_lpbfifo.c:221: warning: No description found for return value of 'mpc52xx_lpbfifo_irq'
mpc52xx_lpbfifo.c:327: warning: No description found for return value of 'mpc52xx_lpbfifo_bcom_irq'
mpc52xx_lpbfifo.c:398: warning: No description found for return value of 'mpc52xx_lpbfifo_submit'

mpc52xx_lpbfifo.c:64: warning: Function parameter or member 'req' not described in 'mpc52xx_lpbfifo_kick'
mpc52xx_lpbfifo.c:220: warning: contents before sections
mpc52xx_lpbfifo.c:223: warning: Function parameter or member 'irq' not described in 'mpc52xx_lpbfifo_irq'
mpc52xx_lpbfifo.c:223: warning: Function parameter or member 'dev_id' not described in 'mpc52xx_lpbfifo_irq'
mpc52xx_lpbfifo.c:328: warning: contents before sections
mpc52xx_lpbfifo.c:331: warning: Function parameter or member 'irq' not described in 'mpc52xx_lpbfifo_bcom_irq'
mpc52xx_lpbfifo.c:331: warning: Function parameter or member 'dev_id' not described in 'mpc52xx_lpbfifo_bcom_irq'

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221124061918.1967-1-rdunlap@infradead.org
2022-11-24 23:12:18 +11:00
Christophe Leroy e75d07bd83 powerpc: Remove find_current_mm_pte()
Last usage of find_current_mm_pte() was removed by
commit 15759cb054 ("powerpc/perf/callchain: Use
__get_user_pages_fast in read_user_stack_slow")

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec79f462a3bfa8365b7df505e574d5d85246bc68.1646818177.git.christophe.leroy@csgroup.eu
2022-11-24 23:12:18 +11:00
Christophe JAILLET 5836947613 powerpc/52xx: Fix a resource leak in an error handling path
The error handling path of mpc52xx_lpbfifo_probe() has a request_irq()
that is not balanced by a corresponding free_irq().

Add the missing call, as already done in the remove function.

Fixes: 3c9059d79f ("powerpc/5200: add LocalPlus bus FIFO device driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dec1496d46ccd5311d0f6e9f9ca4238be11bf6a6.1643440531.git.christophe.jaillet@wanadoo.fr
2022-11-24 23:12:18 +11:00
Christophe Leroy 89d21e259a powerpc/bpf/32: Fix Oops on tail call tests
test_bpf tail call tests end up as:

  test_bpf: #0 Tail call leaf jited:1 85 PASS
  test_bpf: #1 Tail call 2 jited:1 111 PASS
  test_bpf: #2 Tail call 3 jited:1 145 PASS
  test_bpf: #3 Tail call 4 jited:1 170 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
  test_bpf: #5 Tail call load/store jited:1
  BUG: Unable to handle kernel data access on write at 0xf1b4e000
  Faulting instruction address: 0xbe86b710
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash PowerMac
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
  Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
  NIP:  be86b710 LR: be857e88 CTR: be86b704
  REGS: f1b4df20 TRAP: 0300   Not tainted  (6.1.0-rc4+)
  MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28008242  XER: 00000000
  DAR: f1b4e000 DSISR: 42000000
  GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
  GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
  GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
  GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
  NIP [be86b710] 0xbe86b710
  LR [be857e88] __run_one+0xec/0x264 [test_bpf]
  Call Trace:
  [f1b4dfe0] [00000002] 0x2 (unreliable)
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 0000000000000000 ]---

This is a tentative to write above the stack. The problem is encoutered
with tests added by commit 38608ee7b6 ("bpf, tests: Add load store
test case for tail call")

This happens because tail call is done to a BPF prog with a different
stack_depth. At the time being, the stack is kept as is when the caller
tail calls its callee. But at exit, the callee restores the stack based
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.

This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary the maintain the stack as is for the tail
call. But it was not anticipated that the BPF frame size could be
different.

Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32 bits parameter, then tail call better match with
normal BPF function entry, the down side being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.

Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.

With the fix, tail call tests are now successfull:

  test_bpf: #0 Tail call leaf jited:1 53 PASS
  test_bpf: #1 Tail call 2 jited:1 115 PASS
  test_bpf: #2 Tail call 3 jited:1 154 PASS
  test_bpf: #3 Tail call 4 jited:1 165 PASS
  test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
  test_bpf: #5 Tail call load/store jited:1 141 PASS
  test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
  test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
  test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
  test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
  test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Fixes: 51c66ad849 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu
2022-11-24 23:05:10 +11:00
Christophe JAILLET a96b20758b KVM: PPC: Book3S HV: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/52e843a460bc374973149b8da0bd04f9761b80b7.1657382184.git.christophe.jaillet@wanadoo.fr
2022-11-24 21:57:50 +11:00
Deming Wang 6fa1efeaa6 KVM: PPC: Book3s: Use arg->size directly in kvm_vm_ioctl_create_spapr_tce()
The size variable is just a copy of args->size, neither size nor args
are modifed, so just use args->size directly.

Signed-off-by: Deming Wang <wangdeming@inspur.com>
[mpe: Reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220703172932.11329-1-wangdeming@inspur.com
2022-11-24 21:56:07 +11:00
Zhang Jiaming 392a58f1ea KVM: PPC: Book3S HV: XIVE: Fix spelling mistakes
Change 'subsquent' to 'subsequent'.
Change 'accross' to 'across'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220623102031.15359-1-jiaming@nfschina.com
2022-11-24 21:54:00 +11:00
XueBing Chen 61119786de KVM: PPC: Use __func__ to get function's name
Prefer using '"%s...", __func__' to get current function's name in
output messages.

Signed-off-by: XueBing Chen <chenxuebing@jari.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/13b2c857.beb.181725bad35.Coremail.chenxuebing@jari.cn
2022-11-24 21:53:04 +11:00
Li Chen cade589fdf kexec: replace crash_mem_range with range
We already have struct range, so just use it.

Link: https://lkml.kernel.org/r/20220929042936.22012-4-bhe@redhat.com
Signed-off-by: Li Chen <lchen@ambarella.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chen Lifu <chenlifu@huawei.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Jianglei Nie <niejianglei2021@163.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18 13:55:07 -08:00
Mark Rutland 94d095ffa0 ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.

This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.

Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Mark Rutland 0ef86097f1 ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18 13:56:41 +00:00
Sathvika Vasireddy c984aef8c8 objtool/powerpc: Add --mcount specific implementation
This patch enables objtool --mcount on powerpc, and adds implementation
specific to powerpc.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-17-sv@linux.ibm.com
2022-11-18 19:00:16 +11:00
Sathvika Vasireddy e52ec98c5a objtool/powerpc: Enable objtool to be built on ppc
This patch adds [stub] implementations for required functions, inorder
to enable objtool build on powerpc.

[Christophe Leroy: powerpc: Add missing asm/asm.h for objtool,
Use local variables for type and imm in arch_decode_instruction(),
Adapt len for prefixed instructions.]

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-16-sv@linux.ibm.com
2022-11-18 19:00:16 +11:00
Sathvika Vasireddy d0160bd5d3 powerpc/vdso: Skip objtool from running on VDSO files
Do not run objtool on VDSO files, by using OBJECT_FILES_NON_STANDARD.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-8-sv@linux.ibm.com
2022-11-18 19:00:06 +11:00
Christophe Leroy 2da3776167 powerpc/32: Fix objtool unannotated intra-function call warnings
Fix several annotations in assembly files on PPC32.

[Sathvika Vasireddy: Changed subject line and removed Kconfig change to
 enable objtool, as it is a part of "objtool/powerpc: Enable objtool to
 be built on ppc" patch in this series.]

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-7-sv@linux.ibm.com
2022-11-18 19:00:06 +11:00
Jason A. Donenfeld b9b01a5625 random: use random.trust_{bootloader,cpu} command line option only
It's very unusual to have both a command line option and a compile time
option, and apparently that's confusing to people. Also, basically
everybody enables the compile time option now, which means people who
want to disable this wind up having to use the command line option to
ensure that anyway. So just reduce the number of moving pieces and nix
the compile time option in favor of the more versatile command line
option.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:18:10 +01:00
Jason A. Donenfeld 622754e84b stackprotector: actually use get_random_canary()
The RNG always mixes in the Linux version extremely early in boot. It
also always includes a cycle counter, not only during early boot, but
each and every time it is invoked prior to being fully initialized.
Together, this means that the use of additional xors inside of the
various stackprotector.h files is superfluous and over-complicated.
Instead, we can get exactly the same thing, but better, by just calling
`get_random_canary()`.

Acked-by: Guo Ren <guoren@kernel.org> # for csky
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:18:10 +01:00
Jason A. Donenfeld 8032bf1233 treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
  (E)

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18 02:15:15 +01:00
Thomas Gleixner c4bc51b1dd powerpc/pseries/msi: Use msi_domain_ops:: Msi_post_free()
Use the new msi_post_free() callback which is invoked after the interrupts
have been freed to tell the hypervisor about the shutdown.

This allows to remove the exposure of __msi_domain_free_irqs().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122014.120489922@linutronix.de
2022-11-17 15:15:19 +01:00
Nicholas Piggin eb761a1760 powerpc: Fix writable sections being moved into the rodata region
.data.rel.ro*  catches .data.rel.root_cpuacct, and the kernel crashes on
a store in css_clear_dir. At least we know read-only data protection is
working...

Fixes: b6adc6d6d3 ("powerpc/build: move .data.rel.ro, .sdata2 to read-only")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221116043954.3307852-1-npiggin@gmail.com
2022-11-16 21:37:14 +11:00
Sergey Shtylyov 18b9fe54d9 powerpc: ptrace: user_regset_copyin_ignore() always returns 0
user_regset_copyin_ignore() always returns 0, so checking its result seems
pointless -- don't do this anymore...

[akpm@linux-foundation.org: fix gpr32_set_common()]
Link: https://lkml.kernel.org/r/20221014212235.10770-11-s.shtylyov@omp.ru
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-15 14:30:40 -08:00
Sathvika Vasireddy 8d0c21b506 powerpc: Curb objtool unannotated intra-function call warnings
objtool throws the following unannotated intra-function call warnings:
arch/powerpc/kernel/entry_64.o: warning: objtool: .text+0x4: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe64: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xee4: unannotated intra-function call

Fix these warnings by annotating intra-function calls, using
ANNOTATE_INTRA_FUNCTION_CALL macro, to indicate that the branch targets
are valid.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-5-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Sathvika Vasireddy 29a011fc79 powerpc: Fix objtool unannotated intra-function call warnings
Objtool throws unannotated intra-function call warnings in the following
assembly files:

arch/powerpc/kernel/vector.o: warning: objtool: .text+0x53c: unannotated intra-function call

arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x60: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x124: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5d4: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5dc: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xcb8: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xd0c: unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x1030: unannotated intra-function call

arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x358: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x728: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4d94: unannotated intra-function call
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4ec4: unannotated intra-function call

arch/powerpc/kvm/book3s_hv_interrupts.o: warning: objtool: .text+0x6c: unannotated intra-function call
arch/powerpc/kernel/misc_64.o: warning: objtool: .text+0x64: unannotated intra-function call

Objtool does not add STT_NOTYPE symbols with size 0 to the rbtree, which
is why find_call_destination() function is not able to find the
destination symbol for 'bl' instruction. For such symbols, objtool is
throwing unannotated intra-function call warnings in assembly files. Fix
these warnings by annotating those symbols with SYM_FUNC_START_LOCAL and
SYM_FUNC_END macros, inorder to set symbol type to STT_FUNC and symbol
size accordingly.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-4-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Sathvika Vasireddy 01f2cf0b99 powerpc: Override __ALIGN and __ALIGN_STR macros
In a subsequent patch, we would want to annotate powerpc assembly functions
with SYM_FUNC_START_LOCAL macro. This macro depends on __ALIGN macro.

The default expansion of __ALIGN macro is:
        #define __ALIGN      .align 4,0x90

So, override __ALIGN and __ALIGN_STR macros to use the same alignment as
that of the existing _GLOBAL macro. Also, do not pad with 0x90, because
repeated 0x90s are not a nop or trap on powerpc.

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-3-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Sathvika Vasireddy 93e3f45a26 powerpc: Fix __WARN_FLAGS() for use with Objtool
Commit 1e688dd2a3 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") updated __WARN_FLAGS() to use asm
goto, and added a call to 'unreachable()' after the asm goto for optimal
code generation. With CONFIG_OBJTOOL enabled, 'annotate_unreachable()'
statement in 'unreachable()' tries to note down the location of the
subsequent instruction in a separate elf section to aid code flow
analysis. However, on powerpc, this results in gcc emitting a call to a
symbol of size 0. This results in objtool complaining of "unannotated
intra-function call" since the target symbol is not a valid function
call destination.

Objtool wants this annotation for code flow analysis, which we are not
yet enabling on powerpc. As such, expand the call to 'unreachable()' in
__WARN_FLAGS() without annotate_unreachable():
        barrier_before_unreachable();
        __builtin_unreachable();

This still results in optimal code generation for __WARN_FLAGS(), while
getting rid of the objtool warning.

We still need barrier_before_unreachable() to work around gcc bugs 82365
and 106751:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751

Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-2-sv@linux.ibm.com
2022-11-15 20:11:47 +11:00
Paolo Bonzini d663b8a285 KVM: replace direct irq.h inclusion
virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source
directory (i.e. not from arch/*/include) for the sole purpose of retrieving
irqchip_in_kernel.

Making the function inline in a header that is already included,
such as asm/kvm_host.h, is not possible because it needs to look at
struct kvm which is defined after asm/kvm_host.h is included.  So add a
kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is
only performance critical on arm64 and x86, and the non-inline function
is enough on all other architectures.

irq.h can then be deleted from all architectures except x86.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:37 -05:00
Peter Xu c8b88b332b kvm: Add interruptible flag to __gfn_to_pfn_memslot()
Add a new "interruptible" flag showing that the caller is willing to be
interrupted by signals during the __gfn_to_pfn_memslot() request.  Wire it
up with a FOLL_INTERRUPTIBLE flag that we've just introduced.

This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the
SIGIPI) even during e.g. handling an userfaultfd page fault.

No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221011195809.557016-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09 12:31:27 -05:00
Kefeng Wang e025ab842e mm: remove kern_addr_valid() completely
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address.  So as there is no need for kern_addr_valid(), let's
remove it.

Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de>			[parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Acked-by: Guo Ren <guoren@kernel.org>			[csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:18 -08:00
Mike Kravetz 57a196a584 hugetlb: simplify hugetlb handling in follow_page_mask
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified.  At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages.  ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error.  Instead, at each
level of the page table a check is made for a hugetlb entry.  If a hugetlb
entry is found, a call to a routine associated with that entry is made.

Currently, there are two checks for hugetlb entries at each page table
level.  The first check is of the form:

        if (p?d_huge())
                page = follow_huge_p?d();

the second check is of the form:

        if (is_hugepd())
                page = follow_huge_pd().

We can replace these checks, as well as the special handling routines such
as follow_huge_p?d() and follow_huge_pd() with a single routine to handle
hugetlb vmas.

A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask.  hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries.  huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.

[1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/

[mike.kravetz@oracle.com: remove vma (pmd sharing) per Peter]
  Link: https://lkml.kernel.org/r/20221028181108.119432-1-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: remove left over hugetlb_vma_unlock_read()]
  Link: https://lkml.kernel.org/r/20221030225825.40872-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20220919021348.22151-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 17:37:10 -08:00
Jakub Kicinski fbeb229a66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03 13:21:54 -07:00
Michael Ellerman 02a771c9a6 powerpc/32: Select ARCH_SPLIT_ARG64
On 32-bit kernels, 64-bit syscall arguments are split into two
registers. For that to work with syscall wrappers, the prototype of the
syscall must have the argument split so that the wrapper macro properly
unpacks the arguments from pt_regs.

The fanotify_mark() syscall is one such syscall, which already has a
split prototype, guarded behind ARCH_SPLIT_ARG64.

So select ARCH_SPLIT_ARG64 to get that prototype and fix fanotify_mark()
on 32-bit kernels with syscall wrappers.

Note also that fanotify_mark() is the only usage of ARCH_SPLIT_ARG64.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221101034852.2340319-1-mpe@ellerman.id.au
2022-11-01 15:27:12 +11:00
Andreas Schwab ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Michael Ellerman 2153fc9623 powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
There's a build failure for Book3E without AltiVec:
  Error: cc1: error: AltiVec not supported in this target
  make[6]: *** [/linux/scripts/Makefile.build:250:
  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o] Error 1

This happens because the amdgpu build is only gated by
PPC_LONG_DOUBLE_128, but that symbol can be enabled even though AltiVec
is disabled.

The only user of PPC_LONG_DOUBLE_128 is amdgpu, so just add a dependency
on AltiVec to that symbol to fix the build.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221027125626.1383092-1-mpe@ellerman.id.au
2022-10-31 17:39:44 +11:00
Peter Zijlstra bd27568117 perf: Rewrite core context handling
There have been various issues and limitations with the way perf uses
(task) contexts to track events. Most notable is the single hardware
PMU task context, which has resulted in a number of yucky things (both
proposed and merged).

Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU

*Current design:*

Currently we have a per task and per cpu perf_event_contexts:

  task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--> pmu ---'
                                              v           ^
                                         perf_event ------'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of
events for that PMU. The task related perf_event_context's have a
pointer back to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to
that PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc.

Each perf_event is then associated with its PMU and one
perf_event_context.

*Proposed design:*

New design proposed by this patch reduce to a single task context and
a single CPU context but adds some intermediate data-structures:

  task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
       ^                           |   ^ ^
       `---------------------------'   | |
                                       | |    perf_cpu_pmu_context <--.
                                       | `----.    ^                  |
                                       |      |    |                  |
                                       |      v    v                  |
                                       | ,--> perf_event_pmu_context  |
                                       | |                            |
                                       | |                            |
                                       v v                            |
                                  perf_event ---> pmu ----------------'

With the new design, perf_event_context will hold all events for all
pmus in the (respective pinned/flexible) rbtrees. This can be achieved
by adding pmu to rbtree key:

  {cpu, pmu, cgroup, group_index}

Each perf_event_context carries a list of perf_event_pmu_context which
is used to hold per-pmu-per-context state. For example, it keeps track
of currently active events for that pmu, a pmu specific task_ctx_data,
a flag to tell whether rotation is required or not etc.

Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu
state like hrtimer details to drive the event rotation, a pointer to
perf_event_pmu_context of currently running task and some other
ancillary information.

Each perf_event is associated to it's pmu, perf_event_context and
perf_event_pmu_context.

Further optimizations to current implementation are possible. For
example, ctx_resched() can be optimized to reschedule only single pmu
events.

Much thanks to Ravi for picking this up and pushing it towards
completion.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
2022-10-27 20:12:16 +02:00
Jakub Kicinski d5e2d038db eth: fealnx: delete the driver for Myson MTD-800
The git history for this driver seems to be completely
automated / tree wide changes. I can't find any boards
or systems which would use this chip. Google search
shows pictures of towel warmers and no networking products.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221025184254.1717982-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-27 13:32:08 +02:00
Nicholas Piggin 65722736c3 powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
Commit a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when
returning to soft-masked context") fixed the problem of pending irqs
being cleared when clearing the HARD_DIS bit, but then it didn't clear
the bit at all. This change clears HARD_DIS without affecting other bits
in the mask.

When an interrupt hits in a soft-masked section that has MSR[EE]=1, it
can hard disable and set PACA_IRQS_HARD_DIS, which must be cleared when
returning to the EE=1 caller (unless it was set due to a MUST_HARD_MASK
interrupt becoming pending). Failure to clear this leaves the
returned-to context running with MSR[EE]=1 and PACA_IRQS_HARD_DIS, which
confuses irq assertions and could be dangerous for code that might test
the flag.

This was observed in a hash MMU kernel where a kernel hash fault hits in
a local_irqs_disabled region that has EE=1. The hash fault also runs
with EE=1, then as it returns, a decrementer hits in the restart section
and the irq restart code hard-masks which sets the PACA_IRQ_HARD_DIS
flag, which is not clear when the original context is returned to.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: a4cb3651a1 ("powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221022052207.471328-1-npiggin@gmail.com
2022-10-27 00:38:35 +11:00
Jakub Kicinski 94adb5e29e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 17:49:10 -07:00
Sean Anderson 4e31b808fa powerpc: dts: qoriq: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is
present it's used for MACs 1 through 4).

Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Sean Anderson 36926a7d70 powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
fragments, and mark the QMAN ports as 10G.

Fixes: da414bb923 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-19 13:25:09 +01:00
Nicholas Piggin dc398a084d powerpc/64s/interrupt: Perf NMI should not take normal exit path
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Squash in fixups from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006140413.126443-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin a073672eb0 powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
NMI PMIs really should not return using the normal interrupt_return
function. If such a PMI hits in code returning to user with the context
switched to user mode, this warning can fire. This was enough to cause
crashes when reproducing on 64s, because another perf interrupt would
hit while reporting bug, and that would cause another bug, and so on
until smashing the stack.

Work around that particular crash for now by just disabling that context
warning for PMIs. This is a hack and not a complete fix, there could be
other such problems lurking in corners. But it does fix the known crash.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin e59b3399fd KVM: PPC: BookS PR-KVM and BookE do not support context tracking
The context tracking code in PR-KVM and BookE implementations is not
complete, and can cause host crashes if context tracking is enabled.

Make these implementations depend on !CONTEXT_TRACKING_USER.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221014030729.2077151-2-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin 00ff1eaac1 powerpc: Fix reschedule bug in KUAP-unlocked user copy
schedule must not be explicitly called while KUAP is unlocked, because
the AMR register will not be saved across the context switch on
64s (preemption is allowed because that is driven by interrupts which do
save the AMR).

exit_vmx_usercopy() runs inside an unlocked user access region, and it
calls preempt_enable() which will call schedule() if need_resched() was
set while non-preemptible. This can cause tasks to run unprotected when
the should not, and can cause the user copy to be improperly blocked
when scheduling back to it.

Fix this by avoiding the explicit resched for preempt kernels by
generating an interrupt to reschedule the context if need_resched() got
set.

Reported-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-3-npiggin@gmail.com
2022-10-18 22:46:19 +11:00
Nicholas Piggin 2b2095f3a6 powerpc/64s: Fix hash__change_memory_range preemption warning
stop_machine_cpuslocked takes a mutex so it must be called in a
preemptible context, so it can't simply be fixed by disabling
preemption.

This is not a bug, because CPU hotplug is locked, so this processor will
call in to the stop machine function. So raw_smp_processor_id() could be
used. This leaves a small chance that this thread will be migrated to
another CPU, so the master work would be done by a CPU from a different
context. Better for test coverage to make that a common case by just
having the first CPU to call in become the master.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-2-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin b9ef323ea1 powerpc/64s: Disable preemption in hash lazy mmu mode
apply_to_page_range on kernel pages does not disable preemption, which
is a requirement for hash's lazy mmu mode, which keeps track of the
TLBs to flush with a per-cpu array.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-1-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin b12eb279ff powerpc/64s: make linear_map_hash_lock a raw spinlock
This lock is taken while the raw kfence_freelist_lock is held, so it
must also be a raw spinlock, as reported by lockdep when raw lock
nesting checking is enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-3-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin 35159b5717 powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe
With kfence enabled, there are several cases where HPTE and TLBIE locks
are called from softirq context, for example:

  WARNING: inconsistent lock state
  6.0.0-11845-g0cbbc95b12ac #1 Tainted: G                 N
  --------------------------------
  inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
  c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
  {IN-SOFTIRQ-W} state was registered at:
    .lock_acquire+0x20c/0x520
    ._raw_spin_lock+0x4c/0x70
    .native_hpte_invalidate+0x62c/0x840
    .hash__kernel_map_pages+0x450/0x640
    .kfence_protect+0x58/0xc0
    .kfence_guarded_free+0x374/0x5a0
    .__slab_free+0x3d0/0x630
    .put_cred_rcu+0xcc/0x120
    .rcu_core+0x3c4/0x14e0
    .__do_softirq+0x1dc/0x7dc
    .do_softirq_own_stack+0x40/0x60

Fix this by consistently disabling irqs while taking either of these
locks. Don't just disable bh because several of the more common cases
already disable irqs, so this just makes the locks always irq-safe.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-2-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Nicholas Piggin be83d5485d powerpc/64s: Add lockdep for HPTE lock
Add lockdep annotation for the HPTE bit-spinlock. Modern systems don't
take the tlbie lock, so this shows up some of the same lockdep warnings
that were being reported by the ppc970. And they're not taken in exactly
the same places so this is nice to have in its own right.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013230710.1987253-1-npiggin@gmail.com
2022-10-18 22:46:18 +11:00
Haren Myneni 2147783d6b powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU
The hypervisor assigns VAS (Virtual Accelerator Switchboard)
windows depends on cores configured in LPAR. The kernel uses
OF reconfig notifier to reconfig VAS windows for DLPAR CPU event.
In the case of shared CPU mode partition, the hypervisor assigns
VAS windows depends on CPU entitled capacity, not based on vcpus.
When the user changes CPU entitled capacity for the partition,
drmgr uses /proc/ppc64/lparcfg interface to notify the kernel.

This patch adds the following changes to update VAS resources
for shared mode:
- Call vas reconfig windows from lparcfg_write()
- Ignore reconfig changes in the VAS notifier

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Rework error handling, report any errors as EIO]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/efa9c16e4a78dda4567a16f13dabfd73cb4674a2.camel@linux.ibm.com
2022-10-18 22:46:18 +11:00
Haren Myneni 89ed0b769d powerpc/pseries/vas: Add VAS IRQ primary handler
irq_default_primary_handler() can be used only with IRQF_ONESHOT
flag, but the flag disables IRQ before executing the thread handler
and enables it after the interrupt is handled. But this IRQ disable
sets the VAS IRQ OFF state in the hypervisor. In case if NX faults
during this window, the hypervisor will not deliver the fault
interrupt to the partition and the user space may wait continuously
for the CSB update. So use VAS specific IRQ handler instead of
calling the default primary handler.

Increment pending_faults counter in IRQ handler and the bottom
thread handler will process all faults based on this counter.
In case if the another interrupt is received while the thread is
running, it will be processed using this counter. The synchronization
of top and bottom handlers will be done with IRQTF_RUNTHREAD flag
and will re-enter to bottom half if this flag is set.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaad8813b4762a6753cfcd0b605a7574a5192ec7.camel@linux.ibm.com
2022-10-18 22:46:18 +11:00
Linus Torvalds f1947d7c8a Random number generator fixes for Linux 6.1-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
 A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
 qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
 R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
 lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
 MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
 8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
 XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
 Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
 vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
 4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
 lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
 =M+mV
 -----END PGP SIGNATURE-----

Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull more random number generator updates from Jason Donenfeld:
 "This time with some large scale treewide cleanups.

  The intent of this pull is to clean up the way callers fetch random
  integers. The current rules for doing this right are:

   - If you want a secure or an insecure random u64, use get_random_u64()

   - If you want a secure or an insecure random u32, use get_random_u32()

     The old function prandom_u32() has been deprecated for a while
     now and is just a wrapper around get_random_u32(). Same for
     get_random_int().

   - If you want a secure or an insecure random u16, use get_random_u16()

   - If you want a secure or an insecure random u8, use get_random_u8()

   - If you want secure or insecure random bytes, use get_random_bytes().

     The old function prandom_bytes() has been deprecated for a while
     now and has long been a wrapper around get_random_bytes()

   - If you want a non-uniform random u32, u16, or u8 bounded by a
     certain open interval maximum, use prandom_u32_max()

     I say "non-uniform", because it doesn't do any rejection sampling
     or divisions. Hence, it stays within the prandom_*() namespace, not
     the get_random_*() namespace.

     I'm currently investigating a "uniform" function for 6.2. We'll see
     what comes of that.

  By applying these rules uniformly, we get several benefits:

   - By using prandom_u32_max() with an upper-bound that the compiler
     can prove at compile-time is ≤65536 or ≤256, internally
     get_random_u16() or get_random_u8() is used, which wastes fewer
     batched random bytes, and hence has higher throughput.

   - By using prandom_u32_max() instead of %, when the upper-bound is
     not a constant, division is still avoided, because
     prandom_u32_max() uses a faster multiplication-based trick instead.

   - By using get_random_u16() or get_random_u8() in cases where the
     return value is intended to indeed be a u16 or a u8, we waste fewer
     batched random bytes, and hence have higher throughput.

  This series was originally done by hand while I was on an airplane
  without Internet. Later, Kees and I worked on retroactively figuring
  out what could be done with Coccinelle and what had to be done
  manually, and then we split things up based on that.

  So while this touches a lot of files, the actual amount of code that's
  hand fiddled is comfortably small"

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  prandom: remove unused functions
  treewide: use get_random_bytes() when possible
  treewide: use get_random_u32() when possible
  treewide: use get_random_{u8,u16}() when possible, part 2
  treewide: use get_random_{u8,u16}() when possible, part 1
  treewide: use prandom_u32_max() when possible, part 2
  treewide: use prandom_u32_max() when possible, part 1
2022-10-16 15:27:07 -07:00
Linus Torvalds 5e714bf171 - Alistair Popple has a series which addresses a race which causes page
refcounting errors in ZONE_DEVICE pages.
 
 - Peter Xu fixes some userfaultfd test harness instability.
 
 - Various other patches in MM, mainly fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0j6igAKCRDdBJ7gKXxA
 jnGxAP99bV39ZtOsoY4OHdZlWU16BUjKuf/cb3bZlC2G849vEwD+OKlij86SG20j
 MGJQ6TfULJ8f1dnQDd6wvDfl3FMl7Qc=
 =tbdp
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - fix a race which causes page refcounting errors in ZONE_DEVICE pages
   (Alistair Popple)

 - fix userfaultfd test harness instability (Peter Xu)

 - various other patches in MM, mainly fixes

* tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits)
  highmem: fix kmap_to_page() for kmap_local_page() addresses
  mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page
  mm/selftest: uffd: explain the write missing fault check
  mm/hugetlb: use hugetlb_pte_stable in migration race check
  mm/hugetlb: fix race condition of uffd missing/minor handling
  zram: always expose rw_page
  LoongArch: update local TLB if PTE entry exists
  mm: use update_mmu_tlb() on the second thread
  kasan: fix array-bounds warnings in tests
  hmm-tests: add test for migrate_device_range()
  nouveau/dmem: evict device private memory during release
  nouveau/dmem: refactor nouveau_dmem_fault_copy_one()
  mm/migrate_device.c: add migrate_device_range()
  mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page()
  mm/memremap.c: take a pgmap reference on page allocation
  mm: free device private pages have zero refcount
  mm/memory.c: fix race when faulting a device private page
  mm/damon: use damon_sz_region() in appropriate place
  mm/damon: move sz_damon_region to damon_sz_region
  lib/test_meminit: add checks for the allocation functions
  ...
2022-10-14 12:28:43 -07:00
Linus Torvalds 70609c1495 powerpc fixes for 6.1 #2
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned register-pairs.
    Notably this broke ftruncate64 & pread/write64, which can lead to file corruption.
 
  - Fix lost interrupts when returning to soft-masked context on 64-bit.
 
  - Fix build failure when CONFIG_DTL=n.
 
 Thanks to: Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd Bergmann, Sachin Sant.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNJQp8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPd3D/9+uPapAs6/ZGkrbojseiF/fZNyIfTF
 HcLZ3iWUP9+APuaRxd1m3HbBoWdgFQ9yC5abYnX1z+E/yEhbTaOof9Gu1vXfxbIF
 Orlc+X12uZMrdU1L2qlgk/89QLeKOTlp/b2eUXJ332Vl4DfJDoknMsqTqbMz9NrL
 EDj6vmG3JzJgKHmplwxd7d4CExQNh9khZr3lZ6Ae/hC1cZ1PVXD67akwxsc0i8E2
 dBGzwsP1SYXPrBMXONT1GNs/whXrKBHNyLWFueab3p0FsSkbc4sY/HaAPzjq7YzK
 aPV0FQi2YpIZ5IIunZA+vW+tQ8EvyOpOUPMGb7iiN/31YcFbf/k6Y92AlCFlHRq3
 9vvsNGzDeFBfvH2wWcR1XaAMQHmBpC4L+dYbCKIyuK9DjgZtpZUV3acagRMFXqMN
 TeN9h2GXAYxCdNc27I5v3wAvbNcx69E4Bs5F7voKGD4pb7V/4d/li2oix6v0FKIy
 qAt1PGhD4IgPE3oJOEOYsJRVQNyDfG4eKlFh5ijyyc5hQeGNeKiAH+G+a61R3ueo
 GvVOdkR/1QxK1XkbgrMyclb0nHN1kEjRHeIcMFW1+dMKA9UvOlJ6u8nkOFRvA3js
 r00O1uE7DT5WzdGXgEHCCDd7PybBKxpM32m4/vzgDM7xG+DHxp3j7LTQsDOfkFsH
 bkc31LqrAw3yDQ==
 =B+7s
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
   register-pairs. Notably this broke ftruncate64 & pread/write64, which
   can lead to file corruption.

 - Fix lost interrupts when returning to soft-masked context on 64-bit.

 - Fix build failure when CONFIG_DTL=n.

Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.

* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_DTL=n build
  powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
  powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
2022-10-14 11:16:18 -07:00
Nicholas Piggin 90d5ce82e1 powerpc/pseries: Fix CONFIG_DTL=n build
The recently moved dtl code must be compiled-in if
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y even if CONFIG_DTL=n.

Fixes: 6ba5aa541a ("powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013073131.1485742-1-npiggin@gmail.com
2022-10-13 22:30:07 +11:00
Nicholas Piggin a4cb3651a1 powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).

This also adds comment to the code that actually does need to clear the
flag.

Fixes: e485f6c751 ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
2022-10-13 22:29:49 +11:00
Alistair Popple ef23345089 mm: free device private pages have zero refcount
Since 27674ef6c7 ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use.  However before handing them back to the
owning device driver we add an extra reference count such that free pages
have a reference count of one.

This makes it difficult to tell if a page is free or not because both free
and in use pages will have a non-zero refcount.  Instead we should return
pages to the drivers page allocator with a zero reference count.  Kernel
code can then safely use kernel functions such as get_page_unless_zero().

Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00
Alistair Popple 16ce101db8 mm/memory.c: fix race when faulting a device private page
Patch series "Fix several device private page reference counting issues",
v2

This series aims to fix a number of page reference counting issues in
drivers dealing with device private ZONE_DEVICE pages.  These result in
use-after-free type bugs, either from accessing a struct page which no
longer exists because it has been removed or accessing fields within the
struct page which are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems.  However
without these fixes it is possible to crash the kernel from userspace. 
These crashes can be triggered either by unloading the kernel module or
unbinding the device from the driver prior to a userspace task exiting. 
In modules such as Nouveau it is also possible to trigger some of these
issues by explicitly closing the device file-descriptor prior to the task
exiting and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. 
Unfortunately I lack hardware to test either of those so any help there
would be appreciated.  The changes mimic what is done in for both Nouveau
and hmm-tests though so I doubt they will cause problems.


This patch (of 8):

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called.  However no
reference is taken on the faulting page.  Therefore a concurrent migration
of the device private page can free the page and possibly the underlying
pgmap.  This results in a race which can crash the kernel due to the
migrate_to_ram() function pointer becoming invalid.  It also means drivers
can't reliably read the zone_device_data field because the page may have
been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to
ensure it has not been freed.  Unfortunately the elevated reference count
will cause the migration required to handle the fault to fail.  To avoid
this failure pass the faulting page into the migrate_vma functions so that
if an elevated reference count is found it can be checked to see if it's
expected or not.

[mpe@ellerman.id.au: fix build]
  Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12 18:51:49 -07:00
Linus Torvalds 676cb49573 - hfs and hfsplus kmap API modernization from Fabio Francesco
- Valentin Schneider makes crash-kexec work properly when invoked from
   an NMI-time panic.
 
 - ntfs bugfixes from Hawkins Jiawei
 
 - Jiebin Sun improves IPC msg scalability by replacing atomic_t's with
   percpu counters.
 
 - nilfs2 cleanups from Minghao Chi
 
 - lots of other single patches all over the tree!
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0Yf0gAKCRDdBJ7gKXxA
 joapAQDT1d1zu7T8yf9cQXkYnZVuBKCjxKE/IsYvqaq1a42MjQD/SeWZg0wV05B8
 DhJPj9nkEp6R3Rj3Mssip+3vNuceAQM=
 =lUQY
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - hfs and hfsplus kmap API modernization (Fabio Francesco)

 - make crash-kexec work properly when invoked from an NMI-time panic
   (Valentin Schneider)

 - ntfs bugfixes (Hawkins Jiawei)

 - improve IPC msg scalability by replacing atomic_t's with percpu
   counters (Jiebin Sun)

 - nilfs2 cleanups (Minghao Chi)

 - lots of other single patches all over the tree!

* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
  proc: test how it holds up with mapping'less process
  mailmap: update Frank Rowand email address
  ia64: mca: use strscpy() is more robust and safer
  init/Kconfig: fix unmet direct dependencies
  ia64: update config files
  nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
  fork: remove duplicate included header files
  init/main.c: remove unnecessary (void*) conversions
  proc: mark more files as permanent
  nilfs2: remove the unneeded result variable
  nilfs2: delete unnecessary checks before brelse()
  checkpatch: warn for non-standard fixes tag style
  usr/gen_init_cpio.c: remove unnecessary -1 values from int file
  ipc/msg: mitigate the lock contention with percpu counter
  percpu: add percpu_counter_add_local and percpu_counter_sub_local
  fs/ocfs2: fix repeated words in comments
  relay: use kvcalloc to alloc page array in relay_alloc_page_array
  proc: make config PROC_CHILDREN depend on PROC_FS
  fs: uninline inode_maybe_inc_iversion()
  ...
2022-10-12 11:00:22 -07:00
Nicholas Piggin e237506238 powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.

The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.

With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.

Fix this by having 32-bit kernels share those syscall definitions with
compat.

Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
2022-10-13 00:49:58 +11:00
Jason A. Donenfeld 197173db99 treewide: use get_random_bytes() when possible
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:58 -06:00
Jason A. Donenfeld 81895a65ec treewide: use prandom_u32_max() when possible, part 1
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value & (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:55 -06:00
Joel Stanley ae5b6779fa powerpc: Fix 85xx build
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.

Fixes: 8afc66e8d4 ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-11 10:13:34 -07:00
Linus Torvalds 27bc50fc90 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative
   reports (or any positive ones, come to that).
 
 - Also the Maple Tree from Liam R.  Howlett.  An overlapping range-based
   tree for vmas.  It it apparently slight more efficient in its own right,
   but is mainly targeted at enabling work to reduce mmap_lock contention.
 
   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.
 
   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
   This has yet to be addressed due to Liam's unfortunately timed
   vacation.  He is now back and we'll get this fixed up.
 
 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer.  It uses
   clang-generated instrumentation to detect used-unintialized bugs down to
   the single bit level.
 
   KMSAN keeps finding bugs.  New ones, as well as the legacy ones.
 
 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.
 
 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
   file/shmem-backed pages.
 
 - userfaultfd updates from Axel Rasmussen
 
 - zsmalloc cleanups from Alexey Romanov
 
 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
 
 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.
 
 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.
 
 - memcg cleanups from Kairui Song.
 
 - memcg fixes and cleanups from Johannes Weiner.
 
 - Vishal Moola provides more folio conversions
 
 - Zhang Yi removed ll_rw_block() :(
 
 - migration enhancements from Peter Xu
 
 - migration error-path bugfixes from Huang Ying
 
 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths.  For optimizations by PMEM drivers, DRM
   drivers, etc.
 
 - vma merging improvements from Jakub Matěn.
 
 - NUMA hinting cleanups from David Hildenbrand.
 
 - xu xin added aditional userspace visibility into KSM merging activity.
 
 - THP & KSM code consolidation from Qi Zheng.
 
 - more folio work from Matthew Wilcox.
 
 - KASAN updates from Andrey Konovalov.
 
 - DAMON cleanups from Kaixu Xia.
 
 - DAMON work from SeongJae Park: fixes, cleanups.
 
 - hugetlb sysfs cleanups from Muchun Song.
 
 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
 joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
 bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
 =xfWx
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2022-10-10 17:53:04 -07:00
Linus Torvalds 3604a7f568 This update includes the following changes:
API:
 
 - Feed untrusted RNGs into /dev/random.
 - Allow HWRNG sleeping to be more interruptible.
 - Create lib/utils module.
 - Setting private keys no longer required for akcipher.
 - Remove tcrypt mode=1000.
 - Reorganised Kconfig entries.
 
 Algorithms:
 
 - Load x86/sha512 based on CPU features.
 - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher.
 
 Drivers:
 
 - Add HACE crypto driver aspeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmM785cACgkQxycdCkmx
 i6dveBAAmGVYtrPmcGfA6CmzZ8ps9KdZxhjHjzLKwuqrOMulZvE2IYeUV4QtNqpQ
 6NLY2+TkqL0XIbCXoByIk32lMYIlXBaJdMYdHHDTeo7E2wqZn/46SPSWeNKazyJx
 dkL8Oj62nqDc2s0LOi3vLvod+sENFQ69R+vkHOa0fZhX0UBsac3NIXo+74Y2A7bE
 0+iQFKTWdNnoQzQ0j4q8WMiolKYh21iPZ9l5sjgMgichLCaE6PrITlRcaWrtPhey
 U1OmJtbTPsg+5X1r9KyLtoAXtBDONl66GQyne+p/ZYD8cMhxomjJaPlMhwWE/n4d
 d2KJKvoXoPPo4c+yNIS9hBav07ZriPl0q0jd2M1rd6oYTmFpaodTgIBfjvxO+wfV
 GoqDS8PEc42U1uwkuKC/cvfr6pB8WiybfXy+vSXBm/jUgIOO3y+eqsC8Jx9ZoQeG
 F+d34PYfJrJbmDRtcA6ZKdzN0OmKq7aCilx1kGKGPg0D+uq64FBo7zsT6XzTK8HL
 2Za9AACPn87xLQwGrKDSBfyrlSSIJm2FaIIPayUXHEo7cyoiZwbTpXRRJ1mDR+v9
 jzI+xPEXCthtjysuRmufNhTkiZUv3lZ8ORfQ0QFKR53tjZUm+dVQo0V/N/ZSXoSV
 SyRvXYO+ToXePAofNWl1LcO1grX/vxtFNedMkDLHXooRcnCaIYo=
 =rq2f
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Feed untrusted RNGs into /dev/random
   - Allow HWRNG sleeping to be more interruptible
   - Create lib/utils module
   - Setting private keys no longer required for akcipher
   - Remove tcrypt mode=1000
   - Reorganised Kconfig entries

  Algorithms:
   - Load x86/sha512 based on CPU features
   - Add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher

  Drivers:
   - Add HACE crypto driver aspeed"

* tag 'v6.1-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits)
  crypto: aspeed - Remove redundant dev_err call
  crypto: scatterwalk - Remove unused inline function scatterwalk_aligned()
  crypto: aead - Remove unused inline functions from aead
  crypto: bcm - Simplify obtain the name for cipher
  crypto: marvell/octeontx - use sysfs_emit() to instead of scnprintf()
  hwrng: core - start hwrng kthread also for untrusted sources
  crypto: zip - remove the unneeded result variable
  crypto: qat - add limit to linked list parsing
  crypto: octeontx2 - Remove the unneeded result variable
  crypto: ccp - Remove the unneeded result variable
  crypto: aspeed - Fix check for platform_get_irq() errors
  crypto: virtio - fix memory-leak
  crypto: cavium - prevent integer overflow loading firmware
  crypto: marvell/octeontx - prevent integer overflows
  crypto: aspeed - fix build error when only CRYPTO_DEV_ASPEED is enabled
  crypto: hisilicon/qm - fix the qos value initialization
  crypto: sun4i-ss - use DEFINE_SHOW_ATTRIBUTE to simplify sun4i_ss_debugfs
  crypto: tcrypt - add async speed test for aria cipher
  crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher
  crypto: aria - prepare generic module for optimized implementations
  ...
2022-10-10 13:04:25 -07:00
Linus Torvalds d4013bc4d4 bitmap patches for v6.1-rc1
From Phil Auld:
 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
 
 From me:
 cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess
 
 This series cleans that mess and adds new config FORCE_NR_CPUS that
 allows to optimize cpumask subsystem if the number of CPUs is known
 at compile-time.
 
 From me:
 lib: optimize find_bit() functions
 
 Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.
 
 From me:
 lib/find: add find_nth_bit()
 
 Adds find_nth_bit(), which is ~70 times faster than bitcounting with
 for_each() loop:
         for_each_set_bit(bit, mask, size)
                 if (n-- == 0)
                         return bit;
 
 Also adds bitmap_weight_and() to let people replace this pattern:
 	tmp = bitmap_alloc(nbits);
 	bitmap_and(tmp, map1, map2, nbits);
 	weight = bitmap_weight(tmp, nbits);
 	bitmap_free(tmp);
 with a single bitmap_weight_and() call.
 
 From me:
 cpumask: repair cpumask_check()
 
 After switching cpumask to use nr_cpu_ids, cpumask_check() started
 generating many false-positive warnings. This series fixes it.
 
 From Valentin Schneider:
 bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot()
 
 Extends the API with one more function and applies it in sched/core.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof
 vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow
 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD
 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6
 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc
 /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001
 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm
 r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV
 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3
 GR7sMQae
 =Q/+F
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds new config FORCE_NR_CPUS that
   allows to optimize cpumask subsystem if the number of CPUs is known
   at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with
   for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin
   Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
2022-10-10 12:49:34 -07:00
Linus Torvalds 8afc66e8d4 Kbuild updates for v6.1
- Remove potentially incomplete targets when Kbuid is interrupted by
    SIGINT etc. in case GNU Make may miss to do that when stderr is piped
    to another program.
 
  - Rewrite the single target build so it works more correctly.
 
  - Fix rpm-pkg builds with V=1.
 
  - List top-level subdirectories in ./Kbuild.
 
  - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in kallsyms.
 
  - Avoid two different modules in lib/zstd/ having shared code, which
    potentially causes building the common code as build-in and modular
    back-and-forth.
 
  - Unify two modpost invocations to optimize the build process.
 
  - Remove head-y syntax in favor of linker scripts for placing particular
    sections in the head of vmlinux.
 
  - Bump the minimal GNU Make version to 3.82.
 
  - Clean up misc Makefiles and scripts.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmM+4vcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGY2IQAInr0JUNnkkxwUSXtOcQuA3IK8RJ
 FbU9HXJRoV9H+7+l3SMlN7mIbrs5eE5fTY3iwQ3CVe139d1+1q7nvTMRv8owywJx
 GBgzswncuu1lk7iQQ//CxiqMwSCG8GJdYn1uDVy4I5jg3o+DtFZJtyq2Wb7pqsMm
 ZhZ4PozRN+idYQJSF6Vx/zEVLHI7quMBwfe4CME8/0Kg2+hnYzbXV/aUf0ED2emq
 zdCMDQgIOK5AhY+8qgMXKYnBUJMTqBp6LoR4p3ApfUkwRFY0sGa0/LK3U/B22OE7
 uWyR4fCUExGyerlcHEVev+9eBfmsLLPyqlchNwpSDOPf5OSdnKmgqJEBR/Cvx0eh
 URerPk7EHxyH3G8yi+cU2GtofNTGc5RHPRgJE2ADsQEi5TAUKGmbXMlsFRL/51Vn
 lTANZObBNa1d4enljF6TfTL5nuccOa+DKvXnH9fQ49t0QdtSikv6J/lGwilwm1Sr
 BctmCsySPuURZfkpI9OQnLuouloMXl9f7Q/+S39haS/tSgvPpyITyO71nxDnXn/s
 BbFObZJUk9QkqOACjBP1hNErTLt83uBxQ9z+rDCw/SbLIe4nw0wyneuygfHI5rI8
 3RZB2DbGauuJHX2Zs6YGS14SLSY33IsLqKR1/Vy3LrPvOHuEvNiOR8LITq5E0YCK
 OffK2Y5cIlXR0QWf
 =DHiN
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove potentially incomplete targets when Kbuid is interrupted by
   SIGINT etc in case GNU Make may miss to do that when stderr is piped
   to another program.

 - Rewrite the single target build so it works more correctly.

 - Fix rpm-pkg builds with V=1.

 - List top-level subdirectories in ./Kbuild.

 - Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in
   kallsyms.

 - Avoid two different modules in lib/zstd/ having shared code, which
   potentially causes building the common code as build-in and modular
   back-and-forth.

 - Unify two modpost invocations to optimize the build process.

 - Remove head-y syntax in favor of linker scripts for placing
   particular sections in the head of vmlinux.

 - Bump the minimal GNU Make version to 3.82.

 - Clean up misc Makefiles and scripts.

* tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
  docs: bump minimal GNU Make version to 3.82
  ia64: simplify esi object addition in Makefile
  Revert "kbuild: Check if linker supports the -X option"
  kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
  kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
  zstd: Fixing mixed module-builtin objects
  kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
  kallsyms: take the input file instead of reading stdin
  kallsyms: drop duplicated ignore patterns from kallsyms.c
  kbuild: reuse mksysmap output for kallsyms
  mksysmap: update comment about __crc_*
  kbuild: remove head-y syntax
  kbuild: use obj-y instead extra-y for objects placed at the head
  kbuild: hide error checker logs for V=1 builds
  kbuild: re-run modpost when it is updated
  kbuild: unify two modpost invocations
  kbuild: move vmlinux.o rule to the top Makefile
  kbuild: move .vmlinux.objs rule to Makefile.modpost
  kbuild: list sub-directories in ./Kbuild
  Makefile.compiler: replace cc-ifversion with compiler-specific macros
  ...
2022-10-10 12:00:45 -07:00
Linus Torvalds 3871d93b82 Perf events updates for v6.1:
- PMU driver updates:
 
      - Add AMD Last Branch Record Extension Version 2 (LbrExtV2)
        feature support for Zen 4 processors.
 
      - Extend the perf ABI to provide branch speculation information,
        if available, and use this on CPUs that have it (eg. LbrExtV2).
 
      - Improve Intel PEBS TSC timestamp handling & integration.
 
      - Add Intel Raptor Lake S CPU support.
 
      - Add 'perf mem' and 'perf c2c' memory profiling support on
        AMD CPUs by utilizing IBS tagged load/store samples.
 
      - Clean up & optimize various x86 PMU details.
 
  - HW breakpoints:
 
      - Big rework to optimize the code for systems with hundreds of CPUs and
        thousands of breakpoints:
 
         - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
 	  per-CPU rwsem that is read-locked during most of the key operations.
 
 	- Improve the O(#cpus * #tasks) logic in toggle_bp_slot()
 	  and fetch_bp_busy_slots().
 
 	- Apply micro-optimizations & cleanups.
 
   - Misc cleanups & enhancements.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/2pMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIMA/+J+MCEVTt9kwZeBtHoPX7iZ5gnq1+McoQ
 f6ALX19AO/ZSuA7EBA3cS3Ny5eyGy3ofYUnRW+POezu9CpflLW/5N27R2qkZFrWC
 A09B86WH676ZrmXt+oI05rpZ2y/NGw4gJxLLa4/bWF3g9xLfo21i+YGKwdOnNFpl
 DEdCVHtjlMcOAU3+on6fOYuhXDcYd7PKGcCfLE7muOMOAtwyj0bUDBt7m+hneZgy
 qbZHzDU2DZ5L/LXiMyuZj5rC7V4xUbfZZfXglG38YCW1WTieS3IjefaU2tREhu7I
 rRkCK48ULDNNJR3dZK8IzXJRxusq1ICPG68I+nm/K37oZyTZWtfYZWehW/d/TnPa
 tUiTwimabz7UUqaGq9ZptxwINcAigax0nl6dZ3EseeGhkDE6j71/3kqrkKPz4jth
 +fCwHLOrI3c4Gq5qWgPvqcUlUneKf3DlOMtzPKYg7sMhla2zQmFpYCPzKfm77U/Z
 BclGOH3FiwaK6MIjPJRUXTePXqnUseqCR8PCH/UPQUeBEVHFcMvqCaa15nALed8x
 dFi76VywR9mahouuLNq6sUNePlvDd2B124PygNwegLlBfY9QmKONg9qRKOnQpuJ6
 UprRJjLOOucZ/N/jn6+ShHkqmXsnY2MhfUoHUoMQ0QAI+n++e+2AuePo251kKWr8
 QlqKxd9PMQU=
 =LcGg
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "PMU driver updates:

   - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature
     support for Zen 4 processors.

   - Extend the perf ABI to provide branch speculation information, if
     available, and use this on CPUs that have it (eg. LbrExtV2).

   - Improve Intel PEBS TSC timestamp handling & integration.

   - Add Intel Raptor Lake S CPU support.

   - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs
     by utilizing IBS tagged load/store samples.

   - Clean up & optimize various x86 PMU details.

  HW breakpoints:

   - Big rework to optimize the code for systems with hundreds of CPUs
     and thousands of breakpoints:

      - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
        per-CPU rwsem that is read-locked during most of the key
        operations.

      - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and
        fetch_bp_busy_slots().

      - Apply micro-optimizations & cleanups.

  - Misc cleanups & enhancements"

* tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
  perf: Fix pmu_filter_match()
  perf: Fix lockdep_assert_event_ctx()
  perf/x86/amd/lbr: Adjust LBR regardless of filtering
  perf/x86/utils: Fix uninitialized var in get_branch_type()
  perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
  perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
  perf/x86/amd: Support PERF_SAMPLE_ADDR
  perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
  perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
  perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
  perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
  perf/x86/uncore: Add new Raptor Lake S support
  perf/x86/cstate: Add new Raptor Lake S support
  perf/x86/msr: Add new Raptor Lake S support
  perf/x86: Add new Raptor Lake S support
  bpf: Check flags for branch stack in bpf_read_branch_records helper
  perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
  perf: Use sample_flags for raw_data
  perf: Use sample_flags for addr
  ...
2022-10-10 09:27:46 -07:00
Linus Torvalds 4899a36f91 powerpc updates for 6.1
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
 
  - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
 
  - Add support for syscall wrappers.
 
  - Add support for KFENCE on 64-bit.
 
  - Update 64-bit HV KVM to use the new guest state entry/exit accounting API.
 
  - Support execute-only memory when using the Radix MMU (P9 or later).
 
  - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
 
  - Updates to our linker script to move more data into read-only sections.
 
  - Allow the VDSO to be randomised on 32-bit.
 
  - Many other small features and fixes.
 
 Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Christophe
 Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas, Gaosheng Cui, Gustavo A. R. Silva,
 Haren Myneni, Hari Bathini, Jilin Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof
 Kozlowski, Laurent Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
 Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali Rohár, Rohan McLure,
 Russell Currey, Sachin Sant, Segher Boessenkool, Shrikanth Hegde, Tyrel Datwyler, Wolfram
 Sang, ye xingchen, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmNCpBMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDx3EACCf86iumFF3RyvENtDwoTRgH3H0z2E
 /ZC4LKrtxgaPFJzKUT4F0kLK85Hw5GzMEKK42NIhAB0o5vFwmEzxOtnlHOyEufAm
 EDIZDIfxV2J9Qx/cW2DSojPj/o9O6noXwhw9SBqMwiDWd8gXmNgOUEklAO7aR7Vq
 Ne2N2FLMNthZydCoHR6dAEjfe2ceFXP5cALwzQO+ILDdZQ0UcF2Yq4yw/gEDoCrB
 FH7mmE7UaQQHvYzo85VTZu7XfUys1P7kUcnhVurOg7/07ITnvnQR+itKZXC+bSft
 1K7ULtjd2QiCgxZA/apFc3lO46kqHVFsB3onRQw12/Ku5vfGFfY0L0iK97OgM4s0
 0u4r+J7A+MM5YBJVVjwZ6woYO5CWMHYKBZepxOpcvftPxj1LNkiHsryqKILGISEC
 aIY/lI0hpeNU4QshDMXzSTgeb/VF9O5cGPncTPkOFbXxD4RpVyz8tSngsG1+D8lj
 S6B2h3k4A14rnblLOxP22jcedBlTYQcRQS4vwr0a7+63QTjfSJ12xT3ucIAKU9f7
 65rVSS/igbrfxqHDmrd60WWZBMXeK0Zy7YIG6iYPTxpP31eFpSp9wtDlV7V2+EH2
 F2p+TJY8aTA8UW+2L5gigN3RsBeeEB8zxJkB14ivICM7+XzVu11PxPDqjDZYkfzC
 ueKKvCcHhHAYqQ==
 =TFBA
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove our now never-true definitions for pgd_huge() and p4d_leaf().

 - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

 - Add support for syscall wrappers.

 - Add support for KFENCE on 64-bit.

 - Update 64-bit HV KVM to use the new guest state entry/exit accounting
   API.

 - Support execute-only memory when using the Radix MMU (P9 or later).

 - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

 - Updates to our linker script to move more data into read-only
   sections.

 - Allow the VDSO to be randomised on 32-bit.

 - Many other small features and fixes.

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.

* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  KVM: PPC: Book3S HV: Fix stack frame regs marker
  powerpc: Don't add __powerpc_ prefix to syscall entry points
  powerpc/64s/interrupt: Fix stack frame regs marker
  powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
  powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
  powerpc/pseries: Add firmware details to the hardware description
  powerpc/powernv: Add opal details to the hardware description
  powerpc: Add device-tree model to the hardware description
  powerpc/64: Add logical PVR to the hardware description
  powerpc: Add PVR & CPU name to hardware description
  powerpc: Add hardware description string
  powerpc/configs: Enable PPC_UV in powernv_defconfig
  powerpc/configs: Update config files for removed/renamed symbols
  powerpc/mm: Fix UBSAN warning reported on hugetlb
  powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
  powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
  powerpc: Drops STABS_DEBUG from linker scripts
  powerpc/64s: Remove lost/old comment
  powerpc/64s: Remove old STAB comment
  powerpc: remove orphan systbl_chk.sh
  ...
2022-10-09 14:05:15 -07:00
Linus Torvalds ef688f8b8c The first batch of KVM patches, mostly covering x86, which I
am sending out early due to me travelling next week.  There is a
 lone mm patch for which Andrew gave an informal ack at
 https://lore.kernel.org/linux-mm/20220817102500.440c6d0a3fce296fdf91bea6@linux-foundation.org.
 
 I will send the bulk of ARM work, as well as other
 architectures, at the end of next week.
 
 ARM:
 
 * Account stage2 page table allocations in memory stats.
 
 x86:
 
 * Account EPT/NPT arm64 page table allocations in memory stats.
 
 * Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR accesses.
 
 * Drop eVMCS controls filtering for KVM on Hyper-V, all known versions of
   Hyper-V now support eVMCS fields associated with features that are
   enumerated to the guest.
 
 * Use KVM's sanitized VMCS config as the basis for the values of nested VMX
   capabilities MSRs.
 
 * A myriad event/exception fixes and cleanups.  Most notably, pending
   exceptions morph into VM-Exits earlier, as soon as the exception is
   queued, instead of waiting until the next vmentry.  This fixed
   a longstanding issue where the exceptions would incorrecly become
   double-faults instead of triggering a vmexit; the common case of
   page-fault vmexits had a special workaround, but now it's fixed
   for good.
 
 * A handful of fixes for memory leaks in error paths.
 
 * Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow.
 
 * Never write to memory from non-sleepable kvm_vcpu_check_block()
 
 * Selftests refinements and cleanups.
 
 * Misc typo cleanups.
 
 Generic:
 
 * remove KVM_REQ_UNHALT
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmM2zwcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNpbwf+MlVeOlzE5SBdrJ0TEnLmKUel1lSz
 QnZzP5+D65oD0zhCilUZHcg6G4mzZ5SdVVOvrGJvA0eXh25ruLNMF6jbaABkMLk/
 FfI1ybN7A82hwJn/aXMI/sUurWv4Jteaad20JC2DytBCnsW8jUqc49gtXHS2QWy4
 3uMsFdpdTAg4zdJKgEUfXBmQviweVpjjl3ziRyZZ7yaeo1oP7XZ8LaE1nR2l5m0J
 mfjzneNm5QAnueypOh5KhSwIvqf6WHIVm/rIHDJ1HIFbgfOU0dT27nhb1tmPwAcE
 +cJnnMUHjZqtCXteHkAxMClyRq0zsEoKk0OGvSOOMoq3Q0DavSXUNANOig==
 =/hqX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrecly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for
     good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...
2022-10-09 09:39:55 -07:00
Linus Torvalds 6181073dd6 TTY/Serial driver update for 6.1-rc1
Here is the big set of TTY and Serial driver updates for 6.1-rc1.
 
 Lots of cleanups in here, no real new functionality this time around,
 with the diffstat being that we removed more lines than we added!
 
 Included in here are:
 	- termios unification cleanups from Al Viro, it's nice to
 	  finally get this work done
 	- tty serial transmit cleanups in various drivers in preparation
 	  for more cleanup and unification in future releases (that work
 	  was not ready for this release.)
 	- n_gsm fixes and updates
 	- ktermios cleanups and code reductions
 	- dt bindings json conversions and updates for new devices
 	- some serial driver updates for new devices
 	- lots of other tiny cleanups and janitorial stuff.  Full
 	  details in the shortlog.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY0BSdA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylucQCfaXIrYuh2AHcb6+G+Nqp1xD2BYaEAoIdLyOCA
 a2yziLrDF6us2oav6j4x
 =Wv+X
 -----END PGP SIGNATURE-----

Merge tag 'tty-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big set of TTY and Serial driver updates for 6.1-rc1.

  Lots of cleanups in here, no real new functionality this time around,
  with the diffstat being that we removed more lines than we added!

  Included in here are:

   - termios unification cleanups from Al Viro, it's nice to finally get
     this work done

   - tty serial transmit cleanups in various drivers in preparation for
     more cleanup and unification in future releases (that work was not
     ready for this release)

   - n_gsm fixes and updates

   - ktermios cleanups and code reductions

   - dt bindings json conversions and updates for new devices

   - some serial driver updates for new devices

   - lots of other tiny cleanups and janitorial stuff. Full details in
     the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (102 commits)
  serial: cpm_uart: Don't request IRQ too early for console port
  tty: serial: do unlock on a common path in altera_jtaguart_console_putc()
  tty: serial: unify TX space reads under altera_jtaguart_tx_space()
  tty: serial: use FIELD_GET() in lqasc_tx_ready()
  tty: serial: extend lqasc_tx_ready() to lqasc_console_putchar()
  tty: serial: allow pxa.c to be COMPILE_TESTed
  serial: stm32: Fix unused-variable warning
  tty: serial: atmel: Add COMMON_CLK dependency to SERIAL_ATMEL
  serial: 8250: Fix restoring termios speed after suspend
  serial: Deassert Transmit Enable on probe in driver-specific way
  serial: 8250_dma: Convert to use uart_xmit_advance()
  serial: 8250_omap: Convert to use uart_xmit_advance()
  MAINTAINERS: Solve warning regarding inexistent atmel-usart binding
  serial: stm32: Deassert Transmit Enable on ->rs485_config()
  serial: ar933x: Deassert Transmit Enable on ->rs485_config()
  tty: serial: atmel: Use FIELD_PREP/FIELD_GET
  tty: serial: atmel: Make the driver aware of the existence of GCLK
  tty: serial: atmel: Only divide Clock Divisor if the IP is USART
  tty: serial: atmel: Separate mode clearing between UART and USART
  dt-bindings: serial: atmel,at91-usart: Add gclk as a possible USART clock
  ...
2022-10-07 16:36:24 -07:00
Nicholas Piggin 376b3275c1 KVM: PPC: Book3S HV: Fix stack frame regs marker
The hard-coded marker is out of date now, fix it using the nice define.

Fixes: 17773afdcd ("powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER")
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006143345.129077-1-npiggin@gmail.com
2022-10-07 21:30:25 +11:00
Linus Torvalds 4c0ed7d8d6 whack-a-mole: constifying struct path *
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxmRQAKCRBZ7Krx/gZQ
 6+/kAQD2xyf+i4zOYVBr1NB3qBbhVS1zrni1NbC/kT3dJPgTvwEA7z7eqwnrN4zg
 scKFP8a3yPoaQBfs4do5PolhuSr2ngA=
 =NBI+
 -----END PGP SIGNATURE-----

Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs constification updates from Al Viro:
 "whack-a-mole: constifying struct path *"

* tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs: constify path
  spufs: constify path
  nd_jump_link(): constify path
  audit_init_parent(): constify path
  __io_setxattr(): constify path
  do_proc_readlink(): constify path
  overlayfs: constify path
  fs/notify: constify path
  may_linkat(): constify path
  do_sys_name_to_handle(): constify path
  ->getprocattr(): attribute name is const char *, TYVM...
2022-10-06 17:31:02 -07:00
Michael Ellerman 9474689020 powerpc: Don't add __powerpc_ prefix to syscall entry points
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.

So for example sys_mmap becomes __powerpc_sys_mmap.

This risks breaking workflows and tools that expect the old naming
scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no
longer works.

There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).

But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.

For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.

Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
2022-10-07 00:59:54 +11:00
Linus Torvalds b86406d42a * 'remove' callback converted to return void. Big change with trivial
fixes all over the tree. Other subsystems depending on this change
   have been asked to pull an immutable topic branch for this.
 * new driver for Microchip PCI1xxxx switch
 * heavy refactoring of the Mellanox BlueField driver
 * we prefer async probe in the i801 driver now
 * the rest is usual driver updates (support for more SoCs, some
   refactoring, some feature additions)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmM7T3IACgkQFA3kzBSg
 KbYnAxAAn2SXzpUuuJ05hhk/y89RWHhzSilU+7d+egYfQJlbXUl2WzYx/Wu1BSZM
 ciyXuJFIiTywdUiX1r1VeMO80zmQQZXAUG7VygAtOSk7iPSd/qTyL+7J+k1DXADI
 hGR+pZLBVfTFyY3d1qHnwKFkzByvQjc2raARv9g7kDxkSQa8xI/sXScmhGYtrLch
 DUYUK1F3Sdqbk0FsudJ5Jvd7bZCSS+n+jSR+mrZaOXbkUD4JmDUauW8pAS6UI9in
 CxnjZoOLMHdAmC9ADanLeDRXxKz23uNU/9vdZ1/DMYnNsF/TnyWl6Rz/3BFE3YFk
 Vq7A1XAK4b3oJAgM92mdvKSkmzBIzkmj02vaVyuNPtRgHZo5MsIcEnWiBhymZY5g
 W6BPrjt/8YKRKeNlP/nrZmageklepsXZbUrNQt1ws8i4bbT+CKInKbjKLnBfDgVz
 5VSd8M9+y2Jd/JaJhMt9TBNmP0W2RrThxLF06Hux1ue7k4maE7Eljvkzcd4GJ6Un
 HYePZMhwCx3aeYsFmFT/V3kHFsfyHUlIFy/vgXTEICsKUpyj/dX96ANWhe+tJdcX
 Cknmc+XOVGPm0LPPju4M8WScMjSqNODm1yfDWUe2cRKlxzI45v6x4Oxl8rWD9hb4
 KKMGXit0LOtWETlHALffwFCifs6DdaaA0IMUtMQUj8egvys0enE=
 =arni
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:

 - 'remove' callback converted to return void. Big change with trivial
   fixes all over the tree. Other subsystems depending on this change
   have been asked to pull an immutable topic branch for this.

 - new driver for Microchip PCI1xxxx switch

 - heavy refactoring of the Mellanox BlueField driver

 - we prefer async probe in the i801 driver now

 - the rest is usual driver updates (support for more SoCs, some
   refactoring, some feature additions)

* tag 'i2c-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (37 commits)
  i2c: pci1xxxx: prevent signed integer overflow
  i2c: acpi: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
  i2c: i801: Prefer async probe
  i2c: designware-pci: Use standard pattern for memory allocation
  i2c: designware-pci: Group AMD NAVI quirk parts together
  i2c: microchip: pci1xxxx: Add driver for I2C host controller in multifunction endpoint of pci1xxxx switch
  docs: i2c: slave-interface: return errno when handle I2C_SLAVE_WRITE_REQUESTED
  i2c: mlxbf: remove device tree support
  i2c: mlxbf: support BlueField-3 SoC
  i2c: cadence: Add standard bus recovery support
  i2c: mlxbf: add multi slave functionality
  i2c: mlxbf: support lock mechanism
  macintosh/ams: Adapt declaration of ams_i2c_remove() to earlier change
  i2c: riic: Use devm_platform_ioremap_resource()
  i2c: mlxbf: remove IRQF_ONESHOT
  dt-bindings: i2c: rockchip: add rockchip,rk3128-i2c
  dt-bindings: i2c: renesas,rcar-i2c: Add r8a779g0 support
  i2c: tegra: Add GPCDMA support
  i2c: scmi: Convert to be a platform driver
  i2c: rk3x: Add rv1126 support
  ...
2022-10-04 18:54:33 -07:00
Nicholas Piggin b2e82e495a powerpc/64s/interrupt: Fix stack frame regs marker
The value of the stack frame regs marker that gets saved on the stack in
interrupt entry code does not match the regs marker value, which breaks
stack frame marker matching.

This stray instruction looks to have been introduced in a mismerge.

Fixes: bf75a3258a ("powerpc/64s/interrupt: move early boot ILE fixup into a macro")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Mismerge by yours truly -_-]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004132952.984341-1-npiggin@gmail.com
2022-10-05 11:09:22 +11:00
Nicholas Piggin 0fa6831811 powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
irq soft-masking means that when Linux irqs are disabled, the MSR[EE]
value can change from 1 to 0 asynchronously: if a masked interrupt of
the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled,
the masked handler will return with MSR[EE]=0.

This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can
be called with local irqs disabled, unless a hard_irq_disable has been
done.

Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
2022-10-04 23:16:20 +11:00
Nicholas Piggin 8154850b28 powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
This new assertion added is generally harmless and gets fixed up
naturally, but it does indicate a problem with MSR manipulation
somewhere.

Fixes: c39fb71a54 ("powerpc/64s/interrupt: masked handler debug check for previous hard disable")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-1-npiggin@gmail.com
2022-10-04 23:16:20 +11:00
Johannes Weiner e55b9f9686 mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
Since 2d1c498072 ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:36 -07:00
Masahiro Yamada ce697ccee1 kbuild: remove head-y syntax
Kbuild puts the objects listed in head-y at the head of vmlinux.
Conventionally, we do this for head*.S, which contains the kernel entry
point.

A counter approach is to control the section order by the linker script.
Actually, the code marked as __HEAD goes into the ".head.text" section,
which is placed before the normal ".text" section.

I do not know if both of them are needed. From the build system
perspective, head-y is not mandatory. If you can achieve the proper code
placement by the linker script only, it would be cleaner.

I collected the current head-y objects into head-object-list.txt. It is
a whitelist. My hope is it will be reduced in the long run.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-10-02 18:06:03 +09:00
Masahiro Yamada 3216484550 kbuild: use obj-y instead extra-y for objects placed at the head
The objects placed at the head of vmlinux need special treatments:

 - arch/$(SRCARCH)/Makefile adds them to head-y in order to place
   them before other archives in the linker command line.

 - arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of
   obj-y to avoid them going into built-in.a.

This commit gets rid of the latter.

Create vmlinux.a to collect all the objects that are unconditionally
linked to vmlinux. The objects listed in head-y are moved to the head
of vmlinux.a by using 'ar m'.

With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y
for builtin objects.

There is no *.o that is directly linked to vmlinux. Drop unneeded code
in scripts/clang-tools/gen_compile_commands.py.

$(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested
by Nathan Chancellor [1].

[1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2022-10-02 18:04:05 +09:00
Michael Ellerman 8535a1afff powerpc/pseries: Add firmware details to the hardware description
Add firmware version details to the hardware description, which is
printed at boot and in case of an oops.

Use /hypervisor if we find it, though currently it only exists if we're
running under qemu.

Look for "ibm,powervm-partition" which is specified in PAPR+ v2.11 and
tells us we're running under PowerVM.

Failing that look for "ibm,fw-net-version" which is seen on PowerVM
going back to at least Power6.

eg: Hardware name: ... of:IBM,FW860.42 (SV860_138) hv:phyp

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-6-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman 37576cb096 powerpc/powernv: Add opal details to the hardware description
Add OPAL version details to the hardware description, which is printed
at boot and in case of an oops.

eg: Hardware name: ... opal:v6.2

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-5-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman 5412297079 powerpc: Add device-tree model to the hardware description
Add the model of the machine we're on to the hardware description, which
is printed at boot and in case of an oops.

eg: Hardware name: IBM,8247-22L

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-4-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman 48b7019b6a powerpc/64: Add logical PVR to the hardware description
If we detect a logical PVR add that to the hardware description, which
is printed at boot and in case of an oops.

eg: Hardware name: ... 0xf000004

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-3-mpe@ellerman.id.au
2022-09-30 18:35:53 +10:00
Michael Ellerman bd649d40e0 powerpc: Add PVR & CPU name to hardware description
Add the PVR and CPU name to the hardware description, which is printed
at boot and in case of an oops.

eg: Hardware name: ... POWER8E (raw) 0x4b0201

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-2-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman 41dc056391 powerpc: Add hardware description string
Create a hardware description string, which we will use to record
various details of the hardware platform we are running on.

Print the accumulated description at boot, and use it to set the generic
description which is printed in oopses.

To begin with add ppc_md.name, aka the "machine description".

Example output at boot with the full series applied:

  Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries
  printk: bootconsole [udbg0] enabled

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman d91c3f15fc powerpc/configs: Enable PPC_UV in powernv_defconfig
Make sure the ultravisor code at least gets some build testing by
enabling it in powernv_defconfig.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220929051517.1903079-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Lukas Bulwahn d210ee3fdf powerpc/configs: Update config files for removed/renamed symbols
Clean up config files by:
  - removing configs that were deleted in the past
  - removing configs not in tree and without recently pending patches
  - adding new configs that are replacements for old configs in the file

For some detailed information, see:
  https://lore.kernel.org/kernel-janitors/20220929090645.1389-1-lukas.bulwahn@gmail.com/

Renamed:
  - CONFIG_PPC_PTDUMP -> CONFIG_GENERIC_PTDUMP
    e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")

Removed:
  - CONFIG_BLK_DEV_CRYPTOLOOP
    47e9624616 ("block: remove support for cryptoloop and the xor transfer")

  - CONFIG_CRYPTO_RMD128
    b21b9a5e0a ("crypto: rmd128 - remove RIPE-MD 128 hash algorithm")

  - CONFIG_CRYPTO_RMD256
    c15d4167f0 ("crypto: rmd256 - remove RIPE-MD 256 hash algorithm")

  - CONFIG_CRYPTO_RMD320
    93f6420292 ("crypto: rmd320 - remove RIPE-MD 320 hash algorithm")

  - CONFIG_CRYPTO_SALSA20
    663f63ee6d ("crypto: salsa20 - remove Salsa20 stream cipher algorithm")

  - CONFIG_CRYPTO_TGR192
    87cd723f89 ("crypto: tgr192 - remove Tiger 128/160/192 hash algorithms")

  - CONFIG_HARDENED_USERCOPY_PAGESPAN
    1109a5d907 ("usercopy: Remove HARDENED_USERCOPY_PAGESPAN")

  - CONFIG_RAPIDIO_TSI568, CONFIG_RAPIDIO_TSI57X
    612d490419 ("rapidio: remove not used code about RIO_VID_TUNDRA")

  - CONFIG_RAW_DRIVER
    603e4922f1 ("remove the raw driver")

  - CONFIG_ROCKETPORT
    3b00b6af7a ("tty: rocket, remove the driver")

  - CONFIG_ENABLE_MUST_CHECK
    1967939462 ("Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK")

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
[mpe: Add documentation of relevant commit for each symbol change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220929101502.32527-1-lukas.bulwahn@gmail.com
2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V 7dd3a7b90b powerpc/mm: Fix UBSAN warning reported on hugetlb
Powerpc architecture supports 16GB hugetlb pages with hash translation.
For 4K page size, this is implemented as a hugepage directory entry at
PGD level and for 64K it is implemented as a huge page pte at PUD level

With 16GB hugetlb size, offset within a page is greater than 32 bits.
Hence switch to use unsigned long type when using hugepd_shift.

In order to keep things simpler, we make sure we always use unsigned
long type when using hugepd_shift() even though all the hugetlb page
size won't require that.

The hugetlb_free_p*d_range changes are all related to nohash usage where
we can have multiple pgd entries pointing to the same hugepd entries.
Hence on book3s64 where we can have > 4GB hugetlb page size we will
always find more < next even if we compute the value of more correctly.

Hence there is no functional change in this patch except that it fixes
the below warning.

  UBSAN: shift-out-of-bounds in arch/powerpc/mm/hugetlbpage.c:499:21
  shift exponent 34 is too large for 32-bit type 'int'
  CPU: 39 PID: 1673 Comm: a.out Not tainted 6.0.0-rc2-00327-gee88a56e8517-dirty #1
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    ubsan_epilogue+0x18/0x70
    __ubsan_handle_shift_out_of_bounds+0x1bc/0x390
    hugetlb_free_pgd_range+0x5d8/0x600
    free_pgtables+0x114/0x290
    exit_mmap+0x150/0x550
    mmput+0xcc/0x210
    do_exit+0x420/0xdd0
    do_group_exit+0x4c/0xd0
    sys_exit_group+0x24/0x30
    system_call_exception+0x250/0x600
    system_call_common+0xec/0x250

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop generic change to be sent separately, change 1ULL to 1UL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908072440.258301-1-aneesh.kumar@linux.ibm.com
2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V 7b31f7dadd powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
For both CONFIG_NUMA enabled/disabled use mem_topology_setup() to
update max/min_low_pfn.

This also adds min_low_pfn update to CONFIG_NUMA which was initialized
to zero before. (mpe: Though MEMORY_START is == 0 for PPC64=y which is
all possible NUMA=y systems)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220704063851.295482-1-aneesh.kumar@linux.ibm.com
2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V d368e0c478 powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
This function does the hash page table update. Hence rename it to
indicate this better to avoid confusion with flush_pmd_tlb_range()

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop unnecessary extern]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220907081941.209501-1-aneesh.kumar@linux.ibm.com
2022-09-30 18:35:52 +10:00
Michael Ellerman 7673335e2a powerpc: Drops STABS_DEBUG from linker scripts
No toolchain we support should be generating stabs debug information
anymore. Drop the sections entirely from our linker scripts.

We removed all the manual stabs annotations in commit
1231816373 ("powerpc/32: Remove remaining .stabs annotations").

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130951.1732983-1-mpe@ellerman.id.au
2022-09-30 18:35:52 +10:00
Michael Ellerman 0c36099642 powerpc/64s: Remove lost/old comment
The bulk of this was moved/reworded in:
  57f266497d ("powerpc: Use gas sections for arranging exception vectors")

And now appears around line 700 in arch/powerpc/kernel/exceptions-64s.S.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130941.1732818-1-mpe@ellerman.id.au
2022-09-30 18:35:51 +10:00
Michael Ellerman 57a8e4b26e powerpc/64s: Remove old STAB comment
This used to be about the 0x4300 handler, but that was moved in commit
da2bc4644c ("powerpc/64s: Add new exception vector macros").

Note that "STAB" here refers to "Segment Table" not the debug format.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220928130912.1732466-1-mpe@ellerman.id.au
2022-09-30 18:35:51 +10:00
Nicholas Piggin a08661af4c powerpc: remove orphan systbl_chk.sh
arch/powerpc/kernel/systbl_chk.sh has not been referenced since commit
ab66dcc76d ("powerpc: generate uapi header and system call table
files"). Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220929032120.3592593-1-npiggin@gmail.com
2022-09-30 18:35:51 +10:00
Haren Myneni f3e5d9e53e powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL
Generally the hypervisor decides to allocate a window on different
VAS instances. But if user space wishes to allocate on the current VAS
instance where the process is executing, the kernel has to pass
associativity domain IDs to allocate VAS window HCALL.

To determine the associativity domain IDs for the current CPU,
smp_processor_id() is passed to node associativity HCALL which may
return H_P2 (-55) error during DLPAR CPU event. This is because Linux
CPU numbers (smp_processor_id()) are not the same as the hypervisor's
view of CPU numbers.

Fix the issue by passing hard_smp_processor_id() with
VPHN_FLAG_VCPU flag (PAPR 14.11.6.1 H_HOME_NODE_ASSOCIATIVITY).

Fixes: b22f2d88e4 ("powerpc/pseries/vas: Integrate API with open/close windows")
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Update change log to mention Linux vs HV CPU numbers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/55380253ea0c11341824cd4c0fc6bbcfc5752689.camel@linux.ibm.com
2022-09-30 18:35:51 +10:00
Nicholas Piggin e4335f5319 KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA
PAPR specifies accumulated virtual processor wait intervals that relate
to partition scheduling interval times. Implement these counters in the
same way as they are repoted by dtl.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908132545.4085849-5-npiggin@gmail.com
2022-09-30 18:35:38 +10:00
Michael Ellerman 9511b5a033 Merge branch 'topic/ppc-kvm' into next
Merge some KVM commits we are keeping in our topic branch.
2022-09-30 18:35:16 +10:00
Jakub Kicinski accc3b4a57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29 14:30:51 -07:00
Peter Zijlstra a1ebcd5943 Linux 6.0-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMwwY4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGdlwH/0ESzdb6F9zYWwHR
 E08har56/IfwjOsn1y+JuHibpwUjzskLzdwIfI5zshSZAQTj5/UyC0P7G/wcYh/Z
 INh1uHGazmDUkx4O3lwuWLR+mmeUxZRWdq4NTwYDRNPMSiPInVxz+cZJ7y0aPr2e
 wii7kMFRHgXmX5DMDEwuHzehsJF7vZrp8zBu2DqzVUGnbwD50nPbyMM3H4g9mute
 fAEpDG0X3+smqMaKL+2rK0W/Av/87r3U8ZAztBem3nsCJ9jT7hqMO1ICcKmFMviA
 DTERRMwWjPq+mBPE2CiuhdaXvNZBW85Ds81mSddS6MsO6+Tvuzfzik/zSLQJxlBi
 vIqYphY=
 =NqG+
 -----END PGP SIGNATURE-----

Merge branch 'v6.0-rc7'

Merge upstream to get RAPTORLAKE_S

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-29 12:20:50 +02:00
Masahiro Yamada 2df8220cc5 kbuild: build init/built-in.a just once
Kbuild builds init/built-in.a twice; first during the ordinary
directory descending, second from scripts/link-vmlinux.sh.

We do this because UTS_VERSION contains the build version and the
timestamp. We cannot update it during the normal directory traversal
since we do not yet know if we need to update vmlinux. UTS_VERSION is
temporarily calculated, but omitted from the update check. Otherwise,
vmlinux would be rebuilt every time.

When Kbuild results in running link-vmlinux.sh, it increments the
version number in the .version file and takes the timestamp at that
time to really fix UTS_VERSION.

However, updating the same file twice is a footgun. To avoid nasty
timestamp issues, all build artifacts that depend on init/built-in.a
are atomically generated in link-vmlinux.sh, where some of them do not
need rebuilding.

To fix this issue, this commit changes as follows:

[1] Split UTS_VERSION out to include/generated/utsversion.h from
    include/generated/compile.h

    include/generated/utsversion.h is generated just before the
    vmlinux link. It is generated under include/generated/ because
    some decompressors (s390, x86) use UTS_VERSION.

[2] Split init_uts_ns and linux_banner out to init/version-timestamp.c
    from init/version.c

    init_uts_ns and linux_banner contain UTS_VERSION. During the ordinary
    directory descending, they are compiled with __weak and used to
    determine if vmlinux needs relinking. Just before the vmlinux link,
    they are compiled without __weak to embed the real version and
    timestamp.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-09-29 04:40:15 +09:00
Haren Myneni 335e1a9104 powerpc: Ignore DSI error caused by the copy/paste instruction
The data storage interrupt (DSI) error will be generated when the
paste operation is issued on the suspended Nest Accelerator (NX)
window due to NX state changes. The hypervisor expects the
partition to ignore this error during page fault handling.
To differentiate DSI caused by an actual HW configuration or by
the NX window, a new “ibm,pi-features” type value is defined.
Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
indicate this DSI error. If this error is not ignored, the user
space can get SIGBUS when the NX request is issued.

This patch adds changes to read ibm,pi-features property and ignore
DSI error during page fault handling if MMU_FTR_NX_DSI is defined.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Mention PAPR version in comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com
2022-09-28 22:52:32 +10:00
Michael Ellerman 19c95df127 powerpc: Reverse stack frame marker on little endian
On little endian the stack frame marker appears reversed when dumping
memory sequentially, as is typical in xmon or gdb, eg:

  c000000004733e40 0000000000000000 0000000000000000  |................|
  c000000004733e50 0000000000000000 0000000000000000  |................|
  c000000004733e60 0000000000000000 0000000000000000  |................|
  c000000004733e70 5347455200000000 0000000000000000  |SGER............|
  c000000004733e80 a700000000000000 708897f7ff7f0000  |........p.......|
  c000000004733e90 0073428fff7f0000 208997f7ff7f0000  |.sB..... .......|
  c000000004733ea0 0100000000000000 ffffffffffffffff  |................|
  c000000004733eb0 0000000000000000 0000000000000000  |................|

To make it easier to recognise, reverse the value on little endian, so
it always appears as "REGS", eg:

  c000000004733e70 5245475300000000 0000000000000000  |REGS............|

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-2-mpe@ellerman.id.au
2022-09-28 22:21:17 +10:00
Michael Ellerman bbd7170908 powerpc: Make stack frame marker upper case
Now that the stack frame regs marker is only 32-bits it is not as
obvious in memory dumps and easier to miss, eg:

  c000000004733e40 0000000000000000 0000000000000000  |................|
  c000000004733e50 0000000000000000 0000000000000000  |................|
  c000000004733e60 0000000000000000 0000000000000000  |................|
  c000000004733e70 7367657200000000 0000000000000000  |sger............|
  c000000004733e80 a700000000000000 708897f7ff7f0000  |........p.......|
  c000000004733e90 0073428fff7f0000 208997f7ff7f0000  |.sB..... .......|
  c000000004733ea0 0100000000000000 ffffffffffffffff  |................|
  c000000004733eb0 0000000000000000 0000000000000000  |................|

So make it upper case to make it stand out a bit more:

  c000000004733e70 5347455200000000 0000000000000000  |SGER............|

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-1-mpe@ellerman.id.au
2022-09-28 22:21:11 +10:00
Li Huafei 97f88a3d72 powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe()
I found a null pointer reference in arch_prepare_kprobe():

  # echo 'p cmdline_proc_show' > kprobe_events
  # echo 'p cmdline_proc_show+16' >> kprobe_events
  Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
  BUG: Kernel NULL pointer dereference on read at 0x00000000
  Faulting instruction address: 0xc000000000050bfc
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 122 Comm: sh Not tainted 6.0.0-rc3-00007-gdcf8e5633e2e #10
  NIP:  c000000000050bfc LR: c000000000050bec CTR: 0000000000005bdc
  REGS: c0000000348475b0 TRAP: 0300   Not tainted  (6.0.0-rc3-00007-gdcf8e5633e2e)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 88002444  XER: 20040006
  CFAR: c00000000022d100 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
  ...
  NIP arch_prepare_kprobe+0x10c/0x2d0
  LR  arch_prepare_kprobe+0xfc/0x2d0
  Call Trace:
    0xc0000000012f77a0 (unreliable)
    register_kprobe+0x3c0/0x7a0
    __register_trace_kprobe+0x140/0x1a0
    __trace_kprobe_create+0x794/0x1040
    trace_probe_create+0xc4/0xe0
    create_or_delete_trace_kprobe+0x2c/0x80
    trace_parse_run_command+0xf0/0x210
    probes_write+0x20/0x40
    vfs_write+0xfc/0x450
    ksys_write+0x84/0x140
    system_call_exception+0x17c/0x3a0
    system_call_vectored_common+0xe8/0x278
  --- interrupt: 3000 at 0x7fffa5682de0
  NIP:  00007fffa5682de0 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c000000034847e80 TRAP: 3000   Not tainted  (6.0.0-rc3-00007-gdcf8e5633e2e)
  MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44002408  XER: 00000000

The address being probed has some special:

  cmdline_proc_show: Probe based on ftrace
  cmdline_proc_show+16: Probe for the next instruction at the ftrace location

The ftrace-based kprobe does not generate kprobe::ainsn::insn, it gets
set to NULL. In arch_prepare_kprobe() it will check for:

  ...
  prev = get_kprobe(p->addr - 1);
  preempt_enable_no_resched();
  if (prev && ppc_inst_prefixed(ppc_inst_read(prev->ainsn.insn))) {
  ...

If prev is based on ftrace, 'ppc_inst_read(prev->ainsn.insn)' will occur
with a null pointer reference. At this point prev->addr will not be a
prefixed instruction, so the check can be skipped.

Check if prev is ftrace-based kprobe before reading 'prev->ainsn.insn'
to fix this problem.

Fixes: b4657f7650 ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
[mpe: Trim oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220923093253.177298-1-lihuafei1@huawei.com
2022-09-28 22:19:52 +10:00
ye xingchen 5e4952656b ocxl: Remove the unneeded result variable
Return the value opal_npu_spa_clear_cache() directly instead of storing
it in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220906072006.337099-1-ye.xingchen@zte.com.cn
2022-09-28 19:22:14 +10:00
ye xingchen 91986d7f03 powerpc/pseries/vas: Remove the unneeded result variable
Return the value vas_register_coproc_api() directly instead of storing it
in another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220825072657.229168-1-ye.xingchen@zte.com.cn
2022-09-28 19:22:14 +10:00
Nathan Lynch b37ac1894a powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()
At boot time, it is not necessary to delay between polls of
cpu_callin_map when waiting for a kicked CPU to come up. Remove the
delay intervals, but preserve the overall deadline (five seconds).

At run time, the first poll result is usually negative and we incur a
sleeping wait. If we spin on the callin word for a short time first,
we can reduce __cpu_up() from dozens of milliseconds to under 1ms in
the common case on a P9 LPAR:

$ ppc64_cpu --smt=off
$ bpftrace -e 'kprobe:__cpu_up {
                 @start[tid] = nsecs;
               }
               kretprobe:__cpu_up /@start[tid]/ {
                 @us = hist((nsecs - @start[tid]) / 1000);
                 delete(@start[tid]);
               }' -c 'ppc64_cpu --smt=on'

Before:

@us:
[16K, 32K)        85 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)        13 |@@@@@@@                                             |

After:

@us:
[128, 256)        95 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)         3 |@                                                   |

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926220250.157022-1-nathanl@linux.ibm.com
2022-09-28 19:22:14 +10:00
Nathan Lynch b8f3e48834 powerpc/rtas: block error injection when locked down
The error injection facility on pseries VMs allows corruption of
arbitrary guest memory, potentially enabling a sufficiently privileged
user to disable lockdown or perform other modifications of the running
kernel via the rtas syscall.

Block the PAPR error injection facility from being opened or called
when locked down.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com
2022-09-28 19:22:14 +10:00
Nathan Lynch 99df7a2810 powerpc/pseries: block untrusted device tree changes when locked down
The /proc/powerpc/ofdt interface allows the root user to freely alter
the in-kernel device tree, enabling arbitrary physical address writes
via drivers that could bind to malicious device nodes, thus making it
possible to disable lockdown.

Historically this interface has been used on the pseries platform to
facilitate the runtime addition and removal of processor, memory, and
device resources (aka Dynamic Logical Partitioning or DLPAR). Years
ago, the processor and memory use cases were migrated to designs that
happen to be lockdown-friendly: device tree updates are communicated
directly to the kernel from firmware without passing through untrusted
user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
remains the sole legitimate user of /proc/powerpc/ofdt, but it is
already broken in lockdown since it uses /dev/mem to allocate argument
buffers for the rtas syscall. So only illegitimate uses of the
interface should see a behavior change when running on a locked down
kernel.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926131643.146502-2-nathanl@linux.ibm.com
2022-09-28 19:22:14 +10:00
Pali Rohár 6bd7ff497b powerpc/udbg: Remove extern function prototypes
'extern' keyword is pointless and deprecated for function prototypes.

Signed-off-by: Pali Rohár <pali@kernel.org>
Suggested-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220822231751.16973-1-pali@kernel.org
2022-09-28 19:22:14 +10:00
Pali Rohár 110a58b9f9 powerpc/boot: Explicitly disable usage of SPE instructions
uImage boot wrapper should not use SPE instructions, like kernel itself.
Boot wrapper has already disabled Altivec and VSX instructions but not SPE.
Options -mno-spe and -mspe=no already set when compilation of kernel, but
not when compiling uImage wrapper yet. Fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220827134454.17365-1-pali@kernel.org
2022-09-28 19:22:14 +10:00
Pali Rohár c102432005 powerpc: Include e500v1_power_isa.dtsi for remaining e500v1 platforms
There are still some board device tree files without Power ISA properties
which have Freescale e500v1 cores, namely those which are based on
Freescale mpc8540, mpc8541, mpc8555 and mpc8560 processors.

So include newly introduced e500v1_power_isa.dtsi file in devices tree
files with those processors.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902212103.22534-2-pali@kernel.org
2022-09-28 19:22:14 +10:00
Pali Rohár 37b9345ce7 powerpc: Fix SPE Power ISA properties for e500v1 platforms
Commit 2eb2800643 ("powerpc/e500v2: Add Power ISA properties to comply
with ePAPR 1.1") introduced new include file e500v2_power_isa.dtsi and
should have used it for all e500v2 platforms. But apparently it was used
also for e500v1 platforms mpc8540, mpc8541, mpc8555 and mpc8560.

e500v1 cores compared to e500v2 do not support double precision floating
point SPE instructions. Hence power-isa-sp.fd should not be set on e500v1
platforms, which is in e500v2_power_isa.dtsi include file.

Fix this issue by introducing a new e500v1_power_isa.dtsi include file and
use it in all e500v1 device tree files.

Fixes: 2eb2800643 ("powerpc/e500v2: Add Power ISA properties to comply with ePAPR 1.1")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220902212103.22534-1-pali@kernel.org
2022-09-28 19:22:13 +10:00
Athira Rajeev b9c001276d powerpc/perf: Fix branch_filter support for multiple filters
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type
ie branch filters are supported. The branch filters are requested via
event attribute "branch_sample_type". Multiple branch filters can be
passed in event attribute. eg:

  $ perf record -b -o- -B --branch-filter any,ind_call true

None of the Power PMUs support having multiple branch filters at
the same time. Branch filters for branch stack sampling is set via MMCRA
IFM bits [32:33]. But currently when requesting for multiple filter
types, the "perf record" command does not report any error.

eg:
  $ perf record -b -o- -B --branch-filter any,save_type true
  $ perf record -b -o- -B --branch-filter any,ind_call true

The "bhrb_filter_map" function in PMU driver code does the validity
check for supported branch filters. But this check is done for single
filter. Hence "perf record" will proceed here without reporting any
error.

Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch
filters are requested in the event attr.

After the fix:
  $ perf record --branch-filter any,ind_call -- ls
  Error:
  cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
  Try 'perf stat'

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel<disgoel@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
[mpe: Tweak comment and change log wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin e1100cee05 powerpc/64s/interrupt: halt early boot interrupts if paca is not set up
Ensure r13 is zero from very early in boot until it gets set to the
boot paca pointer. This allows early program and mce handlers to halt
if there is no valid paca, rather than potentially run off into the
weeds. This preserves register and memory contents for low level
debugging tools.

Nothing could be printed to console at this point in any case because
even udbg is only set up after the boot paca is set, so this shouldn't
be missed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-6-npiggin@gmail.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin 519b2e317e powerpc/64: don't set boot CPU's r13 to paca until the structure is set up
The idea is to get to the point where if r13 is non-zero, then it should
contain a reasonable paca. This can be used in early boot program check
and machine check handlers to avoid running off into the weeds if they
hit before r13 has a paca.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-5-npiggin@gmail.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin b830c8754e powerpc/64: avoid using r13 in relocate
relocate() uses r13 in early boot before it is used for the paca. Use
a different register for this so r13 is kept unchanged until it is
set to the paca pointer.

Avoid r14 as well while we're here, there's no reason not to use the
volatile registers which is a bit less surprising, and r14 could be used
as another fixed reg one day.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-4-npiggin@gmail.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin 2f5182cffa powerpc/64s: early boot machine check handler
Use the early boot interrupt fixup in the machine check handler to allow
the machine check handler to run before interrupt endian is set up.
Branch to an early boot handler that just does a basic crash, which
allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot
CPU earlier, and the machine check stack is temporarily set to the
middle of the init task stack.

This allows machine checks (e.g., due to invalid data access in real
mode) to print something useful earlier in boot (as soon as udbg is set
up, if CONFIG_PPC_EARLY_DEBUG=y).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com
2022-09-28 19:22:13 +10:00
Nicholas Piggin bf75a3258a powerpc/64s/interrupt: move early boot ILE fixup into a macro
In preparation for using this sequence in machine check interrupt, move
it into a macro, with a small change to make it position independent.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-2-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 3569d84bb2 powerpc/64e: provide an addressing macro for use with TOC in alternate register
The interrupt entry code carefully saves a minimal number of registers,
so in some places the TOC is required, it is loaded into a different
register, so provide a macro that can supply an alternate TOC register.

This continues to use got addressing because TOC-relative results in
"got/toc optimization is not supported" messages by the linker. Having
r2 be one of the saved registers and using that for TOC addressing may
be the best way to avoid that and switch this to TOC addressing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 8e93fb33c8 powerpc/64: provide a helper macro to load r2 with the kernel TOC
A later change stops the kernel using r2 and loads it with a poison
value.  Provide a PACATOC loading abstraction which can hide this
detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 754f611774 powerpc/64: switch asm helpers from GOT to TOC relative addressing
There is no need to use GOT addressing within the kernel.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-4-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin dab3b8f4fd powerpc/64: asm use consistent global variable declaration and access
Use helper macros to access global variables, and place them in .data
sections rather than in .toc. Putting addresses in TOC is not required
because the kernel is linked with a single TOC.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-3-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 17773afdcd powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER
Using a 32-bit constant for this marker allows it to be loaded with
two ALU instructions, like 32-bit. This avoids a TOC entry and a
TOC load that depends on the r2 value that has just been loaded from
the PACA.

This changes the value for 32-bit as well, so both have the same
value in the low 4 bytes and 64-bit has 0 in the top bytes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-2-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 4b2a9315f2 powerpc/64s: POWER10 CPU Kconfig build option
This adds basic POWER10_CPU option, which builds with -mcpu=power10.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220923033004.536127-1-npiggin@gmail.com
2022-09-28 19:22:12 +10:00
Haren Myneni 465dda9d32 powerpc/pseries: Move vas_migration_handler early during migration
When the migration is initiated, the hypervisor changes VAS
mappings as part of pre-migration event. Then the OS gets the
migration event which closes all VAS windows before the migration
starts. NX generates continuous faults until windows are closed
and the user space can not differentiate these NX faults coming
from the actual migration. So to reduce this time window, close
VAS windows first in pseries_migrate_partition().

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8efade91dda831c9ed4abb226dab627da594c5f.camel@linux.ibm.com
2022-09-28 19:22:12 +10:00
Nicholas Piggin 1da5351f9e powerpc/64/irq: tidy soft-masked irq replay and improve documentation
irq replay is quite complicated because of softirq processing which
itself enables and disables irqs. Several considerations need to be
accounted for due to this, and they are not clearly documented.

Refactor the irq replay code a bit to tidy and deduplicate some common
functions. Add comments, debug checks.

This has a minor functional change that irq tracing enable/disable is
done after each interrupt replayed, rather than after a batch. It also
re-sets state to IRQS_ALL_DISABLED after an interrupt, which doesn't
matter much because interrupts are hard disabled at this point, but it
is more consistent with how interrupt handlers are called.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-8-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin f7bff6e775 powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry
BUG/WARN are handled with a program interrupt which can turn into an
infinite recursion when there are bugs in interrupt handler entry
(which can be irritated by bugs in other parts of the code).

There is one feeble attempt to avoid this recursion, but it misses
several cases. Make a tidier macro for this and switch most bugs in
the interrupt entry wrapper over to use it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-7-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin c39fb71a54 powerpc/64s/interrupt: masked handler debug check for previous hard disable
Prior changes eliminated cases of masked PACA_IRQ_MUST_HARD_MASK
interrupts that re-fire due to MSR[EE] being enabled while they are
pending. Add a debug check in the masked interrupt handler to catch
if this occurs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-6-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin 9524f2278f powerpc/64s: Fix irq state management in runlatch functions
When irqs are soft-disabled, MSR[EE] is volatile and can change from
1 to 0 asynchronously (if a PACA_IRQ_MUST_HARD_MASK interrupt hits).
So it can not be used to check hard IRQ enabled status, except to
confirm it is disabled.

ppc64_runlatch_on/off functions use MSR this way to decide whether to
re-enable MSR[EE] after disabling it, which leads to MSR[EE] being
enabled when it shouldn't be (when a PACA_IRQ_MUST_HARD_MASK had
disabled it between reading the MSR and clearing EE).

This has been tolerated in the kernel previously, and it doesn't seem
to cause a problem, but it is unexpected and may trip warnings or cause
other problems as we tighten up this state management. Fix this by only
re-enabling if PACA_IRQ_HARD_DIS is clear.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-5-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin e485f6c751 powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
If a synchronous interrupt (e.g., hash fault) is taken inside an
irqs-disabled region which has MSR[EE]=1, then an asynchronous interrupt
that is PACA_IRQ_MUST_HARD_MASK (e.g., PMI) is taken inside the
synchronous interrupt handler, then the synchronous interrupt will
return with MSR[EE]=1 and the asynchronous interrupt fires again.

If the asynchronous interrupt is a PMI and the original context does not
have PMIs disabled (only Linux IRQs), the asynchronous interrupt will
fire despite having the PMI marked soft pending. This can confuse the
perf code and cause warnings.

This patch changes the interrupt return so that irqs-disabled MSR[EE]=1
contexts will be returned to with MSR[EE]=0 if a PACA_IRQ_MUST_HARD_MASK
interrupt has become pending in the meantime.

The longer explanation for what happens:
1. local_irq_disable()
2. Hash fault interrupt fires, do_hash_fault handler runs
3. interrupt_enter_prepare() sets IRQS_ALL_DISABLED
4. interrupt_enter_prepare() sets MSR[EE]=1
5. PMU interrupt fires, masked handler runs
6. Masked handler marks PMI pending
7. Masked handler returns with PACA_IRQ_HARD_DIS set, MSR[EE]=0
8. do_hash_fault interrupt return handler runs
9. interrupt_exit_kernel_prepare() clears PACA_IRQ_HARD_DIS
10. interrupt returns with MSR[EE]=1
11. PMU interrupt fires, perf handler runs

Fixes: 4423eb5ae3 ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-4-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin 799f7063c7 powerpc/64: mark irqs hard disabled in boot paca
This prevents interrupts in early boot (e.g., program check) from
enabling MSR[EE], potentially causing endian mismatch or other
crashes when reporting early boot traps.

Fixes: 4423eb5ae3 ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-3-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Piggin 56adbb7a8b powerpc/64/interrupt: Fix false warning in context tracking due to idle state
Commit 171476775d ("context_tracking: Convert state to atomic_t")
added a CONTEXT_IDLE state which can be encountered by interrupts from
kernel mode in the idle thread, causing a false positive warning.

Fixes: 171476775d ("context_tracking: Convert state to atomic_t")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926054305.2671436-2-npiggin@gmail.com
2022-09-28 19:22:11 +10:00
Nicholas Miehlbradt a5edf9815d powerpc/64s: Enable KFENCE on book3s64
KFENCE support was added for ppc32 in commit 90cbac0e99
("powerpc: Enable KFENCE for PPC32").
Enable KFENCE on ppc64 architecture with hash and radix MMUs.
It uses the same mechanism as debug pagealloc to
protect/unprotect pages. All KFENCE kunit tests pass on both
MMUs.

KFENCE memory is initially allocated using memblock but is
later marked as SLAB allocated. This necessitates the change
to __pud_free to ensure that the KFENCE pages are freed
appropriately.

Based on previous work by Christophe Leroy and Jordan Niethe.

Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926075726.2846-4-nicholas@linux.ibm.com
2022-09-28 19:22:10 +10:00
Christophe Leroy d7902d31cb powerpc/64s: Allow double call of kernel_[un]map_linear_page()
If the page is already mapped resp. already unmapped, bail out.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926075726.2846-3-nicholas@linux.ibm.com
2022-09-28 19:22:10 +10:00
Christophe Leroy 3e791d0f32 powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils
debug_pagealloc_enabled() is always defined and constant folds to
'false' when CONFIG_DEBUG_PAGEALLOC is not enabled.

Remove the #ifdefs, the code and associated static variables will
be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is
not defined.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926075726.2846-2-nicholas@linux.ibm.com
2022-09-28 19:22:10 +10:00
Nicholas Miehlbradt 5e8b2c4dd3 powerpc/64s: Add DEBUG_PAGEALLOC for radix
There is support for DEBUG_PAGEALLOC on hash but not on radix.
Add support on radix.

Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926075726.2846-1-nicholas@linux.ibm.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin 7fd123e544 powerpc/64s: update cpu selection options
Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so
make that clear in the option name. The POWER5_CPU option is dropped
because it's uncommon, and GENERIC_CPU covers it.

-mtune= before power8 is dropped because the minimum gcc version
supports power8, and tuning is made consistent between big and little
endian.

A 970 option is added for PowerPC 970 / G5 because they still have a
user base, and -mtune=power8 does not generate good code for the 970.

This also updates the ISA versions document to add Power4/Power4+
because I didn't realise Power4+ used 2.01.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921014103.587954-2-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin 58ec7f06b7 powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5.
POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added
the popcntb instruction which a compiler might use.

Use -mcpu=power4.

Fixes: 471d7ff8b5 ("powerpc/64s: Remove POWER4 support")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921014103.587954-1-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin 9c7bfc2dc2 powerpc/64s: Make POWER10 and later use pause_short in cpu_relax loops
We want to move away from using SMT priority updates for cpu_relax, and
use a 'wait' instruction which is similar to x86. As well as being a
much better fit for what everybody else uses and tests with, priority
nops are stateful which is nasty (interrupts have to consider they might
be taken at a different priority), and they're expensive to execute,
similar to a mtSPR which can effect other threads in the pipe.

This has shown to give results that are less affected by code alignment
on benchmarks that cause a lot of spin waiting (e.g., rwsem contention
on unixbench filesystem benchmarks) on POWER10.

QEMU TCG only supports this instruction correctly since v7.1, versions
without the fix may cause hangs whne running POWER10 CPUs.

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix checkpatch warnings RE the macros]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920122259.363092-2-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin dabeb572ad powerpc: add ISA v3.0 / v3.1 wait opcode macro
The wait instruction encoding changed between ISA v2.07 and ISA v3.0.
In v3.1 the instruction gained a new field.

Update the PPC_WAIT macro to the current encoding. Rename the older
incompatible one with a _v203 suffix as it was introduced in v2.03
(the WC field was introduced in v2.07 but the kernel only uses WC=0).

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com
2022-09-28 19:22:10 +10:00
Nicholas Piggin c84550203b powerpc/time: avoid programming DEC at the start of the timer interrupt
Setting DEC to maximum at the start of the timer interrupt is not
necessary and can be avoided for performance when MSR[EE] is not
enabled during the handler as explained in commit 0faf20a1ad
("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless
perf is in use"), where this change was first attempted.

The idea is that the timer interrupt runs with MSR[EE]=0, and at the end
of the interrupt DEC is programmed to the next timer interval, so there
is no need to clear the decrementer exception before then.

When the above commit was merged, that was not quite true. The low res
timer subsystem had some cases in the oneshot timer code where if the
tick was to be stopped and no timers active, the clock device would not
get the ->set_state_oneshot_stopped() call, so DEC would not get
reprogrammed, and this would hang taking continual timer interrupts.

So this was reverted in commit d2b9be1f4a ("powerpc/time: Always set
decrementer in timer_interrupt()"), which was a partial revert of the
above commit.

Commit 62c1256d54 ("timers/nohz: Switch to ONESHOT_STOPPED in the
low-res handler when the tick is stopped") was later merged to fix this
missing case in the timer subsystem, so now the behaviour can be
restored.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220909142457.278032-1-npiggin@gmail.com
2022-09-28 19:22:09 +10:00
Pali Rohár b19448fe84 powerpc: Add support for early debugging via Serial 16550 console
Currently powerpc early debugging contains lot of platform specific
options, but does not support standard UART / serial 16550 console.

Later legacy_serial.c code supports registering UART as early debug console
from device tree but it is not early during booting, but rather later after
machine description code finishes.

So for real early debugging via UART is current code unsuitable.

Add support for new early debugging option CONFIG_PPC_EARLY_DEBUG_16550
which enable Serial 16550 console on address defined by new option
CONFIG_PPC_EARLY_DEBUG_16550_PHYSADDR and by stride by option
CONFIG_PPC_EARLY_DEBUG_16550_STRIDE.

With this change it is possible to debug powerpc machine descriptor code.
For example this early debugging code can print on serial console also
"No suitable machine description found" error which is done before
legacy_serial.c code.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220822231501.16827-1-pali@kernel.org
2022-09-28 19:22:09 +10:00
Hari Bathini bd7dc90e52 powerpc/64/kdump: Limit kdump base to 512MB
Since commit e641eb03ab ("powerpc: Fix up the kdump base cap to
128M") memory for kdump kernel has been reserved at an offset of 128MB.
This held up well for a long time before running into boot failure on
LPARs having a lot of cores. Commit 7c5ed82b80 ("powerpc: Set
crashkernel offset to mid of RMA region") fixed this boot failure by
moving the offset to mid of RMA region. This change meant the offset is
either 256MB or 512MB on LPARs as ppc64_rma_size was 512MB or 1024MB
owing to commit 103a8542cb ("powerpc/book3s64/ radix: Fix boot
failure with large amount of guest memory").

But ppc64_rma_size can be larger as well with newer f/w. So, limit
crashkernel reservation offset to 512MB to avoid running into boot
failures during kdump kernel boot, due to RTAS or other allocation
restrictions.

Also, while here, use SZ_128M instead of opening coding it.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220912065031.57416-1-hbathini@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure 7e92e01b72 powerpc: Provide syscall wrapper
Implement syscall wrapper as per s390, x86, arm64. When enabled
cause handlers to accept parameters from a stack frame rather than
from user scratch register state. This allows for user registers to be
safely cleared in order to reduce caller influence on speculation
within syscall routine. The wrapper is a macro that emits syscall
handler symbols that call into the target handler, obtaining its
parameters from a struct pt_regs on the stack.

As registers are already saved to the stack prior to calling
system_call_exception, it appears that this function is executed more
efficiently with the new stack-pointer convention than with parameters
passed by registers, avoiding the allocation of a stack frame for this
method. On a 32-bit system, we see >20% performance increases on the
null_syscall microbenchmark, and on a Power 8 the performance gains
amortise the cost of clearing and restoring registers which is
implemented at the end of this series, seeing final result of ~5.6%
performance improvement on null_syscall.

Syscalls are wrapped in this fashion on all platforms except for the
Cell processor as this commit does not provide SPU support. This can be
quickly fixed in a successive patch, but requires spu_sys_callback to
allocate a pt_regs structure to satisfy the wrapped calling convention.

Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmai.com>
[mpe: Make incompatible with COMPAT to retain clearing of high bits of args]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure f8971c627b powerpc: Change system_call_exception calling convention
Change system_call_exception arguments to pass a pointer to a stack
frame container caller state, as well as the original r0, which
determines the number of the syscall. This has been observed to yield
improved performance to passing them by registers, circumventing the
need to allocate a stack frame.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Retain clearing of high bits of args for compat tasks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-21-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure 8640de0dee powerpc: Use common syscall handler type
Cause syscall handlers to be typed as follows when called indirectly
throughout the kernel. This is to allow for better type checking.

typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
                           unsigned long, unsigned long, unsigned long);

Since both 32 and 64-bit abis allow for at least the first six
machine-word length parameters to a function to be passed by registers,
even handlers which admit fewer than six parameters may be viewed as
having the above type.

Coercing syscalls to syscall_fn requires a cast to void* to avoid
-Wcast-function-type.

Fixup comparisons in VDSO to avoid pointer-integer comparison. Introduce
explicit cast on systems with SPUs.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-19-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure 39859aea41 powerpc: Enable compile-time check for syscall handlers
The table of syscall handlers and registered compatibility syscall
handlers has in past been produced using assembly, with function
references resolved at link time. This moves link-time errors to
compile-time, by rewriting systbl.S in C, and including the
linux/syscalls.h, linux/compat.h and asm/syscalls.h headers for
prototypes.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-18-rmclure@linux.ibm.com
2022-09-28 19:22:09 +10:00
Rohan McLure 8cd1def4b8 powerpc: Include all arch-specific syscall prototypes
Forward declare all syscall handler prototypes where a generic prototype
is not provided in either linux/syscalls.h or linux/compat.h in
asm/syscalls.h. This is required for compile-time type-checking for
syscall handlers, which is implemented later in this series.

32-bit compatibility syscall handlers are expressed in terms of types in
ppc32.h. Expose this header globally.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use standard include guard naming for syscalls_32.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-17-rmclure@linux.ibm.com
2022-09-28 19:22:08 +10:00
Rohan McLure dec20c50df powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
Arch-specific implementations of syscall handlers are currently used
over generic implementations for the following reasons:

1. Semantics unique to powerpc
2. Compatibility syscalls require 'argument padding' to comply with
   64-bit argument convention in ELF32 abi.
3. Parameter types or order is different in other architectures.

These syscall handlers have been defined prior to this patch series
without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with
custom input and output types. We remove every such direct definition in
favour of the aforementioned macros.

Also update syscalls.tbl in order to refer to the symbol names generated
by each of these macros. Since ppc64_personality can be called by both
64 bit and 32 bit binaries through compatibility, we must generate both
both compat_sys_ and sys_ symbols for this handler.

As an aside:
A number of architectures including arm and powerpc agree on an
alternative argument order and numbering for most of these arch-specific
handlers. A future patch series may allow for asm/unistd.h to signal
through its defines that a generic implementation of these syscall
handlers with the correct calling convention be emitted, through the
__ARCH_WANT_COMPAT_SYS_... convention.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com
2022-09-28 19:22:08 +10:00
Rohan McLure ac17defbeb powerpc: Provide do_ppc64_personality helper
Avoid duplication in future patch that will define the ppc64_personality
syscall handler in terms of the SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE
macros, by extracting the common body of ppc64_personality into a helper
function.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-15-rmclure@linux.ibm.com
2022-09-28 19:22:08 +10:00
Rohan McLure b7fa9ce86d powerpc: Remove direct call to mmap2 syscall handlers
Syscall handlers should not be invoked internally by their symbol names,
as these symbols defined by the architecture-defined SYSCALL_DEFINE
macro. Move the compatibility syscall definition for mmap2 to
syscalls.c, so that all mmap implementations can share a helper function.

Remove 'inline' on static mmap helper.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix compat_sys_mmap2() prototype and offset handling]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-14-rmclure@linux.ibm.com
2022-09-28 19:21:26 +10:00
Nicholas Piggin 1a5486b3c3 KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl
Stolen time logging in dtl was removed from the P9 path, so guests had
no stolen time accounting. Add it back in a simpler way that still
avoids locks and per-core accounting code.

Fixes: ecb6a7207f ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908132545.4085849-4-npiggin@gmail.com
2022-09-28 01:07:19 +10:00
Nicholas Piggin b31bc24a49 KVM: PPC: Book3S HV: Update guest state entry/exit accounting to new API
Update the guest state and timing entry/exit accounting to use the new
API, which was introduced following issues found[1]. KVM HV does
possibly call instrumented code inside the guest context, and it does
call srcu inside the guest context which is fragile at best.

Switch to the new API, moving the guest context inside the
srcu_read_lock/unlock region.

[1] https://lore.kernel.org/lkml/20220201132926.3301912-1-mark.rutland@arm.com/

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908132545.4085849-3-npiggin@gmail.com
2022-09-28 01:07:19 +10:00
Nicholas Piggin c953f7500b KVM: PPC: Book3S HV P9: Fix irq disabling in tick accounting
kvmhv_run_single_vcpu() disables PMIs as well as Linux irqs,
however the tick time accounting code enables and disables irqs and
not PMIs within this region. By chance this might not actually cause
a bug, but it is clearly an incorrect use of the APIs.

Fixes: 2251fbe763 ("KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908132545.4085849-2-npiggin@gmail.com
2022-09-28 01:07:19 +10:00
Nicholas Piggin bc91c04bff KVM: PPC: Book3S HV P9: Clear vcpu cpu fields before enabling host irqs
On guest entry, vcpu->cpu and vcpu->arch.thread_cpu are set after
disabling host irqs. On guest exit there is a window whre tick time
accounting briefly enables irqs before these fields are cleared.

Move them up to ensure they are cleared before host irqs are run.
This is possibly not a problem, but is more symmetric and makes the
fields less surprising.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220908132545.4085849-1-npiggin@gmail.com
2022-09-28 01:07:19 +10:00
Fabiano Rosas 0a5bfb824a KVM: PPC: Book3S HV: Fix decrementer migration
We used to have a workaround[1] for a hang during migration that was
made ineffective when we converted the decrementer expiry to be
relative to guest timebase.

The point of the workaround was that in the absence of an explicit
decrementer expiry value provided by userspace during migration, KVM
needs to initialize dec_expires to a value that will result in an
expired decrementer after subtracting the current guest timebase. That
stops the vcpu from hanging after migration due to a decrementer
that's too large.

If the dec_expires is now relative to guest timebase, its
initialization needs to be guest timebase-relative as well, otherwise
we end up with a decrementer expiry that is still larger than the
guest timebase.

1- https://git.kernel.org/torvalds/c/5855564c8ab2

Fixes: 3c1a4322bb ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220816222517.1916391-1-farosas@linux.ibm.com
2022-09-28 01:07:19 +10:00
Matthew Wilcox (Oracle) 405e669172 powerpc: remove mmap linked list walks
Use the VMA iterator instead.

Link: https://lkml.kernel.org/r/20220906194824.2110408-34-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 19:46:19 -07:00
Linus Torvalds 3800a713b6 26 hotfixes. 8 are for issues which were introduced during this -rc
cycle, 18 are for earlier issues, and are cc:stable.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYzH+NgAKCRDdBJ7gKXxA
 ju4AAQDrFWErVp+ra5P66SSbiFmm8NAW1awt4nHwAPcihNf3yQD/eQcB3w2q0Dm1
 9HjsyEVkTYIeaJSAbCraDnMwUdWTIgY=
 =p5+0
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull last (?) hotfixes from Andrew Morton:
 "26 hotfixes.

  8 are for issues which were introduced during this -rc cycle, 18 are
  for earlier issues, and are cc:stable"

* tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits)
  x86/uaccess: avoid check_object_size() in copy_from_user_nmi()
  mm/page_isolation: fix isolate_single_pageblock() isolation behavior
  mm,hwpoison: check mm when killing accessing process
  mm/hugetlb: correct demote page offset logic
  mm: prevent page_frag_alloc() from corrupting the memory
  mm: bring back update_mmu_cache() to finish_fault()
  frontswap: don't call ->init if no ops are registered
  mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
  mm: fix madivse_pageout mishandling on non-LRU page
  powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
  mm: gup: fix the fast GUP race against THP collapse
  mm: fix dereferencing possible ERR_PTR
  vmscan: check folio_test_private(), not folio_get_private()
  mm: fix VM_BUG_ON in __delete_from_swap_cache()
  tools: fix compilation after gfp_types.h split
  mm/damon/dbgfs: fix memory leak when using debugfs_lookup()
  mm/migrate_device.c: copy pte dirty bit to page
  mm/migrate_device.c: add missing flush_cache_page()
  mm/migrate_device.c: flush TLB while holding PTL
  x86/mm: disable instrumentations of mm/pgprot.c
  ...
2022-09-26 13:23:15 -07:00
Andrew Morton 6d751329e7 Merge branch 'mm-hotfixes-stable' into mm-stable 2022-09-26 13:13:15 -07:00
Yang Shi bedf034169 powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will
move to use RCU instead of disabling local interrupts in fast-GUP.  Using
an IPI is the old-styled way of serializing against fast-GUP although it
still works as expected now.

And fast-GUP now fixed the potential race with THP collapse by checking
whether PMD is changed or not.  So IPI broadcast in radix pmd collapse
flush is not necessary anymore.  But it is still needed for hash TLB.

Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26 12:14:33 -07:00
Paolo Bonzini c59fb12758 KVM: remove KVM_REQ_UNHALT
KVM_REQ_UNHALT is now unnecessary because it is replaced by the return
value of kvm_vcpu_block/kvm_vcpu_halt.  Remove it.

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20220921003201.1441511-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26 12:37:21 -04:00
Rohan McLure 4df0221f9d powerpc: Remove direct call to personality syscall handler
Syscall handlers should not be invoked internally by their symbol names,
as these symbols defined by the architecture-defined SYSCALL_DEFINE
macro. Fortunately, in the case of ppc64_personality, its call to
sys_personality can be replaced with an invocation to the
equivalent ksys_personality inline helper in <linux/syscalls.h>.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-13-rmclure@linux.ibm.com
2022-09-26 23:00:16 +10:00
Rohan McLure b6b1334c95 powerpc/32: Remove powerpc select specialisation
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack will in effect guess whether the caller is
expecting new select semantics or old select semantics. It does so via a
guess, based off the first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure.

The heuristic simply interprets sufficiently large values of its first
parameter as being a call to old select. The following is a discussion
on how this syscall should be handled.


As discussed in this thread, the existence of such a hack suggests that for
whatever powerpc binaries may predate glibc, it is most likely that they
would have taken use of the old select semantics. x86 and arm64 both
implement this syscall with oldselect semantics.

Remove the powerpc implementation, and update syscall.tbl to refer to emit
a reference to sys_old_select and compat_sys_old_select
for 32-bit binaries, in keeping with how other architectures support
syscall #82.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/
Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure c2e7a19827 powerpc: Use generic fallocate compatibility syscall
The powerpc fallocate compat syscall handler is identical to the
generic implementation provided by commit 59c10c52f5 ("riscv:
compat: syscall: Add compat_sys_call_table implementation"), and as
such can be removed in favour of the generic implementation.

A future patch series will replace more architecture-defined syscall
handlers with generic implementations, dependent on introducing generic
implementations that are compatible with powerpc and arm's parameter
reorderings.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-11-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 016ff72bd2 powerpc: Fix fallocate and fadvise64_64 compat parameter combination
As reported[1] by Arnd, the arch-specific fadvise64_64 and fallocate
compatibility handlers assume parameters are passed with 32-bit
big-endian ABI. This affects the assignment of odd-even parameter pairs
to the high or low words of a 64-bit syscall parameter.

Fix fadvise64_64 fallocate compat handlers to correctly swap upper/lower
32 bits conditioned on endianness.

A future patch will replace the arch-specific compat fallocate with an
asm-generic implementation. This patch is intended for ease of
back-port.

[1]: https://lore.kernel.org/all/be29926f-226e-48dc-871a-e29a54e80583@www.fastmail.com/

Fixes: 57f48b4b74 ("powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-9-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 620f5c59c8 powerpc/64s: Fix comment on interrupt handler prologue
Interrupt handlers on 64s systems will often need to save register state
from the interrupted process to make space for loading special purpose
registers or for internal state.

Fix a comment documenting a common code path macro in the beginning of
interrupt handlers where r10 is saved to the PACA to afford space for
the value of the CFAR. Comment is currently written as if r10-r12 are
saved to PACA, but in fact only r10 is saved, with r11-r12 saved much
later. The distance in code between these saves has grown over the many
revisions of this macro. Fix this by signalling with a comment where
r11-r12 are saved to the PACA.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-8-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 53ecaa6778 powerpc/64e: Clarify register saves and clears with {SAVE,ZEROIZE}_GPRS
The common interrupt handler prologue macro and the bad_stack
trampolines include consecutive sequences of register saves, and some
register clears. Neaten such instances by expanding use of the SAVE_GPRS
macro and employing the ZEROIZE_GPR macro when appropriate.

Also simplify an invocation of SAVE_GPRS targetting all non-volatile
registers to SAVE_NVGPRS.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-7-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 15ba74502c powerpc/32: Clarify interrupt restores with REST_GPR macro in entry_32.S
Restoring the register state of the interrupted thread involves issuing
a large number of predictable loads to the kernel stack frame. Issue the
REST_GPR{,S} macros to clearly signal when this is happening, and bunch
together restores at the end of the interrupt handler where the saved
value is not consumed earlier in the handler code.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-6-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 2b1dac4b5f powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers
Use the convenience macros for saving/clearing/restoring gprs in keeping
with syscall calling conventions. The plural variants of these macros
can store a range of registers for concision.

This works well when the user gpr value we are hoping to save is still
live. In the syscall interrupt handlers, user register state is
sometimes juggled between registers. Hold-off from issuing the SAVE_GPR
macro for applicable neighbouring lines to highlight the delicate
register save logic.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-5-rmclure@linux.ibm.com
2022-09-26 23:00:15 +10:00
Rohan McLure 9d54a5ce3a powerpc: Add ZEROIZE_GPRS macros for register clears
Provide register zeroing macros, following the same convention as
existing register stack save/restore macros, to be used in later
change to concisely zero a sequence of consecutive gprs.

The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, keeping
with the naming of the accompanying restore and save macros, and usage
of zeroize to describe this operation elsewhere in the kernel.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-4-rmclure@linux.ibm.com
2022-09-26 23:00:14 +10:00
Rohan McLure 2c27d4a419 powerpc: Save caller r3 prior to system_call_exception
This reverts commit 8875f47b76 ("powerpc/syscall: Save r3 in regs->orig_r3
").

Save caller's original r3 state to the kernel stackframe before entering
system_call_exception. This allows for user registers to be cleared by
the time system_call_exception is entered, reducing the influence of
user registers on speculation within the kernel.

Prior to this commit, orig_r3 was saved at the beginning of
system_call_exception. Instead, save orig_r3 while the user value is
still live in r3.

Also replicate this early save in 32-bit. A similar save was removed in
commit 6f76a01173 ("powerpc/syscall: implement system call entry/exit
logic in C for PPC32") when 32-bit adopted system_call_exception. Revert
its removal of orig_r3 saves.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-3-rmclure@linux.ibm.com
2022-09-26 23:00:14 +10:00
Rohan McLure 5ba6c9a912 powerpc: Remove asmlinkage from syscall handler definitions
The asmlinkage macro has no special meaning in powerpc, and prior to
this patch is used sporadically on some syscall handler definitions. On
architectures that do not define asmlinkage, it resolves to extern "C"
for C++ compilers and a nop otherwise. The current invocations of
asmlinkage provide far from complete support for C++ toolchains, and so
the macro serves no purpose in powerpc.

Remove all invocations of asmlinkage in arch/powerpc. These incidentally
only occur in syscall definitions and prototypes.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220921065605.1051927-2-rmclure@linux.ibm.com
2022-09-26 23:00:14 +10:00
Christophe Leroy 4af8354553 powerpc/irq: Refactor irq_soft_mask_{set,or}_return()
This partialy reapply commit ef5b570d37 ("powerpc/irq: Don't
open code irq_soft_mask helpers") which was reverted by
commit 684c68d92e ("Revert "powerpc/irq: Don't open code
irq_soft_mask helpers"")

irq_soft_mask_set_return() and irq_soft_mask_or_return()
are overset of irq_soft_mask_set().

Have them use irq_soft_mask_set() instead of duplicating it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18473da42362ee8f07bce36b9caef8ba77d7633f.1663656054.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy 605ba9ee8a powerpc: Remove impossible mmu_psize_defs[] on nohash
Today there is:

  if e500 or 8xx
    if e500
      mmu_psize_defs[] =
    else if 8xx
      mmu_psize_defs[] =
    else
      mmu_psize_defs[] =
    endif
  endif

The else leg is dead definition.

Drop that else leg and rewrite as:

  if e500
    mmu_psize_defs[] =
  endif
  if 8xx
    mmu_psize_defs[] =
  endif

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/030a843449f348c0b709ca5349640624f36a016f.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy 6556fd1a1e powerpc: Cleanup idle for e500
e500 idle setup is a bit messy.

e500_idle() is used for PPC32 while book3e_idle() is used for PPC64.
As they are mutually exclusive, call them all e500_idle().

Use CONFIG_MPC_85xx instead of PPC32 + E500 in Makefile and rename
idle_e500.c to idle_85xx.c .

Rename idle_book3e.c to idle_64e.c and remove #ifdef PPC64 in
as it's only built on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8039301334e948974c85ec5ef2db37751075185b.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy 73d1149879 powerpc: Simplify redundant Kconfig tests
PPC_85xx implies PPC32 so no need to check PPC32 in addition.

PPC64 && !PPC_BOOK3E_64 means PPC_BOOK3S_64.

PPC_BOOK3E_64 implies PPC_E500.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/244cce3e603f2b79796314c0c1c46cab927b9adc.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy 772fd56dec powerpc: Replace PPC_85xx || PPC_BOOKE_64 by PPC_E500
PPC_E500 is the same as PPC_85xx || PPC_BOOKE_64

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af79696f8cb8536fb4e20c0d98a6bf159a9e371b.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy aa5f59df20 powerpc: Remove CONFIG_PPC_BOOK3E_MMU
CONFIG_PPC_BOOK3E_MMU is redundant with CONFIG_PPC_E500.

Remove it.

Also rename mmu-book3e.h to mmu-e500.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c5549cd59a131204ff94ab909cad2e2dad4ddf2f.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:14 +10:00
Christophe Leroy 3e7318584d powerpc: Remove CONFIG_PPC_FSL_BOOK3E
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Remove it.

And rename five files accordingly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Rename include guards to match new file names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/795cb93b88c9a0279289712e674f39e3b108a1b4.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy 688de017ef powerpc: Change CONFIG_E500 to CONFIG_PPC_E500
It will be used outside arch/powerpc, make it clear its a
powerpc configuration item.

And we already have CONFIG_PPC_E500MC, so that will make
it more consistent.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e63b22083c11c4300f4a82d3123a46e5fdd54fa6.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy 1df399012b powerpc: Remove redundant selection of E500 and E500MC
PPC_85xx and PPC_BOOK3E_64 already select E500 so no need
to select it again by PPC_QEMU_E500 and CORENET_GENERIC
as they depend on PPC_85xx || PPC_BOOK3E_64.

PPC_BOOK3E_64 already selects E500MC so no need to
select it again by PPC_QEMU_E500 if PPC64, PPC_BOOK3E_64
is the only way into PPC_QEMU_E500 with PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/44f03fa1506892fabf626dceb2f47a049908b6af.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy e0d68273d7 powerpc: Remove CONFIG_PPC_BOOK3E
CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64.

The later is more explicit about the fact that it's a 64 bits target.

Remove CONFIG_PPC_BOOK3E.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d0891490813c19cdcfc04678f512ea68cba3e64.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:13 +10:00
Christophe Leroy d7216567c6 powerpc/cputable: Split cpu_specs[] for mpc85xx and e500mc
e500v1/v2 and e500mc are said to be mutually exclusive in Kconfig.

Split e500 cpu_specs[] and then restrict the non e500mc to PPC32
which is then 85xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Tweak formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/553b901ea91e393df231103da4b018e9b251b0e9.1663606876.git.christophe.leroy@csgroup.eu
2022-09-26 23:00:05 +10:00
Christophe Leroy dfc3095cec powerpc: Remove CONFIG_FSL_BOOKE
PPC_85xx is PPC32 only.
PPC_85xx always selects E500 and is the only PPC32 that
selects E500.
FSL_BOOKE is selected when E500 and PPC32 are selected.

So FSL_BOOKE is redundant with PPC_85xx.

Remove FSL_BOOKE.

And rename four files accordingly.

cpu_setup_fsl_booke.S is not renamed because it is linked to
PPC_FSL_BOOK3E and not to FSL_BOOKE as suggested by its name.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/08e3e15594e66d63b9e89c5b4f9c35153913c28f.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 22:47:37 +10:00
Christophe Leroy e320a76db4 powerpc/cputable: Split cpu_specs[] out of cputable.h
cpu_specs[] is full of #ifdefs depending on the different
types of CPU.

CPUs are mutually exclusive, it is therefore possible to split
cpu_specs[] into smaller more readable pieces.

Create cpu_specs_XXX.h that will each be dedicated on one
of the following mutually exclusive families:
- 40x
- 44x
- 47x
- 8xx
- e500
- book3s/32
- book3s/64

In book3s/32, the block for 603 has been moved in front in order
to not have two 604 blocks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix CONFIG_47x to be CONFIG_PPC_47x, tweak some formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a44b865e0318286155273b10cdf524ab697928c1.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 22:47:13 +10:00
Christophe Leroy 76b719881a powerpc/cputable: Move __cpu_setup() prototypes out of cputable.h
Move all prototypes out of cputable.h

For that rename cpu_setup_power.h to cpu_setup.h and move all
prototypes in it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Standardise cpu_spec *spec formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f45118489ee450db654db8bbcdfd8f5907337c22.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 22:26:49 +10:00
Christophe Leroy afd2288a4c powerpc/cputable: Remove __machine_check_early_realmode_p{7/8/9} prototypes
__machine_check_early_realmode_p{7/8/9} are already in mce.h
which is included. Remove them from cputable.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b77fc0f90e3a9c065324cbff549b718ccf0809f8.1663606875.git.christophe.leroy@csgroup.eu
2022-09-26 21:00:42 +10:00