"rmt_hit" is accounted into two metrics: one is accounted into the
metrics "LLC Ld Miss" (see the function llc_miss() for calculation
"llcmiss"); and it's accounted into metrics "LLC Load Hit". Thus,
for the literal meaning, it is contradictory that "rmt_hit" is
accounted for both "LLC Ld Miss" (LLC miss) and "LLC Load Hit"
(LLC hit).
Thus this is easily to introduce confusion: "LLC Load Hit" gives
impression that all items belong to it are LLC hit; in fact "rmt_hit"
is LLC miss and remote cache hit.
To give out clear semantics for metric "LLC Load Hit", "rmt_hit" is
moved out from it and changes "LLC Load Hit" to contain two items:
LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm")
For output alignment, adjusts the header for "LLC Load Hit".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace the header string "Lcl" with "LclHit", which is more explicit
to express the event type is LLC local hit.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively,
suppose if we want to extend the tool to display these two dimensions
under any one metrics, users cannot understand the semantics if only
based on the header string 'Lcl' or 'Rmt'.
To explicit express the meaning for HITM items, this patch changes the
headers string as "LclHitm" and "RmtHitm", the strings are more readable
and this allows to extend metrics for using HITM items.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The metrics "LLC Load Hitm" contains two items: one is "local Hitm" and
another is "remote Hitm".
"local Hitm" means: L3 HIT and was serviced by another processor core
with a cross core snoop where modified copies were found; it's no doubt
that "local Hitm" belongs to LLC access.
But for "remote Hitm", based on the code in util/mem-events, it's the
event for remote cache HIT and was serviced by another processor core
with modified copies. Thus the remote Hitm is a remote cache's hit and
actually it's LLC load miss.
Now the display format gives users the impression that "local Hitm" and
"remote Hitm" both belong to the LLC load, but this is not the fact as
described.
This patch changes the header from "LLC Load Hitm" to "Load Hitm", this
can avoid the give the wrong impression that all Hitm belong to LLC.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The metrics are not organized based on memory hierarchy, e.g. the tool
doesn't organize the metrics order based on memory nodes from the close
node (e.g. L1/L2 cache) to far node (e.g. L3 cache and DRAM).
To output metrics with more friendly form, this patch refines the
metrics order based on memory hierarchy:
"Core Load Hit" => "LLC Load Hit" => "LLC Ld Miss" => "Load Dram"
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To view the statistics with "breakdown" mode, it's good to show the
summary numbers for the total records, all stores and all loads, then
the sequential conlumns can be used to break into more detailed items.
To achieve this purpose, this patch displays the summary numbers for
records/stores/loads continuously and places them before breakdown
items, this can allow uses to easily read the summarized statistics.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We will not allow unitialized anon mmaps, but we need this define
to prevent build errors, e.g. the debian foot package.
Suggested-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
The objective of the tests is to check that ICMP errors generated while
crossing between VRFs are properly routed back to the source host.
The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the
output of the command to check that a ttl expired error is received.
The second ttl test runs traceroute from h1 to h2 and parses the output to
check for a hop on r1.
The mtu test sends a ping with a payload of 1450 from h1 to h2, through
r1 which has an interface with a mtu of 1400 and parses the output of the
command to check that a fragmentation needed error is received.
[ The IPv6 MTU test still fails with the symmetric routing setup. It
appears to be caused by source address selection picking ::1. Fixing
this is beyond the scope of this series. ]
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCX4a4sAAKCRCRxhvAZXjc
ohdRAP9fclgrRkTl3o4cgaK0PUMt8BZ5QCg/SPQrVT58AQlfSwEAsNtWAeo6U2z1
FLGuCoPBEW1Zghkj1lMbIhj5zyVaEQ8=
=Z7Q0
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
"This introduces a new extension to the pidfd_open() syscall. Users can
now raise the new PIDFD_NONBLOCK flag to support non-blocking pidfd
file descriptors. This has been requested for uses in async process
management libraries such as async-pidfd in Rust.
Ever since the introduction of pidfds and more advanced async io
various programming languages such as Rust have grown support for
async event libraries. These libraries are created to help build
epoll-based event loops around file descriptors. A common pattern is
to automatically make all file descriptors they manage to O_NONBLOCK.
For such libraries the EAGAIN error code is treated specially. When a
function is called that returns EAGAIN the function isn't called again
until the event loop indicates the the file descriptor is ready.
Supporting EAGAIN when waiting on pidfds makes such libraries just
work with little effort.
This introduces a new flag PIDFD_NONBLOCK that is equivalent to
O_NONBLOCK. This follows the same patterns we have for other (anon
inode) file descriptors such as EFD_NONBLOCK, IN_NONBLOCK,
SFD_NONBLOCK, TFD_NONBLOCK and the same for close-on-exec flags.
Passing a non-blocking pidfd to waitid() currently has no effect, i.e.
is not supported. There are users which would like to use waitid() on
pidfds that are O_NONBLOCK and mix it with pidfds that are blocking
and both pass them to waitid().
The expected behavior is to have waitid() return -EAGAIN for
non-blocking pidfds and to block for blocking pidfds without needing
to perform any additional checks for flags set on the pidfd before
passing it to waitid(). Non-blocking pidfds will return EAGAIN from
waitid() when no child process is ready yet. Returning -EAGAIN for
non-blocking pidfds makes it easier for event loops that handle EAGAIN
specially.
It also makes the API more consistent and uniform. In essence,
waitid() is treated like a read on a non-blocking pidfd or a recvmsg()
on a non-blocking socket.
With the addition of support for non-blocking pidfds we support the
same functionality that sockets do. For sockets() recvmsg() supports
MSG_DONTWAIT for pidfds waitid() supports WNOHANG. Both flags are
per-call options. In contrast non-blocking pidfds and non-blocking
sockets are a setting on an open file description affecting all
threads in the calling process as well as other processes that hold
file descriptors referring to the same open file description. Both
behaviors, per call and per open file description, have genuine
use-cases.
The interaction with the WNOHANG flag is documented as follows:
- If a non-blocking pidfd is passed and WNOHANG is not raised we
simply raise the WNOHANG flag internally. When do_wait() returns
indicating that there are eligible child processes but none have
exited yet we set EAGAIN. If no child process exists we continue
returning ECHILD.
- If a non-blocking pidfd is passed and WNOHANG is raised waitid()
will continue returning 0, i.e. it will not set EAGAIN. This ensure
backwards compatibility with applications passing WNOHANG
explicitly with pidfds"
* tag 'threads-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: remove O_NONBLOCK before waiting for WSTOPPED
tests: add waitid() tests for non-blocking pidfds
tests: port pidfd_wait to kselftest harness
pidfd: support PIDFD_NONBLOCK in pidfd_open()
exit: support non-blocking pidfds
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXz5bNAAKCRCRxhvAZXjc
opfjAP9R/J72yxdd2CLGNZ96hyiRX1NgFDOVUhscOvujYJf8ZwD+OoLmKMvAyFW6
hnMhT1n9Q+aq194hyzChOLQaBTejBQ8=
=4WCX
-----END PGP SIGNATURE-----
Merge tag 'kernel-clone-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull kernel_clone() updates from Christian Brauner:
"During the v5.9 merge window we reworked the process creation
codepaths across multiple architectures. After this work we were only
left with the _do_fork() helper based on the struct kernel_clone_args
calling convention. As was pointed out _do_fork() isn't valid
kernelese especially for a helper that isn't just static.
This series removes the _do_fork() helper and introduces the new
kernel_clone() helper. The process creation cleanup didn't change the
name to something more reasonable mainly because _do_fork() was used
in quite a few places. So sending this as a separate series seemed the
better strategy.
I originally intended to send this early in the v5.9 development cycle
after the merge window had closed but given that this was touching
quite a few places I decided to defer this until the v5.10 merge
window"
* tag 'kernel-clone-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
sched: remove _do_fork()
tracing: switch to kernel_clone()
kgdbts: switch to kernel_clone()
kprobes: switch to kernel_clone()
x86: switch to kernel_clone()
sparc: switch to kernel_clone()
nios2: switch to kernel_clone()
m68k: switch to kernel_clone()
ia64: switch to kernel_clone()
h8300: switch to kernel_clone()
fork: introduce kernel_clone()
This kselftest fixes update consists of a selftests harness fix to
flush stdout before forking to avoid parent and child printing
duplicates messages. This is evident when test output is redirected
to a file.
The second fix is a tools/ wide change to avoid comma separated statements
from Joe Perches. This fix spans tools/lib, tools/power/cpupower, and
selftests.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl+GAdIACgkQCwJExA0N
QxyraBAAwsPISUpSA34WLuUwNCddI3ttNW2R63ZrdKSy7QlreM02zG9qyEDPwFil
GLlgXfUE8QOI7rfiqSwr1elzS07bDdel6UcxTuhuy5KPs2+yieGGZ5lllsVY6gJu
kF5m7setmdmHQr76HCyyGddwdpCTpz7sP3BJzmYn2iAWAQMwtZBXOEgmnf2yiskX
SHF/f3Bvrnm+BtbzZEa+ysHpL72AlpKrGuLQAnNOCp/DomKEtRACTNxIzKFeO++r
uelbHO/MzdaGmrCxy3J/RWz3llQVnj6aafZFaqAV7ReWi/OOYTsV48pAHRm8TTv8
1LvVP48b7aCUc7QWu+d8SBSDfJQANI4tgcP0TI/hUboIuhU8bVkZAUF69txdFgzb
DopwQVybQq5yEqmPg1RzvccbDojxXq72BvZyqPBo8WmKHWOQXCo9A1owudmcqtob
WqTr1eAVAd6Rc2vcjkhnzYxQcb8A093ZfP1fyAZ5HQSH5No//4FP9pWBwMzpncKQ
GdHmMNBns6v1muWMBj6bgT4GA1sN765Kzt1StYSC257v6gtA8+xHo/PUfojZJxy9
bieAzuqE8n68IKKz4/Rk2JvfFBnaxDZyQUITOCrcoWJRk5apJc3T5+goq+Bep5Na
SOFbb0JvrGLBjX3bChmLIYVa7zQkupBgwWU8NPM1tYxce+pBS30=
=jgRu
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- a selftests harness fix to flush stdout before forking to avoid
parent and child printing duplicates messages. This is evident when
test output is redirected to a file.
- a tools/ wide change to avoid comma separated statements from Joe
Perches. This fix spans tools/lib, tools/power/cpupower, and
selftests.
* tag 'linux-kselftest-fixes-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
tools: Avoid comma separated statements
selftests/harness: Flush stdout before forking
- Add support for generic initiator-only proximity domains to
the ACPI NUMA code and the architectures using it (Jonathan
Cameron).
- Clean up some non-ACPICA code referring to debug facilities from
ACPICA that are not actually used in there (Hanjun Guo).
- Add new DPTF driver for the PCH FIVR participant (Srinivas
Pandruvada).
- Reduce overhead related to accessing GPE registers in ACPICA and
the OS interface layer and make it possible to access GPE registers
using logical addresses if they are memory-mapped (Rafael Wysocki).
- Update the ACPICA code in the kernel to upstream revision 20200925
including changes as follows:
* Add predefined names from the SMBus sepcification (Bob Moore).
* Update acpi_help UUID list (Bob Moore).
* Return exceptions for string-to-integer conversions in iASL (Bob
Moore).
* Add a new "ALL <NameSeg>" debugger command (Bob Moore).
* Add support for 64 bit risc-v compilation (Colin Ian King).
* Do assorted cleanups (Bob Moore, Colin Ian King, Randy Dunlap).
- Add new ACPI backlight whitelist entry for HP 635 Notebook (Alex
Hung).
- Move TPS68470 OpRegion driver to drivers/acpi/pmic/ and split out
Kconfig and Makefile specific for ACPI PMIC (Andy Shevchenko).
- Clean up the ACPI SoC driver for AMD SoCs (Hanjun Guo).
- Add missing config_item_put() to fix refcount leak (Hanjun Guo).
- Drop lefrover field from struct acpi_memory_device (Hanjun Guo).
- Make the ACPI extlog driver check for RDMSR failures (Ben
Hutchings).
- Fix handling of lid state changes in the ACPI button driver when
input device is closed (Dmitry Torokhov).
- Fix several assorted build issues (Barnabás Pőcze, John Garry,
Nathan Chancellor, Tian Tao).
- Drop unused inline functions and reduce code duplication by using
kobj_to_dev() in the NFIT parsing code (YueHaibing, Wang Qing).
- Serialize tools/power/acpi Makefile (Thomas Renninger).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+F4IkSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx1gIQAIZrt09fquEIZhYulGZAkuYhSX2U/DZt
poow5+TiGk36JNHlbZS19kZ3F0tJ1wA6CKSfF/bYyULxL+gYaUjdLXzv2kArTSAj
nzDXQ2CystpySZI/sEkl4QjsMg0xuZlBhlnCfNHzJw049TgdsJHnxMkJXb8T90A+
l2JKm2OpBkNvQGNpwd3djLg8xSDnHUmuevsWZPHDp92/fLMF9DUBk8dVuEwa0ndF
hAUpWm+EL1tJQnhNwtfV/Akd9Ypqgk/7ROFWFHGDtHMZGnBjpyXZw68vHMX7SL6N
Ej90GWGPHSJs/7Fsg4Hiaxxcph9WFNLPcpck5lVAMIrNHMKANjqQzCsmHavV/WTG
STC9/qwJauA1EOjovlmlCFHctjKE/ya6Hm299WTlfBqB+Lu1L3oMR2CC+Uj0YfyG
sv3264rJCsaSw610iwQOG807qHENopASO2q5DuKG0E9JpcaBUwn1N4qP5svvQciq
4aA8Ma6xM/QHCO4CS0Se9C0+WSVtxWwOUichRqQmU4E6u1sXvKJxTeWo79rV7PAh
L6BwoOxBLabEiyzpi6HPGs6DoKj/N6tOQenBh4ibdwpAwMtq7hIlBFa0bp19c2wT
vx8F2Raa8vbQ2zZ1QEiPZnPLJUoy2DgaCtKJ6E0FTDXNs3VFlWgyhIUlIRqk5BS9
OnAwVAUrTMkJ
=feLU
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These add support for generic initiator-only proximity domains to the
ACPI NUMA code and the architectures using it, clean up some
non-ACPICA code referring to debug facilities from ACPICA, reduce the
overhead related to accessing GPE registers, add a new DPTF (Dynamic
Power and Thermal Framework) participant driver, update the ACPICA
code in the kernel to upstream revision 20200925, add a new ACPI
backlight whitelist entry, fix a few assorted issues and clean up some
code.
Specifics:
- Add support for generic initiator-only proximity domains to the
ACPI NUMA code and the architectures using it (Jonathan Cameron)
- Clean up some non-ACPICA code referring to debug facilities from
ACPICA that are not actually used in there (Hanjun Guo)
- Add new DPTF driver for the PCH FIVR participant (Srinivas
Pandruvada)
- Reduce overhead related to accessing GPE registers in ACPICA and
the OS interface layer and make it possible to access GPE registers
using logical addresses if they are memory-mapped (Rafael Wysocki)
- Update the ACPICA code in the kernel to upstream revision 20200925
including changes as follows:
+ Add predefined names from the SMBus sepcification (Bob Moore)
+ Update acpi_help UUID list (Bob Moore)
+ Return exceptions for string-to-integer conversions in iASL (Bob
Moore)
+ Add a new "ALL <NameSeg>" debugger command (Bob Moore)
+ Add support for 64 bit risc-v compilation (Colin Ian King)
+ Do assorted cleanups (Bob Moore, Colin Ian King, Randy Dunlap)
- Add new ACPI backlight whitelist entry for HP 635 Notebook (Alex
Hung)
- Move TPS68470 OpRegion driver to drivers/acpi/pmic/ and split out
Kconfig and Makefile specific for ACPI PMIC (Andy Shevchenko)
- Clean up the ACPI SoC driver for AMD SoCs (Hanjun Guo)
- Add missing config_item_put() to fix refcount leak (Hanjun Guo)
- Drop lefrover field from struct acpi_memory_device (Hanjun Guo)
- Make the ACPI extlog driver check for RDMSR failures (Ben
Hutchings)
- Fix handling of lid state changes in the ACPI button driver when
input device is closed (Dmitry Torokhov)
- Fix several assorted build issues (Barnabás Pőcze, John Garry,
Nathan Chancellor, Tian Tao)
- Drop unused inline functions and reduce code duplication by using
kobj_to_dev() in the NFIT parsing code (YueHaibing, Wang Qing)
- Serialize tools/power/acpi Makefile (Thomas Renninger)"
* tag 'acpi-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPICA: Update version to 20200925 Version 20200925
ACPICA: Remove unnecessary semicolon
ACPICA: Debugger: Add a new command: "ALL <NameSeg>"
ACPICA: iASL: Return exceptions for string-to-integer conversions
ACPICA: acpi_help: Update UUID list
ACPICA: Add predefined names found in the SMBus sepcification
ACPICA: Tree-wide: fix various typos and spelling mistakes
ACPICA: Drop the repeated word "an" in a comment
ACPICA: Add support for 64 bit risc-v compilation
ACPI: button: fix handling lid state changes when input device closed
tools/power/acpi: Serialize Makefile
ACPI: scan: Replace ACPI_DEBUG_PRINT() with pr_debug()
ACPI: memhotplug: Remove 'state' from struct acpi_memory_device
ACPI / extlog: Check for RDMSR failure
ACPI: Make acpi_evaluate_dsm() prototype consistent
docs: mm: numaperf.rst Add brief description for access class 1.
node: Add access1 class to represent CPU to memory characteristics
ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
ACPI: Let ACPI know we support Generic Initiator Affinity Structures
x86: Support Generic Initiator only proximity domains
...
Rather calm cycle for PDx86, all these have been in for-next for
a couple of days with no bot complaints.
Highlights:
- PMC TigerLake fixes and new RocketLake support
- Various small fixes / updates in other drivers/tools
The following is an automated git shortlog grouped by driver:
MAINTAINERS:
- update X86 PLATFORM DRIVERS entry with new kernel.org git repo
- Update maintainers for pmc_core driver
hp-wmi:
- add support for thermal policy
intel_pmc_core:
- fix: Replace dev_dbg macro with dev_info()
- Add Intel RocketLake (RKL) support
- Clean up: Remove the duplicate comments and reorganize
- Fix the slp_s0 counter displayed value
- Fix TigerLake power gating status map
mlx-platform:
- Add capability field to platform FAN description
- Remove PSU EEPROM configuration
platform_data/mlxreg:
- Extend core platform structure
- Update module license
pmc_core:
- Use descriptive names for LPM registers
tools/power/x86/intel-speed-select:
- Update version for v5.10
- Fix missing base-freq core IDs
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAl+FmAsUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9x5awf/VWn0gcSw1i+y+Q/KPw/RCMXrEQQm
Bqyt++IDSonvjBUmxE7QYtPvHK7lecPLXUzLkcfSoUmEZSPGoqe4F5Hj+814lj8x
fveScf2DwUQyEfj26y4rmza1K4h7VohjJ7rQm0+t15KamrcogLiwqDpvel4v90lp
YVvJUxDBOJxCrMs5fAziZAP7FxD42d8j664DFCPONH3EsY/vZMfOnsDRKhjahtFp
LTtWXY5LyFf5HARKhubv/gmDddR7FzZB8/xc/G1CXpOmUBTcSgHgXH1OE/ypBXIe
LOdchGqL2WRTq71IUKsvEXYbLSOHOMbIfBr7eCwZRKfmQLjQ8HXqI7xl9A==
=luk4
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"Rather calm cycle for x86 platform drivers, all these have been in
for-next for a couple of days with no bot complaints.
Highlights:
- PMC TigerLake fixes and new RocketLake support
- various small fixes / updates in other drivers/tools"
* tag 'platform-drivers-x86-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
MAINTAINERS: update X86 PLATFORM DRIVERS entry with new kernel.org git repo
platform/x86: mlx-platform: Add capability field to platform FAN description
platform_data/mlxreg: Extend core platform structure
platform_data/mlxreg: Update module license
platform/x86: mlx-platform: Remove PSU EEPROM configuration
MAINTAINERS: Update maintainers for pmc_core driver
platform/x86: intel_pmc_core: fix: Replace dev_dbg macro with dev_info()
platform/x86: intel_pmc_core: Add Intel RocketLake (RKL) support
platform/x86: intel_pmc_core: Clean up: Remove the duplicate comments and reorganize
platform/x86: intel_pmc_core: Fix the slp_s0 counter displayed value
platform/x86: intel_pmc_core: Fix TigerLake power gating status map
platform/x86: pmc_core: Use descriptive names for LPM registers
tools/power/x86/intel-speed-select: Update version for v5.10
tools/power/x86/intel-speed-select: Fix missing base-freq core IDs
platform/x86: hp-wmi: add support for thermal policy
called SEV by also encrypting the guest register state, making the
registers inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared between
the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
in order for that exception mechanism to work, the early x86 init code
needed to be made able to handle exceptions, which, in itself, brings
a bunch of very nice cleanups and improvements to the early boot code
like an early page fault handler, allowing for on-demand building of the
identity mapping. With that, !KASLR configurations do not use the EFI
page table anymore but switch to a kernel-controlled one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly
separate from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and behind
static keys to minimize the performance impact on !SEV-ES setups.
Work by Joerg Roedel and Thomas Lendacky and others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
=toqi
-----END PGP SIGNATURE-----
Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
- Most of the changes are cleanups and reorganization to make the objtool code
more arch-agnostic. This is in preparation for non-x86 support.
Fixes:
- KASAN fixes.
- Handle unreachable trap after call to noreturn functions better.
- Ignore unreachable fake jumps.
- Misc smaller fixes & cleanups.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+FgwIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1juGw/6A6goA5/HHapM965yG1eY/rTLp3eIbcma
1ZbkUsP0YfT6wVUzw/sOeZzKNOwOq1FuMfkjuH2KcnlxlcMekIaKvLk8uauW4igM
hbFGuuZfZ0An5ka9iQ1W6HGdsuD3vVlN1w/kxdWk0c3lJCVQSTxdCfzF8fuF3gxX
lF3Bc1D/ZFcHIHT/hu/jeIUCgCYpD3qZDjQJBScSwVthZC+Fw6weLLGp2rKDaCao
HhSQft6MUfDrUKfH3LBIUNPRPCOrHo5+AX6BXxLXJVxqlwO/YU3e0GMwSLedMtBy
TASWo7/9GAp+wNNZe8EliyTKrfC3sLxN1QImfjuojxbBVXx/YQ/ToTt9fVGpF4Y+
XhhRFv9520v1tS2wPHIgQGwbh7EWG6mdrmo10RAs/31ViONPrbEZ4WmcA08b/5FY
KEkOVb18yfmDVzVZPpSc+HpIFkppEBOf7wPg27Bj3RTZmzIl/y+rKSnxROpsJsWb
R6iov7SFVET14lHl1G7tPNXfqRaS7HaOQIj3rSUyAP0ZfX+yIupVJp32dc6Ofg8b
SddUCwdIHoFdUNz4Y9csUCrewtCVJbxhV4MIdv0GpWbrgSw96RFZgetaH+6mGRpj
0Kh6M1eC3irDbhBuarWUBAr2doPAq4iOUeQU36Q6YSAbCs83Ws2uKOWOHoFBVwCH
uSKT0wqqG+E=
=KX5o
-----END PGP SIGNATURE-----
Merge tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"Most of the changes are cleanups and reorganization to make the
objtool code more arch-agnostic. This is in preparation for non-x86
support.
Other changes:
- KASAN fixes
- Handle unreachable trap after call to noreturn functions better
- Ignore unreachable fake jumps
- Misc smaller fixes & cleanups"
* tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf build: Allow nested externs to enable BUILD_BUG() usage
objtool: Allow nested externs to enable BUILD_BUG()
objtool: Permit __kasan_check_{read,write} under UACCESS
objtool: Ignore unreachable trap after call to noreturn functions
objtool: Handle calling non-function symbols in other sections
objtool: Ignore unreachable fake jumps
objtool: Remove useless tests before save_reg()
objtool: Decode unwind hint register depending on architecture
objtool: Make unwind hint definitions available to other architectures
objtool: Only include valid definitions depending on source file type
objtool: Rename frame.h -> objtool.h
objtool: Refactor jump table code to support other architectures
objtool: Make relocation in alternative handling arch dependent
objtool: Abstract alternative special case handling
objtool: Move macros describing structures to arch-dependent code
objtool: Make sync-check consider the target architecture
objtool: Group headers to check in a single list
objtool: Define 'struct orc_entry' only when needed
objtool: Skip ORC entry creation for non-text sections
objtool: Move ORC logic out of check()
...
Merge misc updates from Andrew Morton:
"181 patches.
Subsystems affected by this patch series: kbuild, scripts, ntfs,
ocfs2, vfs, mm (slab, slub, kmemleak, dax, debug, pagecache, fadvise,
gup, swap, memremap, memcg, selftests, pagemap, mincore, hmm, dma,
memory-failure, vmallo and migration)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (181 commits)
mm/migrate: remove obsolete comment about device public
mm/migrate: remove cpages-- in migrate_vma_finalize()
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
memblock: use separate iterators for memory and reserved regions
memblock: implement for_each_reserved_mem_region() using __next_mem_region()
memblock: remove unused memblock_mem_size()
x86/setup: simplify reserve_crashkernel()
x86/setup: simplify initrd relocation and reservation
arch, drivers: replace for_each_membock() with for_each_mem_range()
arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
memblock: reduce number of parameters in for_each_mem_range()
memblock: make memblock_debug and related functionality private
memblock: make for_each_memblock_type() iterator private
mircoblaze: drop unneeded NUMA and sparsemem initializations
riscv: drop unneeded node initialization
h8300, nds32, openrisc: simplify detection of memory extents
arm64: numa: simplify dummy_numa_init()
arm, xtensa: simplify initialization of high memory pages
dma-contiguous: simplify cma_early_percent_memory()
KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
...
The event code for events referencing std arch events is incorrectly
evaluated in json_events().
The issue is that je.event is evaluated properly from try_fixup(), but
later NULLified from the real_event() call, as "event" may be NULL.
Fix by setting "event" same je.event in try_fixup().
Also remove support for overwriting event code for events using std arch
events, as it is not used.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/1602170368-11892-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We show the streams separately. They are divided into different sections.
1. "Matched hot streams"
2. "Hot streams in old perf data only"
3. "Hot streams in new perf data only".
For each stream, we report the cycles and hot percent (hits%).
For example,
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-7-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have used callchain_node->hit to measure the hot level of one stream.
This patch calculates the sum of hits of total streams.
Thus in next patch, we can use following formula to report hot percent
for one stream.
hot percent = callchain_node->hit / sum of total hits
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In previous patch, we have created an evsel_streams for one event, and
top N hottest streams will be saved in a stream array in evsel_streams.
This patch compares total streams among two evsel_streams.
Once two streams are fully matched, they will be linked as a pair. From
the pair, we can know which streams are matched.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-5-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stream is the branch history which is aggregated by the branch records
from perf samples. Now we support the callchain as stream.
If the callchain entries of one stream are fully matched with the
callchain entries of another stream, we think two streams are matched.
For example,
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
----------------------- -----------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
Above two streams are matched (we don't consider the case that source
code is changed).
The matching logic is, compare the chain string first. If it's not
matched, fallback to dso address comparison.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In previous patch, we have created evsel_streams array.
This patch returns the specified evsel_streams according to the
evsel_idx.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We define a stream as the branch history which is aggregated by the
branch records from perf samples. For example, the callchains aggregated
from the branch records are considered as streams. By browsing the hot
stream, we can understand the hot code path.
Now we only support the callchain for stream. For measuring the hot
level for a stream, we use the callchain_node->hit, higher is hotter.
There may be many callchains sampled so we only focus on the top N
hottest callchains. N is a user defined parameter or predefined default
value (nr_streams_max).
This patch creates an evsel_streams array per event, and saves the top N
hottest streams in a stream array.
So now we can get the per-event top N hottest streams.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Document the higher level --insn-trace etc. perf script options.
Include the howto how to build xed into the manpage
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20201014035346.4772-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter suggested that using the exclusive mode in perf could avoid some
problems with bad scheduling of groups. Exclusive is implemented in the
kernel, but wasn't exposed by the perf tool, so hard to use without
custom low level API users.
Add support for marking groups or events with :e for exclusive in the
perf tool. The implementation is basically the same as the existing
pinned attribute.
Committer testing:
# perf test "parse event"
6: Parse event definition strings : Ok
# perf test -v "parse event" |& grep :u*e
running test 56 'instructions:uep'
running test 57 '{cycles,cache-misses,branch-misses}:e'
#
#
# grep "model name" -m1 /proc/cpuinfo
model name : AMD Ryzen 9 3900X 12-Core Processor
#
# perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1
Performance counter stats for 'system wide':
<not counted> cycles (0.00%)
<not counted> cache-misses (0.00%)
<not counted> branch-misses (0.00%)
1.001269893 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
# echo 0 > /proc/sys/kernel/nmi_watchdog
# perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1
Performance counter stats for 'system wide':
1,298,663,141 cycles
30,962,215 cache-misses
5,325,150 branch-misses
1.001474934 seconds time elapsed
#
# The output for asking for precise events on AMD needs to improve, it
# supposedly works only for system wide or per CPU
#
# perf stat -a -e '{cycles,cache-misses,branch-misses}:uep' sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
/bin/dmesg | grep -i perf may provide additional information.
# perf stat -a -e '{cycles,cache-misses,branch-misses}:ue' sleep 1
Performance counter stats for 'system wide':
746,363,126 cycles
16,881,611 cache-misses
2,871,259 branch-misses
1.001636066 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201014144255.22699-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a test for the build id cache that adds a binary with sha1 and md5
build ids and verifies it's added properly.
The test updates build id cache with 'perf record' and 'perf buildid-cache -a'.
Committer testing:
# perf test "build id"
82: build id cache operations : Ok
#
# perf test -v "build id"
82: build id cache operations :
--- start ---
test child forked, pid 447218
test binaries: /tmp/perf.ex.SHA1.B8I /tmp/perf.ex.MD5.7Nv
Adding d1abc1eb7568358cf23c959566f23462461834d1 /tmp/perf.ex.SHA1.B8I: Ok
build id: d1abc1eb7568358cf23c959566f23462461834d1
link: /tmp/perf.debug.sS2/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1
file: /tmp/perf.debug.sS2/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf
OK for /tmp/perf.ex.SHA1.B8I
Adding a50e350e97c43b4708d09bcd85ebfff7 /tmp/perf.ex.MD5.7Nv: Ok
build id: a50e350e97c43b4708d09bcd85ebfff7
link: /tmp/perf.debug.IuW/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7
file: /tmp/perf.debug.IuW/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf
OK for /tmp/perf.ex.MD5.7Nv
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.034 MB /tmp/perf.data.xrH ]
build id: d1abc1eb7568358cf23c959566f23462461834d1
link: /tmp/perf.debug.eGR/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1
file: /tmp/perf.debug.eGR/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf
OK for /tmp/perf.ex.SHA1.B8I
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.034 MB /tmp/perf.data.cbE ]
build id: a50e350e97c43b4708d09bcd85ebfff7
link: /tmp/perf.debug.82t/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7
file: /tmp/perf.debug.82t/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf
OK for /tmp/perf.ex.MD5.7Nv
test child finished with 0
---- end ----
build id cache operations: Ok
#
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With shorter md5 build ids we need to align their paths properly with
other build ids:
$ perf buildid-list
17f4e448cc746582ea1881528deb549f7fdb3fd5 [kernel.kallsyms]
a50e350e97c43b4708d09bcd85ebfff7 .../tools/perf/buildid-ex-md5
1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
$
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We do not store size with build ids in perf data, but there's enough
space to do it. Adding misc bit PERF_RECORD_MISC_BUILD_ID_SIZE to mark
build id event with size.
With this fix the dso with md5 build id will have correct build id data
and will be usable for debuginfod processing if needed (coming in
following patches).
Committer notes:
Use %zu with size_t to fix this error on 32-bit arches:
util/header.c: In function '__event_process_build_id':
util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
pr_debug("build id event received for %s: %s [%lu]\n",
^
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing build_id object to dso__build_id_equal(), so we can properly
check build id with different size than sha1.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing build_id object to dso__set_build_id(), so it's easier
to initialize dos's build id object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing build_id object to build_id__sprintf function, so it can operate
with the proper size of build id.
This will create proper md5 build id readable names,
like following:
a50e350e97c43b4708d09bcd85ebfff7
instead of:
a50e350e97c43b4708d09bcd85ebfff700000000
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing build id object to sysfs__read_build_id function, so it can
populate the size of the build_id object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pass a build_id object to filename__read_build_id function, so it can
populate the size of the build_id object.
Changing filename__read_build_id() code for both ELF/non-ELF code.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replace build_id byte array with struct build_id object and all the code
that references it.
The objective is to carry size together with build id array, so it's
better to keep both together.
This is preparatory change for following patches, and there's no
functional change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kselftests test running infrastructure expects tests to finish with an
exit code of 4 if the test decided it should be skipped. Currently
eeh-basic.sh exits with the number of devices that failed to recover, so if
four devices didn't recover we'll report a skip instead of a fail.
Fix this by checking if the return code is non-zero and report success
and failure by returning 0 or 1 respectively. For the cases where should
actually skip return 4.
Fixes: 85d86c8aa5 ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014024711.1138386-1-oohall@gmail.com
This patch reduces the running time for compaction_test from about 27 sec,
to 3.3 sec, which is about an 8x speedup.
These numbers are for an Intel x86_64 system with 32 GB of DRAM.
The compaction_test.c program was spending most of its time doing mmap(),
1 MB at a time, on about 25 GB of memory.
Instead, do the mmaps 100 MB at a time. (Going past 100 MB doesn't make
things go much faster, because other parts of the program are using the
remaining time.)
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sri Jayaramappa <sjayaram@akamai.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20201002080621.551044-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some tests might not be able to be run if resources like huge pages are
not available. Mark these tests as skipped instead of simply passing.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/20200827190400.12608-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid accidental wrong builds, due to built-in rules working just a little
bit too well--but not quite as well as required for our situation here.
In other words, "make userfaultfd" (for example) is supposed to fail to
build at all, because this Makefile only supports either "make" (all), or
"make /full/path". However, the built-in rules, if not suppressed, will
pick up CFLAGS and the initial LDLIBS (but not the target-specific LDLIBS,
because those are only set for the full path target!). This causes it to
get pretty far into building things despite using incorrect values such as
an *occasionally* incomplete LDLIBS value.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lkml.kernel.org/r/20200915012901.1655280-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "selftests/vm: fix some minor aggravating factors in the Makefile".
This fixes a couple of minor aggravating factors that I ran across while
trying to do some changes in selftests/vm. These are simple things, but
like most things with GNU Make, it's rarely obvious what's wrong until you
understand *the entire Makefile and all of its includes*.
So while there is, of course, joy in learning those details, I thought I'd
fix these little things, so as to allow others to skip out on the Joy if
they so choose. :)
First of all, if you have an item (let's choose userfaultfd for an
example) that fails to build, you might do this:
$ make -j32
# ...you observe a failed item in the threaded output
# OK, let's get a closer look
$ make
# ...but now the build quietly "succeeds".
That's what Patch 0001 fixes.
Second, if you instead attempt this approach for your closer look (a casual
mistake, as it's not supported):
$ make userfaultfd
# ...userfaultfd fails to link, due to incomplete LDLIBS
That's what Patch 0002 fixes.
This patch (of 2):
If one or more of these selftest fail to build, then after the first
failure, subsequent invocations of "make" will make it appear that there
are no build failures, after all.
That's because the failed build products remain, with up-to-date
timestamps, thus tricking Make (and you!) into believing that there's
nothing else to build.
Fix this by telling Make to delete targets that didn't completely
succeed.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Link: https://lkml.kernel.org/r/20200915012901.1655280-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20200915012901.1655280-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to Documentation/core-api/pin_user_pages.rst, FOLL_PIN is a
prerequisite to FOLL_LONGTERM. Another way of saying that is,
FOLL_LONGTERM is a specific case, more restrictive case of FOLL_PIN.
Almost all kernel modules are using pin_user_pages() with FOLL_LONGTERM,
mm/gup_benchmark.c seems to the only exception in which FOLL_PIN is not a
prerequisite to FOLL_LONGTERM.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200815122056.29508-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Break the requirement that device-dax instances are physically contiguous.
With this constraint removed it allows fragmented available capacity to
be fully allocated.
This capability is useful to mitigate the "noisy neighbor" problem with
memory-side-cache management for virtual machines, or any other scenario
where a platform address boundary also designates a performance boundary.
For example a direct mapped memory side cache might rotate cache colors at
1GB boundaries. With dis-contiguous allocations a device-dax instance
could be configured to contain only 1 cache color.
It also satisfies Joao's use case (see link) for partitioning memory for
exclusive guest access. It allows for a future potential mode where the
host kernel need not allocate 'struct page' capacity up-front.
Reported-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/
Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information. The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.
This is in preparation for introducing a multi-range extension of
devm_memremap_pages().
The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR". That is replaced with an open coded print of the
range.
[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen]
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The passed in dev_pagemap is only required in the pmem case as the
libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
place the memmap in pmem directly. In the hmem case there is no agent
reserving an altmap so it can all be handled by a core internal default.
Pass the resource range via a new @range property of 'struct
dev_dax_data'.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- heavily refactor seccomp selftests (and clone3 selftests dependency) to
fix powerpc (Kees Cook, Thadeu Lima de Souza Cascardo)
- fix style issue in selftests (Zou Wei)
- upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich Felker)
- replace task_pt_regs(current) with current_pt_regs() (Denis Efremov)
- fix corner-case race in USER_NOTIF (Jann Horn)
- make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl+E1LAWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgRfD/0cq7W51+o34719vefC+oZaMjJJ
Bd5HYshmr6NRpMqn0OhtT9kVi6OeV0sK0VJeNxSISDIaGNJ8xCI9YhnXwzY+7myK
+IQu3i2Hv7dlWvTaXWFLL+mvfk6WopLntFGGJQ8KPMnP2gcfH2AZmOeAKGFGhBDe
NwpAUZ9zriXg9JCQp6u0FzPJgk8KfgfHjUY6Hsa095gg0aPSJhc8bWEUNBQwjCe6
uIcxDP/zK2WWaEhO9BfHt6/VTcXw7QgTLS3yM+pwBCgR1JHs7HMhtgcwPT410qES
LmYD8OiHmv5AZhDjcCcNipKEv3ZnxkLnpU/6hfaKM4zn/DoaR/zbfjO9U017rcNV
9gf7k5siAP7DH48IFlqf4Erzd3xyF0OJDnVfC7NiPtggPfO9aWOHJJZCuJRQOdrN
qPMjkaQzFb02qb501PLEn55F24OLDjz1vFOqpkJm2/XamOBVV4uiRKmfpNEo/MOf
QkhSvzvwEFErWwzPH95uFyVhs42stwnM3ppnwtya2+U5kxXdNvbAR8N5leH7siaU
ab+YJIHW59+BxXTlKgXIcqBP/6RqJWJtuT9OqGs0K2A7FhQSexh5MOm+9vvGgIwZ
Qjyijku8dB3aV94BNGnlJq6BV+4Hc6EGadh7h3b8GiRAUTYo0pk5G/iKL6Ii+R6p
0msJENqalKFtNCr70w==
=a4u2
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"The bulk of the changes are with the seccomp selftests to accommodate
some powerpc-specific behavioral characteristics. Additional cleanups,
fixes, and improvements are also included:
- heavily refactor seccomp selftests (and clone3 selftests
dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
Cascardo)
- fix style issue in selftests (Zou Wei)
- upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
Felker)
- replace task_pt_regs(current) with current_pt_regs() (Denis
Efremov)
- fix corner-case race in USER_NOTIF (Jann Horn)
- make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"
* tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
seccomp: Make duplicate listener detection non-racy
seccomp: Move config option SECCOMP to arch/Kconfig
selftests/clone3: Avoid OS-defined clone_args
selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
selftests/seccomp: Allow syscall nr and ret value to be set separately
selftests/seccomp: Record syscall during ptrace entry
selftests/seccomp: powerpc: Fix seccomp return value testing
selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
selftests/seccomp: Avoid redundant register flushes
selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Remove syscall setting #ifdefs
selftests/seccomp: mips: Remove O32-specific macro
selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
selftests/seccomp: Provide generic syscall setting macro
selftests/seccomp: Refactor arch register macros to avoid xtensa special case
selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
selftests/seccomp: Use bitwise instead of arithmetic operator for flags
...
We'll use it to ask for extra config files to be loaded, profile like
stuff that will be used first to make 'perf trace' mimic 'strace' output
via a 'perf strace' command that just sets up 'perf trace' output.
At some point it'll be used for regression tests, where we'll run some
simple commands like:
perf strace ls > perf-strace.output
strace ls > strace.output
And then do some mutable syscall arg aware diff like tool to deal with
arguments for things like mmap, that change at each execution, to be
first ignored and then properly tracked when used accoss multiple
syscalls.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some distros don't come with python2 and only have python3 available.
This causes the "'import perf' in python" self test to fail.
This change adds python3 to the list of possible python versions
that are autodetected but maintains the priorities for
'python2' and 'python' detection. Python3 has the lowest priority.
Committer notes:
On a fedora system without python2 packages the 'perf test python'
continues to work:
# python2
bash: python2: command not found...
Similar command is: 'python'
# rpm -qa | grep python2
#
That "Similar command" gives the clue:
# rpm -qf /usr/bin/python
python-unversioned-command-3.8.5-5.fc32.noarch
# rpm -ql python-unversioned-command
/usr/bin/python
/usr/share/man/man1/python.1.gz
#
With it in place the 'python' binary is found and perf builds the python
binding using python3:
# perf test -v python
19: 'import perf' in python :
--- start ---
test child forked, pid 379988
python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python' "
test child finished with 0
---- end ----
'import perf' in python: Ok
#
Looking at that path:
# ls -la /tmp/build/perf/python
total 1864
drwxrwxr-x. 2 acme acme 60 Oct 13 16:20 .
drwxrwxr-x. 18 acme acme 4420 Oct 13 16:28 ..
-rwxrwxr-x. 1 acme acme 1907216 Oct 13 16:28 perf.cpython-38-x86_64-linux-gnu.so
#
And:
# ldd ~/bin/perf | grep python
libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f5471187000)
#
As soon as we remove it:
# rpm -e python-unversioned-command-3.8.5-5.fc32.noarch
# hash -r
# python
bash: python: command not found...
Install package 'python-unversioned-command' to provide command 'python'? [N/y] n
#
And rebuilding perf now doesn't find python in the system:
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j24' parallel build
<SNIP>
Makefile.config:786: No python interpreter was found: disables Python support - please install python-devel/python-dev
<SNIP>
After this patch:
$ rpm -qi python-unversioned-command
package python-unversioned-command is not installed
$
$ python
bash: python: command not found...
Install package 'python-unversioned-command' to provide command 'python'? [N/y] ^C
$
$ m
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j24' parallel build
<SNIP>
CC /tmp/build/perf/tests/attr.o
CC /tmp/build/perf/tests/python-use.o
DESCEND plugins
GEN /tmp/build/perf/python/perf.so
INSTALL trace_plugins
LD /tmp/build/perf/tests/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
<SNIP>
make: Leaving directory '/home/acme/git/perf/tools/perf'
19: 'import perf' in python : Ok
$ ldd ~/bin/perf | grep python
libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f2c8c708000)
$ ls -la /tmp/build/perf/python
total 1864
drwxrwxr-x. 2 acme acme 60 Oct 13 16:20 .
drwxrwxr-x. 18 acme acme 4420 Oct 13 16:31 ..
-rwxrwxr-x. 1 acme acme 1907216 Oct 13 16:31 perf.cpython-38-x86_64-linux-gnu.so
$
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LPU-Reference: 20201005080645.6588-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To help figure out where it is getting the binding.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
ZwfpDh2+Tg==
=LzyE
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
Currently BUILD_BUG() macro is expanded to smth like the following:
do {
extern void __compiletime_assert_0(void)
__attribute__((error("BUILD_BUG failed")));
if (!(!(1)))
__compiletime_assert_0();
} while (0);
If used in a function body this obviously would produce build errors
with -Wnested-externs and -Werror.
To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf
includes in intel-pt-decoder, build perf without -Wnested-externs.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Core changes:
- The big core change is the updated (v2) userspace character
device API. This corrects badly designed 64-bit alignment around
the line events. We also add the debounce request feature.
This echoes the often quotes passage from Frederick Brooks
"The mythical man-month" to always throw one away, which we
have seen before in things such as V4L2. So we put in a new
one and deprecate and obsolete the old one.
- All example tools in tools/gpio/* are migrated to the new API
to set a good example. The libgpiod userspace library has been
augmented to use this new API pretty much from day 1.
- Some misc API hardening by using strn* function calls has been
added as well.
- Use the simpler IDA interface for GPIO chip instance enumeration.
- Add device core function for counting string arrays in
device properties.
- Provide a generic library function kfree_strarray() that can
be used throughout the kernel.
Driver enhancements:
- The DesignWare dwapb-gpio driver has been enhanced and now
uses the IRQ handling in the gpiolib core.
- The mockup and aggregator drivers have seen some substantial
code clean-up and now use more of the core kernel
inftrastructure.
- Misc cleanups using dev_err_probe().
- The MXC drivers (Freescale/NXP) can now be built modularized,
which makes modularized GKI Android kernels happy.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAl+FdjkACgkQQRCzN7AZ
XXMYgQ/+JgpHrp7yS1IkS1KiAxHdeIGnKzloTCQQo1JxYEymAnIeMwo/iWAk5wHu
NeJIEVxD0YzZwoI3BXbnO5Qy/62g1z7Ik8ToIa0TiFMwYxz5a7lqsiHwpBgHa50h
T2N8FRFdslVrhpUYBH4Q9wlfYxTki4FwdTD6aaoFFGcMwIVJXWyaYzE+o+qEUEne
VaPsGoNhRKTdKASP3c6+zbbPonzpZW7s/wvIBQAyBgPxEizlL97RzzX3bSSraoCX
i0NsDLHMe+9twqE064KN+CYu0Cy80etQSQsYcfnstVshMuY9+WC1YdyJqzYMciuQ
CYUIQBeskft86IBlsEU/fNCbV+FeAgrxRW6TJK7Hn+sUWZ5+UGdpJ03UE1hA3jjO
SniwG0vpqvZIkio49B6h51VdjNqVJn+AE8tN3hCzqpFknblXgJOVysD7RS7rNM6D
flV1bCsUYtC6jN43qsGFiRYLE9ml2iUxFFoBQUaAEh+pXgUzPTQqD7aSjyzmE3x2
uapKXgxN0dCNH+tFXij73Ro4bYf4ZTZhx3Z3XoEUNEyJpl8fE1bv1SZ2EykOmK8g
c78fAmT0vG3xYZvK10WZj4zuHV6GlPAYVm/MlhB7QHsrF3wa9vervOuqhEPmp2th
hTsVj/Zlz0SSDLncMQL64B7gbxOmzOYlVRxIkSrDEXUOFU7kiWE=
=8CE2
-----END PGP SIGNATURE-----
Merge tag 'gpio-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This time very little driver changes but lots of core changes.
We have some interesting cooperative work for ARM and Intel alike,
making the GPIO subsystem more and more suitable for industrial
systems and the like, in addition to the in-kernel users.
We touch driver core (device properties) and lib/* by adding one
simple string array free function, these are authored by Andy
Shevchenko who is a well known and recognized core helpers maintainers
so this should be fine.
We also see some Android GKI-related modularization in the MXC
drivers.
Core changes:
- The big core change is the updated (v2) userspace character device
API.
This corrects badly designed 64-bit alignment around the line
events. We also add the debounce request feature. This echoes the
often quotes passage from Frederick Brooks "The mythical man-month"
to always throw one away, which we have seen before in things such
as V4L2. So we put in a new one and deprecate and obsolete the old
one.
- All example tools in tools/gpio/* are migrated to the new API to
set a good example. The libgpiod userspace library has been
augmented to use this new API pretty much from day 1.
- Some misc API hardening by using strn* function calls has been
added as well.
- Use the simpler IDA interface for GPIO chip instance enumeration.
- Add device core function for counting string arrays in device
properties.
- Provide a generic library function kfree_strarray() that can be
used throughout the kernel.
Driver enhancements:
- The DesignWare dwapb-gpio driver has been enhanced and now uses the
IRQ handling in the gpiolib core.
- The mockup and aggregator drivers have seen some substantial code
clean-up and now use more of the core kernel inftrastructure.
- Misc cleanups using dev_err_probe().
- The MXC drivers (Freescale/NXP) can now be built modularized, which
makes modularized GKI Android kernels happy"
* tag 'gpio-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (73 commits)
gpiolib: Update header block in gpiolib-cdev.h
gpiolib: cdev: switch from kstrdup() to kstrndup()
docs: gpio: add a new document to its index.rst
gpio: pca953x: Add support for the NXP PCAL9554B/C
tools: gpio: add debounce support to gpio-event-mon
tools: gpio: add multi-line monitoring to gpio-event-mon
tools: gpio: port gpio-event-mon to v2 uAPI
tools: gpio: port gpio-hammer to v2 uAPI
tools: gpio: rename nlines to num_lines
tools: gpio: port gpio-watch to v2 uAPI
tools: gpio: port lsgpio to v2 uAPI
gpio: uapi: document uAPI v1 as deprecated
gpiolib: cdev: support setting debounce
gpiolib: cdev: support GPIO_V2_LINE_SET_VALUES_IOCTL
gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL
gpiolib: cdev: support edge detection for uAPI v2
gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL
gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL
gpiolib: add build option for CDEV v1 ABI
gpiolib: make cdev a build option
...
'perf trace ls' started crashing after commit d21cb73a90 on
!HAVE_SYSCALL_TABLE_SUPPORT configs (armv7l here) like this:
0 strlen () at ../sysdeps/arm/armv6t2/strlen.S:126
1 0xb6800780 in __vfprintf_internal (s=0xbeff9908, s@entry=0xbeff9900, format=0xa27160 "]: %s()", ap=..., mode_flags=<optimized out>) at vfprintf-internal.c:1688
...
5 0x0056ecdc in fprintf (__fmt=0xa27160 "]: %s()", __stream=<optimized out>) at /usr/include/bits/stdio2.h:100
6 trace__sys_exit (trace=trace@entry=0xbeffc710, evsel=evsel@entry=0xd968d0, event=<optimized out>, sample=sample@entry=0xbeffc3e8) at builtin-trace.c:2475
7 0x00566d40 in trace__handle_event (sample=0xbeffc3e8, event=<optimized out>, trace=0xbeffc710) at builtin-trace.c:3122
...
15 main (argc=2, argv=0xbefff6e8) at perf.c:538
It is because memset in trace__read_syscall_info zeroes wrong memory:
1) when initializing for the first time, it does not reset the last id.
2) in other cases, it resets the last id of previous buffer.
ad 1) it causes the crash above as sc->name used in the fprintf above
contains garbage.
ad 2) it sets nonexistent from true back to false for id 11 here. Not
sure, what the consequences are.
So fix it by introducing a special case for the initial initialization
and do the right +1 in both cases.
Fixes: d21cb73a90 ("perf trace: Grow the syscall table as needed when using libaudit")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201001093419.15761-1-jslaby@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit b027cc6fdf ("perf c2c: Fix 'perf c2c record -e list' to
show the default events used"), "perf c2c" tool can show the memory
events properly, it's no reason to still suggest user to use the
command "perf mem record -e list" for showing events.
This patch updates the usage for showing memory events with command
"perf c2c record -e list".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201011121022.22409-1-leo.yan@linaro.org
There are internal library functions, which are not declared as a static.
They are used inside the library from different files. Hide them from
the library users, as they are not part of the API.
These functions are made hidden and are renamed without the prefix "tep_":
tep_free_plugin_paths
tep_peek_char
tep_buffer_init
tep_get_input_buf_ptr
tep_get_input_buf
tep_read_token
tep_free_token
tep_free_event
tep_free_format_field
__tep_parse_format
Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf sched latency' tool is really useful at showing worst-case
latencies that task encountered since wakeup. However it shows only the
end of the latency. Often times the start of a latency is interesting as
it can show what else was going on at the time to cause the latency. I
certainly myself spending a lot of time backtracking to the start of the
latency in "perf sched script" which wastes a lot of time.
This patch therefore adds a new column "Max delay start". Considering
this, also rename "Maximum delay at" to "Max delay end" as its easier to
understand.
Example of the new output:
----------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end |
----------------------------------------------------------------------------------------------------------------------------------
MediaScannerSer:11936 | 651.296 ms | 67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s
audio@2.0-servi:(3) | 0.000 ms | 3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s
AudioOut_1D:8112 | 0.000 ms | 2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s
Time-limited te:14973 | 7966.090 ms | 24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s
surfaceflinger:8049 | 9.680 ms | 603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s
HeapTaskDaemon:(3) | 1588.830 ms | 7040 | avg: 0.065 ms | max: 6.880 ms | max start: 473.666043 s | max end: 473.672922 s
mount-passthrou:(3) | 1370.809 ms | 68904 | avg: 0.011 ms | max: 6.524 ms | max start: 478.090630 s | max end: 478.097154 s
ReferenceQueueD:(3) | 11.794 ms | 1725 | avg: 0.014 ms | max: 6.521 ms | max start: 476.119782 s | max end: 476.126303 s
writer:14077 | 18.410 ms | 1427 | avg: 0.036 ms | max: 6.131 ms | max start: 474.169675 s | max end: 474.175805 s
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For comparison, it now runs the benchmark twice - one if regular -b and
another for --buildid-all.
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 21.002 msec (+- 0.172 msec)
Average time per event: 2.059 usec (+- 0.017 usec)
Average memory usage: 8169 KB (+- 0 KB)
Average build-id-all injection took: 19.543 msec (+- 0.124 msec)
Average time per event: 1.916 usec (+- 0.012 usec)
Average memory usage: 7348 KB (+- 0 KB)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like 'perf record', we can even more speedup build-id processing by just
using all DSOs. Then we don't need to look at all the sample events
anymore. The following patch will update 'perf bench' to show the result
of the --buildid-all option too.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Original-patch-by: Stephane Eranian <eranian@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It should be in a proper mnt namespace when accessing the file.
I think this had no problem since the build-id was actually read from
map__load() -> dso__load() already. But I'd like to change it in the
following commit.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I found some events (like PERF_RECORD_CGROUP) are not copied by perf
inject due to the missing callbacks. Let's add them.
While at it, I've changed the order of the callbacks to match with
struct perf_tool so that we can compare them easily.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the BUILD_BUG() macro is expanded to the following:
do {
extern void __compiletime_assert_0(void)
__attribute__((error("BUILD_BUG failed")));
if (!(!(1)))
__compiletime_assert_0();
} while (0);
If used in a function body this would obviously produce build errors
with -Wnested-externs and -Werror.
To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf
includes in intel-pt-decoder, build perf without -Wnested-externs.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours
Pull compat mount cleanups from Al Viro:
"The last remnants of mount(2) compat buried by Christoph.
Buried into NFS, that is.
Generally I'm less enthusiastic about "let's use in_compat_syscall()
deep in call chain" kind of approach than Christoph seems to be, but
in this case it's warranted - that had been an NFS-specific wart,
hopefully not to be repeated in any other filesystems (read: any new
filesystem introducing non-text mount options will get NAKed even if
it doesn't mess the layout up).
IOW, not worth trying to grow an infrastructure that would avoid that
use of in_compat_syscall()..."
* 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove compat_sys_mount
fs,nfs: lift compat nfs4 mount data handling into the nfs code
nfs: simplify nfs4_parse_monolithic
Pull compat iovec cleanups from Al Viro:
"Christoph's series around import_iovec() and compat variant thereof"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
security/keys: remove compat_keyctl_instantiate_key_iov
mm: remove compat_process_vm_{readv,writev}
fs: remove compat_sys_vmsplice
fs: remove the compat readv/writev syscalls
fs: remove various compat readv/writev helpers
iov_iter: transparently handle compat iovecs in import_iovec
iov_iter: refactor rw_copy_check_uvector and import_iovec
iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
compat.h: fix a spelling error in <linux/compat.h>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-10-12
The main changes are:
1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.
2) libbpf relocation support for different size load/store, from Andrii.
3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.
4) BPF support for per-cpu variables, form Hao.
5) sockmap improvements, from John.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
applied to indirect function calls. Remove a data load (indirection) by
modifying the text.
They give the flexibility of function pointers, but with better
performance. (This is especially important for cases where
retpolines would otherwise be used, as retpolines can be pretty
slow.)
API overview:
DECLARE_STATIC_CALL(name, func);
DEFINE_STATIC_CALL(name, func);
DEFINE_STATIC_CALL_NULL(name, typename);
static_call(name)(args...);
static_call_cond(name)(args...);
static_call_update(name, func);
x86 is supported via text patching, otherwise basic indirect calls are used,
with function pointers.
There's a second variant using inline code patching, inspired by jump-labels,
implemented on x86 as well.
The new APIs are utilized in the x86 perf code, a heavy user of function pointers,
where static calls speed up the PMU handler by 4.2% (!).
The generic implementation is not really excercised on other architectures,
outside of the trivial test_static_call_init() self-test.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EfAQRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iEAw//divHeVCJnHhV+YBbuI9ROUsERkzu8VhK
O1DEmW68Fvj7pszT8NZsMjtkt97ZtxDRK7aCJiiup0eItG9qCJ8lpCLb84ZbizHV
HhCbhBLrpxSvTrWlQnkgP1OkPAbtoryIjVlZzWhjye2MY8UEbVnZWyviBolbAAxH
Fk1Yi56fIMu19GO+9Ohzy9E2VDnVEH1iMx5YWoLD2H88Qbq/yEMP+U2tIj8hIVKT
Y/jdogihNXRIau6QB+YPfDPisdty+RHxfU7zct4Rv8cFF5ylglZB5fD34C3sUQF2
WqsaYz7zjUj9f02F8pw8hIaAT7InzArPhlNVITxf2oMfmdrNqBptnSCddZqCJLvv
oDGew21k50Zcbqkv9amclpxXH5tTpRvJeqit2pz/85GMeqBRuhzHUAkCpht5YA73
qJsHWS3z+qIxKi0tDbhDJswuwa51q5sgdUUwo1uCr3wT3DGDlqNhCAZBzX14dcty
0shDSbv13TCwqAcb7asPzEoPwE15cwa+x+viGEIL901pyZKyQYjs/abDU26It3BW
roWRkuVJZ9/QMdZJs1v7kaXw1L8YiKIDkBgke+xbfrDwEvvjudQkl2LUL66DB11j
RJU3GyxKClvdY06SSRh/H13fqZLNKh1JZ0nPEWSTJECDFN9zcDjrDrod/7PFOcpY
NAlawLoGG+s=
=JvpF
-----END PGP SIGNATURE-----
Merge tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull static call support from Ingo Molnar:
"This introduces static_call(), which is the idea of static_branch()
applied to indirect function calls. Remove a data load (indirection)
by modifying the text.
They give the flexibility of function pointers, but with better
performance. (This is especially important for cases where retpolines
would otherwise be used, as retpolines can be pretty slow.)
API overview:
DECLARE_STATIC_CALL(name, func);
DEFINE_STATIC_CALL(name, func);
DEFINE_STATIC_CALL_NULL(name, typename);
static_call(name)(args...);
static_call_cond(name)(args...);
static_call_update(name, func);
x86 is supported via text patching, otherwise basic indirect calls are
used, with function pointers.
There's a second variant using inline code patching, inspired by
jump-labels, implemented on x86 as well.
The new APIs are utilized in the x86 perf code, a heavy user of
function pointers, where static calls speed up the PMU handler by
4.2% (!).
The generic implementation is not really excercised on other
architectures, outside of the trivial test_static_call_init()
self-test"
* tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
static_call: Fix return type of static_call_init
tracepoint: Fix out of sync data passing by static caller
tracepoint: Fix overly long tracepoint names
x86/perf, static_call: Optimize x86_pmu methods
tracepoint: Optimize using static_call()
static_call: Allow early init
static_call: Add some validation
static_call: Handle tail-calls
static_call: Add static_call_cond()
x86/alternatives: Teach text_poke_bp() to emulate RET
static_call: Add simple self-test for static calls
x86/static_call: Add inline static call implementation for x86-64
x86/static_call: Add out-of-line static call implementation
static_call: Avoid kprobes on inline static_call()s
static_call: Add inline static call infrastructure
static_call: Add basic static call infrastructure
compiler.h: Make __ADDRESSABLE() symbol truly unique
jump_label,module: Fix module lifetime for __jump_label_mod_text_reserved()
module: Properly propagate MODULE_STATE_COMING failure
module: Fix up module_notifier return values
...
- Add deadlock detection for recursive read-locks. The rationale is outlined
in:
224ec489d3cd: ("lockdep/Documention: Recursive read lock detection reasoning")
The main deadlock pattern we want to detect is:
TASK A: TASK B:
read_lock(X);
write_lock(X);
read_lock_2(X);
- Add "latch sequence counters" (seqcount_latch_t):
A sequence counter variant where the counter even/odd value is used to
switch between two copies of protected data. This allows the read path,
typically NMIs, to safely interrupt the write side critical section.
We utilize this new variant for sched-clock, and to make x86 TSC handling safer.
- Other seqlock cleanups, fixes and enhancements
- KCSAN updates
- LKMM updates
- Misc updates, cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EX6QRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g3gxAAkg+Jy/tcdRxlxlEDOQPFy1mBqvFmulNA
pGFPkB6dzqmAWF/NfOZSl4g/h/mqGYsq2V+PfK5E8Sq8DQ/yCmnLhjgVOHNUUliv
x0WWfOysNgJdtdf69NLYJufIQhxhyI0dwFHHoHIsCdGdGqjh2DVevQFPFTBjdpOc
BUZYo+u3gCaCdB6A2nmlcWYbEw8eVEHgv3qLG6dq46J0KJOV0HfliqJoU3EZqH+s
977LvEIo+THfuYWMo/Jepwngbi0y36KeeukOAdwm9fK196htBHIUR+YPPrAe+FWD
z+UXP5IS5XIw9V1sGLmUaC2m+6gpdW19jKBtlzPkxHXmJmsgiZdLLeytEh3WYey7
nzfH+9Jd4NyyZKucLssYkOjf6P5BxGKCyJ9LXb7vlSthIhiDdFNx47oKtW4hxjOY
jubsI3BP5c3G1sIBIjTS53XmOhJg+Z52FxTpQ33JswXn1wGidcHZiuNHZuU5q28p
+tn8rGb2NGJFb4Sw/Vp0yTcqIpEXf+vweiQoaxm6tc9BWzcVzZntGnh0i3gFotx/
VgKafN4+pgXgo6bwHbN2WBK2FGyvcXFaptfaOMZL48En82hJ1DI6EnBEYN+vuERQ
JcCXg+iHeeVbxoou7q8NJxITkBmEL5xNBIugXRRqNSP3fXLxKjFuPYqT84/e7yZi
elGTReYcq6g=
=Iq51
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"These are the locking updates for v5.10:
- Add deadlock detection for recursive read-locks.
The rationale is outlined in commit 224ec489d3 ("lockdep/
Documention: Recursive read lock detection reasoning")
The main deadlock pattern we want to detect is:
TASK A: TASK B:
read_lock(X);
write_lock(X);
read_lock_2(X);
- Add "latch sequence counters" (seqcount_latch_t):
A sequence counter variant where the counter even/odd value is used
to switch between two copies of protected data. This allows the
read path, typically NMIs, to safely interrupt the write side
critical section.
We utilize this new variant for sched-clock, and to make x86 TSC
handling safer.
- Other seqlock cleanups, fixes and enhancements
- KCSAN updates
- LKMM updates
- Misc updates, cleanups and fixes"
* tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
lockdep: Fix lockdep recursion
lockdep: Fix usage_traceoverflow
locking/atomics: Check atomic-arch-fallback.h too
locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc
lockdep: Optimize the memory usage of circular queue
seqlock: Unbreak lockdep
seqlock: PREEMPT_RT: Do not starve seqlock_t writers
seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
seqlock: seqcount_t: Implement all read APIs as statement expressions
seqlock: Use unique prefix for seqcount_t property accessors
seqlock: seqcount_LOCKNAME_t: Standardize naming convention
seqlock: seqcount latch APIs: Only allow seqcount_latch_t
rbtree_latch: Use seqcount_latch_t
x86/tsc: Use seqcount_latch_t
timekeeping: Use seqcount_latch_t
time/sched_clock: Use seqcount_latch_t
seqlock: Introduce seqcount_latch_t
mm/swap: Do not abuse the seqcount_t latching API
time/sched_clock: Use raw_read_seqcount_latch() during suspend
...
- Reorganize & clean up the SD* flags definitions and add a bunch
of sanity checks. These new checks caught quite a few bugs or at
least inconsistencies, resulting in another set of patches.
- Rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
- Add a new tracepoint to improve CPU capacity tracking
- Improve overloaded SMP system load-balancing behavior
- Tweak SMT balancing
- Energy-aware scheduling updates
- NUMA balancing improvements
- Deadline scheduler fixes and improvements
- CPU isolation fixes
- Misc cleanups, simplifications and smaller optimizations.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EWRERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hV8A/7BB0nt/zYVZ8Z3Di8V0b9hMtr0d1xtRM5
ZAvg4hcZl/fVgobFndxBw6KdlK8lSce9Mcq+bTTWeD46CS13cK5Vrpiaf7x7Q00P
m8YHeYEH13ME0pbBrhDoRCR4XzfXukzjkUl7LiyrTekAvRUtFikJ/uKl8MeJtYGZ
gANEkadqforxUW0v45iUEGepmCWAl8hSlSMb2mDKsVhw4DFMD+px0EBmmA0VDqjE
e0rkh6dEoUVNqlic2KoaXULld1rLg1xiaOcLUbTAXnucfhmuv5p/H11AC4ABuf+s
7d0zLrLEfZrcLJkthYxfMHs7DYMtARiQM9Db/a5hAq9Af4Z2bvvVAaHt3gCGvkV1
llB6BB2yWCki9Qv7oiGOAhANnyJHG/cU4r6WwMuHdlYi4dFT/iN5qkOMUL1IrDgi
a6ZzvECChXBeisQXHSlMd8Y5O+j0gRvDR7E18z2q0/PlmO8PGJq4w34mEWveWIg3
LaVF16bmvaARuNFJTQH/zaHhjqVQANSMx5OIv9swp0OkwvQkw21ICYHG0YxfzWCr
oa/FESEpOL9XdYp8UwMPI0bmVIsEfx79pmDMF3zInYTpJpwMUhV2yjHE8uYVMqEf
7U8rZv7gdbZ2us38Gjf2l73hY+recp/GrgZKnk0R98OUeMk1l/iVP6dwco6ITUV5
czGmKlIB1ec=
=bXy6
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- reorganize & clean up the SD* flags definitions and add a bunch of
sanity checks. These new checks caught quite a few bugs or at least
inconsistencies, resulting in another set of patches.
- rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
- add a new tracepoint to improve CPU capacity tracking
- improve overloaded SMP system load-balancing behavior
- tweak SMT balancing
- energy-aware scheduling updates
- NUMA balancing improvements
- deadline scheduler fixes and improvements
- CPU isolation fixes
- misc cleanups, simplifications and smaller optimizations
* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
sched/deadline: Unthrottle PI boosted threads while enqueuing
sched/debug: Add new tracepoint to track cpu_capacity
sched/fair: Tweak pick_next_entity()
rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
rseq/selftests,x86_64: Add rseq_offset_deref_addv()
rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
sched/fair: Use dst group while checking imbalance for NUMA balancer
sched/fair: Reduce busy load balance interval
sched/fair: Minimize concurrent LBs between domain level
sched/fair: Reduce minimal imbalance threshold
sched/fair: Relax constraint on task's load during load balance
sched/fair: Remove the force parameter of update_tg_load_avg()
sched/fair: Fix wrong cpu selecting from isolated domain
sched: Remove unused inline function uclamp_bucket_base_value()
sched/rt: Disable RT_RUNTIME_SHARE by default
sched/deadline: Fix stale throttling on de-/boosted tasks
sched/numa: Use runnable_avg to classify node
sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
...
respective selftests.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EMr8ACgkQEsHwGGHe
VUpoBQ/+Pe7MaMbS7d+IATzhedKW7PNKcQr2g734rnZkz0BgG+1sBkCQPRAKBZoa
SIvt3gURM3aEeD4Dp3am4nLElyhldZrlwKoGKGv6AYv2BpLPQM9PG0fHJGUyYBze
ekKMdPu5YK0hYqoWctrY8h+qbExNdkfAvM7bJMFJMBqypVicm0n5wlgfZthGz0DU
tkD34WZNE2GGAfsi/NNJ2H+hcZo8bQVrqW98bkgdzIA7+KI3cyZZ132VbKxb03tK
A69C7+J4B20q/traWFlb4mTcFy/a1Txrt3cJXIv/Xer74gDMqNYcciGgnTJdhryY
gzBmWNTxuQr1EC8DYJaxjlQbBp6VSAwhELlyc6UeRxLAViEpMxyPfBVMOYqraImc
sZ8QKGgI02PggInN9yo4qCbtUWAGMCHV7HGGW8stVBrh1lia7o6Dy9jhO+nmTnzV
EEe/vEoSsp/ydnkgFNjaRwjFLp+vDX2lAf513ZuZukpt+IGQ0nAO5phzgcZVAyH4
qzr9uXdM3j+NtlXZgLttNppWEvHxzIpkri3Ly46VUFYOqTuKYPmS8A7stEfqx3NO
T8g38+dDirFfKCoJz8NJBUUs+1KXer8QmJvogfHx5fsZ01Q2qz6AmOmsmMfe7Wqm
+C9mvZJOJnW/7NunGWGVsuAZJOPx6o2oivjVpxxOyl8AeQnE6fg=
=itDm
-----END PGP SIGNATURE-----
Merge tag 'x86_fsgsbase_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fsgsbase updates from Borislav Petkov:
"Misc minor cleanups and corrections to the fsgsbase code and
respective selftests"
* tag 'x86_fsgsbase_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86/fsgsbase: Test PTRACE_PEEKUSER for GSBASE with invalid LDT GS
selftests/x86/fsgsbase: Reap a forgotten child
x86/fsgsbase: Replace static_cpu_has() with boot_cpu_has()
x86/entry/64: Correct the comment over SAVE_AND_SET_GSBASE
encounter an MCE in kernel space but while copying from user memory by
sending them a SIGBUS on return to user space and umapping the faulty
memory, by Tony Luck and Youquan Song.
* memcpy_mcsafe() rework by splitting the functionality into
copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
support for new hardware which can recover from a machine check
encountered during a fast string copy and makes that the default and
lets the older hardware which does not support that advance recovery,
opt in to use the old, fragile, slow variant, by Dan Williams.
* New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.
* Do not use MSR-tracing accessors in #MC context and flag any fault
while accessing MCA architectural MSRs as an architectural violation
with the hope that such hw/fw misdesigns are caught early during the hw
eval phase and they don't make it into production.
* Misc fixes, improvements and cleanups, as always.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EIpUACgkQEsHwGGHe
VUouoBAAgwb+NkWZtIqGImV4f+LOyFjhTR/r/7ZyiijXdbhOIuAdc/jQM31mQxug
sX2jxaRYnf1n6SLA0ggX99gwr2deRQ/hsNf5Abw55GC+Z1dOxpGL0k59A3ELl1IR
H9KYmCAFQIHvzfk38qcdND73XHcgthQoXFBOG9wAPAdgDWnaiWt6lcLAq8OiJTmp
D8pInAYhcnL8YXwMGyQQ1KkFn9HwydoWDsK5Ff2shaw2/+dMQqd1zetenbVtjhLb
iNYGvV7Bi/RQ8PyMbzmtTWa4kwQJAHC2gptkGxty//2ADGVBbqUQdqF9TjIWCNy5
V6Ldv5zo0/1s7DOzji3htzqkSs/K1Ea6d2LtZjejkJipHKV5x068UC6Fu+PlfS2D
VZfcICeapU4G2F3Zvks2DlZ7dVTbHCvoI78Qi7bBgczPUVmk6iqah4xuQaiHyBJc
kTFDA4Nnf/026GpoWRiFry9vqdnHBZyLet5A6Y+SoWF0FbhYnCVPpq4MnussYoav
lUIi9ZZav6X2RZp9DDM1f9d5xubtKq0DKt93wvzqAhjK0T2DikckJ+riOYkI6N8t
fHCBNUkdfgyMzJUTBPAzYQ7RmjbjKWJi7xWP0oz6+GqOJkQfSTVC5/2yEffbb3ya
whYRS6iklbl7yshzaOeecXsZcAeK2oGPfoHg34WkHFgXdF5mNgA=
=u1Wg
-----END PGP SIGNATURE-----
Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Extend the recovery from MCE in kernel space also to processes which
encounter an MCE in kernel space but while copying from user memory
by sending them a SIGBUS on return to user space and umapping the
faulty memory, by Tony Luck and Youquan Song.
- memcpy_mcsafe() rework by splitting the functionality into
copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
support for new hardware which can recover from a machine check
encountered during a fast string copy and makes that the default and
lets the older hardware which does not support that advance recovery,
opt in to use the old, fragile, slow variant, by Dan Williams.
- New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.
- Do not use MSR-tracing accessors in #MC context and flag any fault
while accessing MCA architectural MSRs as an architectural violation
with the hope that such hw/fw misdesigns are caught early during the
hw eval phase and they don't make it into production.
- Misc fixes, improvements and cleanups, as always.
* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
x86/mce: Decode a kernel instruction to determine if it is copying from user
x86/mce: Recover from poison found while copying from user space
x86/mce: Avoid tail copy when machine check terminated a copy from user
x86/mce: Add _ASM_EXTABLE_CPY for copy user access
x86/mce: Provide method to find out the type of an exception handler
x86/mce: Pass pointer to saved pt_regs to severity calculation routines
x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
x86/mce: Add Skylake quirk for patrol scrub reported errors
RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
x86/mce: Annotate mce_rd/wrmsrl() with noinstr
x86/mce/dev-mcelog: Do not update kflags on AMD systems
x86/mce: Stop mce_reign() from re-computing severity for every CPU
x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
x86/mce: Increase maximum number of banks to 64
x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
RAS/CEC: Fix cec_init() prototype
- Userspace support for the Memory Tagging Extension introduced by Armv8.5.
Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.
- Fix and subsequent rewrite of our Spectre mitigations, including the
addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing page-tables with
the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU driver and
also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT failure.
- Improve robustness of user-visible HWCAP strings and their corresponding
numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
description.
- Cleanups to handling of SVE and FPSIMD register state in preparation
for potential future optimisation of handling across syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+AUXMQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNFc1B/4q2Kabe+pPu7s1f58Q+OTaEfqcr3F1qh27
F1YpFZUYxg0GPfPsFrnbJpo5WKo7wdR9ceI9yF/GHjs7A/MSoQJis3pG6SlAd9c0
nMU5tCwhg9wfq6asJtl0/IPWem6cqqhdzC6m808DjeHuyi2CCJTt0vFWH3OeHEhG
cfmLfaSNXOXa/MjEkT8y1AXJ/8IpIpzkJeCRA1G5s18PXV9Kl5bafIo9iqyfKPLP
0rJljBmoWbzuCSMc81HmGUQI4+8KRp6HHhyZC/k0WEVgj3LiumT7am02bdjZlTnK
BeNDKQsv2Jk8pXP2SlrI3hIUTz0bM6I567FzJEokepvTUzZ+CVBi
=9J8H
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
addition of a new PMU driver as well as some arm64-specific selftests
which is an area where we've traditionally been lagging a bit.
In terms of exciting features, this includes support for the Memory
Tagging Extension which narrowly missed 5.9, hopefully allowing
userspace to run with use-after-free detection in production on CPUs
that support it. Work is ongoing to integrate the feature with KASAN
for 5.11.
Another change that I'm excited about (assuming they get the hardware
right) is preparing the ASID allocator for sharing the CPU page-table
with the SMMU. Those changes will also come in via Joerg with the
IOMMU pull.
We do stray outside of our usual directories in a few places, mostly
due to core changes required by MTE. Although much of this has been
Acked, there were a couple of places where we unfortunately didn't get
any review feedback.
Other than that, we ran into a handful of minor conflicts in -next,
but nothing that should post any issues.
Summary:
- Userspace support for the Memory Tagging Extension introduced by
Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.
- Fix and subsequent rewrite of our Spectre mitigations, including
the addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing
page-tables with the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a
no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU
driver and also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT
failure.
- Improve robustness of user-visible HWCAP strings and their
corresponding numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
description.
- Cleanups to handling of SVE and FPSIMD register state in
preparation for potential future optimisation of handling across
syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
Revert "arm64: initialize per-cpu offsets earlier"
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
...
Here we add three new tests for sockmap to test having a verdict program
without setting the parser program.
The first test covers the most simply case,
sender proxy_recv proxy_send recv
| | |
| verdict -----+ |
| | | |
+----------------+ +------------+
We load the verdict program on the proxy_recv socket without a
parser program. It then does a redirect into the send path of the
proxy_send socket using sendpage_locked().
Next we test the drop case to ensure if we kfree_skb as a result of
the verdict program everything behaves as expected.
Next we test the same configuration above, but with ktls and a
redirect into socket ingress queue. Shown here
tls tls
sender proxy_recv proxy_send recv
| | |
| verdict ------------------+
| | redirect_ingress
+----------------+
Also to set up ping/pong test
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239302638.8495.17125996694402793471.stgit@john-Precision-5820-Tower
Add option to allow running without a parser program in place. To test
with ping/pong program use,
# test_sockmap -t ping --txmsg_omit_skb_parser
this will send packets between two socket bouncing through a proxy
socket that does not use a parser program.
(ping) (pong)
sender proxy_recv proxy_send recv
| | |
| verdict -----+ |
| | | |
+----------------+ +------------+
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239300387.8495.11908295143121563076.stgit@john-Precision-5820-Tower
add a test with re-queueing: usespace doesn't pass accept verdict,
but tells to re-queue to another nf_queue instance.
Also, make the second nf-queue program use non-gso mode, kernel will
have to perform software segmentation.
Lastly, do not queue every packet, just one per second, and add delay
when re-injecting the packet to the kernel.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Create a test that changes a VLAN ID from 200 to 300.
We also need to modify the preferences of the filters installed for the
other rules so that they are unique, because we now install the "tc-vlan
modify" filter in VCAP IS1 only temporarily, and we need to perform the
deletion by filter preference number.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the test_tc_redirect test and add a small test that exercises the new
redirect_peer() helper for the IPv4 and IPv6 case.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-7-daniel@iogearbox.net
Rename into test_tc_redirect.sh and move setup and test code into separate
functions so they can be reused for newly added tests in here. Also remove
the crude hack to override ifindex inside the object file via xxd and sed
and just use a simple map instead. Map given iproute2 does not support BTF
fully and therefore neither global data at this point.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-6-daniel@iogearbox.net
Extend the "diff_size" subtest to also include a non-inlined array map variant
where dynamic inner #elems are possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-5-daniel@iogearbox.net
Recent work in f4d0525921 ("bpf: Add map_meta_equal map ops") and 134fede4ee
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.
We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This has been designed this way in
order to reduce overall memory consumption given the outer hash map allows to
avoid preallocating a large, flat memory area for all services. Also, the
number of service mappings is not always known a-priori.
The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large number of backends would need to have the same inner array
map entries which adds a lot of unneeded overhead.
Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This also still allows to have faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.
Example code generation where inner map is dynamic sized array:
# bpftool p d x i 125
int handle__sys_enter(void * ctx):
; int handle__sys_enter(void *ctx)
0: (b4) w1 = 0
; int key = 0;
1: (63) *(u32 *)(r10 -4) = r1
2: (bf) r2 = r10
;
3: (07) r2 += -4
; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
4: (18) r1 = map[id:468]
6: (07) r1 += 272
7: (61) r0 = *(u32 *)(r2 +0)
8: (35) if r0 >= 0x3 goto pc+5
9: (67) r0 <<= 3
10: (0f) r0 += r1
11: (79) r0 = *(u64 *)(r0 +0)
12: (15) if r0 == 0x0 goto pc+1
13: (05) goto pc+1
14: (b7) r0 = 0
15: (b4) w6 = -1
; if (!inner_map)
16: (15) if r0 == 0x0 goto pc+6
17: (bf) r2 = r10
;
18: (07) r2 += -4
; val = bpf_map_lookup_elem(inner_map, &key);
19: (bf) r1 = r0 | No inlining but instead
20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem()
; return val ? *val : -1; | for inner array lookup.
21: (15) if r0 == 0x0 goto pc+1
; return val ? *val : -1;
22: (61) r6 = *(u32 *)(r0 +0)
; }
23: (bc) w0 = w6
24: (95) exit
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
Add an efficient ingress to ingress netns switch that can be used out of tc BPF
programs in order to redirect traffic from host ns ingress into a container
veth device ingress without having to go via CPU backlog queue [0]. For local
containers this can also be utilized and path via CPU backlog queue only needs
to be taken once, not twice. On a high level this borrows from ipvlan which does
similar switch in __netif_receive_skb_core() and then iterates via another_round.
This helps to reduce latency for mentioned use cases.
Pod to remote pod with redirect(), TCP_RR [1]:
# percpu_netperf 10.217.1.33
RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 )
MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 )
STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 )
MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 )
P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 )
P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 )
P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 )
TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 )
Pod to remote pod with redirect_peer(), TCP_RR:
# percpu_netperf 10.217.1.33
RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 )
MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 )
STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 )
MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 )
P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 )
P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 )
P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 )
TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 )
[0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
[1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
Follow-up to address David's feedback that we should better describe internals
of the bpf_redirect_neigh() helper.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
TAP 14 allows an optional test plan to be emitted before the start of
the start of testing[1]; this is valuable because it makes it possible
for a test harness to detect whether the number of tests run matches the
number of tests expected to be run, ensuring that no tests silently
failed.
Link[1]: https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#the-plan
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
CalledProcessError stores the output of the failed process as `bytes`,
not a `str`.
So when we log it on build error, the make output is all crammed into
one line with "\n" instead of actually printing new lines.
After this change, we get readable output with new lines, e.g.
> CC lib/kunit/kunit-example-test.o
> In file included from ../lib/kunit/test.c:9:
> ../include/kunit/test.h:22:1: error: unknown type name ‘invalid_type_that_causes_compile’
> 22 | invalid_type_that_causes_compile errors;
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> make[3]: *** [../scripts/Makefile.build:283: lib/kunit/test.o] Error 1
Secondly, trying to concat exceptions to strings will fail with
> TypeError: can only concatenate str (not "OSError") to str
so fix this with an explicit cast to str.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The main purpose of the profiler test to check different llvm generation
patterns to make sure the verifier can load these large programs.
Note that profiler.inc.h test doesn't follow strict kernel coding style.
The code was formatted in the kernel style, but variable declarations are
kept as-is to preserve original llvm IR pattern.
profiler1.c should pass with older and newer llvm
profiler[23].c may fail on older llvm that don't have:
https://reviews.llvm.org/D85570
because llvm may do speculative code motion optimization that
will generate code like this:
// r9 is a pointer to map_value
// r7 is a scalar
17: bf 96 00 00 00 00 00 00 r6 = r9
18: 0f 76 00 00 00 00 00 00 r6 += r7
19: a5 07 01 00 01 01 00 00 if r7 < 257 goto +1
20: bf 96 00 00 00 00 00 00 r6 = r9
// r6 is used here
The verifier will reject such code with the error:
"math between map_value pointer and register with unbounded min value is not allowed"
At insn 18 the r7 is indeed unbounded. The later insn 19 checks the bounds and
the insn 20 undoes map_value addition. It is currently impossible for the
verifier to understand such speculative pointer arithmetic. Hence llvm D85570
addresses it on the compiler side.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201009011240.48506-4-alexei.starovoitov@gmail.com
The llvm register allocator may use two different registers representing the
same virtual register. In such case the following pattern can be observed:
1047: (bf) r9 = r6
1048: (a5) if r6 < 0x1000 goto pc+1
1050: ...
1051: (a5) if r9 < 0x2 goto pc+66
1052: ...
1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
This is normal behavior of greedy register allocator.
The slides 137+ explain why regalloc introduces such register copy:
http://llvm.org/devmtg/2018-04/slides/Yatsina-LLVM%20Greedy%20Register%20Allocator.pdf
There is no way to tell llvm 'not to do this'.
Hence the verifier has to recognize such patterns.
In order to track this information without backtracking allocate ID
for scalars in a similar way as it's done for find_good_pkt_pointers().
When the verifier encounters r9 = r6 assignment it will assign the same ID
to both registers. Later if either register range is narrowed via conditional
jump propagate the register state into the other register.
Clear register ID in adjust_reg_min_max_vals() for any alu instruction. The
register ID is ignored for scalars in regsafe() and doesn't affect state
pruning. mark_reg_unknown() clears the ID. It's used to process call, endian
and other instructions. Hence ID is explicitly cleared only in
adjust_reg_min_max_vals() and in 32-bit mov.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201009011240.48506-2-alexei.starovoitov@gmail.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter selftests fixes from
Fabian Frederick:
1) Extend selftest nft_meta.sh to check for meta cpu.
2) Fix selftest nft_meta.sh error reporting.
3) Fix shellcheck warnings in selftest nft_meta.sh.
4) Extend selftest nft_meta.sh to check for meta time.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ld's --build-id defaults to "sha1" style, while lld defaults to "fast".
The build IDs are very different between the two, which may confuse
programs that reference them.
Signed-off-by: Bill Wendling <morbo@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Naresh reported that selftests: pidfd: pidfd_wait hangs on linux next kernel on
x86_64, i386 and arm64 Juno-r2
These devices are using NFS mounted rootfs.
I have tested pidfd testcases independently and all test PASS.
The Hang or exit from test run noticed when run by run_kselftest.sh
pidfd_wait.c:208:wait_nonblock:Expected sys_waitid(P_PIDFD, pidfd,
&info, WSTOPPED, NULL) (-1) == 0 (0)
wait_nonblock: Test terminated by assertion
metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: e64997027d5f171148687e58b78c8b3c869a6158
git describe: next-20200922
make_kernelversion: 5.9.0-rc6
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-core2-32/lkft/linux-next/865/config
The reason for this is a simple race in the selftests, that I overlooked and
which is more likely to hit when there's a lot of processes running on the
system. Basically the child process hasn't SIGSTOPed itself yet but the parent
is already calling waitid() on a O_NONBLOCK pidfd. Since it doesn't find a
WSTOPPED process it returns -EAGAIN correctly.
The fix for this is to move the line where we're removing the O_NONBLOCK
property from the fd before the waitid() WSTOPPED call so we hang until the
child becomes stopped.
Fixes: cd89597bbe ("tests: add waitid() tests for non-blocking pidfds")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lkft.validation.linaro.org/scheduler/job/1813223
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Pull KCSAN updates for v5.10 from Paul E. McKenney:
- Improve kernel messages.
- Be more permissive with bitops races under KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.
- Optimize debugfs stat counters.
- Introduce the instrument_*read_write() annotations, to provide a
finer description of certain ops - using KCSAN's compound instrumentation.
Use them for atomic RNW and bitops, where appropriate.
Doing this might find new races.
(Depends on the compiler having tsan-compound-read-before-write=1 support.)
- Support atomic built-ins, which will help certain architectures, such as s390.
- Misc enhancements and smaller fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull v5.10 RCU changes from Paul E. McKenney:
- Debugging for smp_call_function().
- Strict grace periods for KASAN. The point of this series is to find
RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
further disabled by dfefault. Finally, the help text includes
a goodly list of scary caveats.
- New smp_call_function() torture test.
- Torture-test updates.
- Documentation updates.
- Miscellaneous fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In case of errors, this message was printed:
(...)
balanced bwidth with unbalanced delay 5233 max 5005 [ fail ]
client exit code 0, server 0
\nnetns ns3-0-EwnkPH socket stat for 10003:
(...)
Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".
The commit 8b974778f9 ("selftests: mptcp: interpret \n as a new line")
is very similar to this one. But the modification in simult_flows.sh was
missed because this commit above was done in parallel to one here below.
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the UAPI headers start to appear in distros, we need to avoid
outdated versions of struct clone_args to be able to test modern
features, named "struct __clone_args". Additionally update the struct
size macro names to match UAPI names.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075432.u4gis3s2o5qrsb5g@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>