gen_pool_alloc_algo() uses different allocation functions implementing
different allocation algorithms. With gen_pool_first_fit_align()
allocation function, the returned address should be aligned on the
requested boundary.
If chunk start address isn't aligned on the requested boundary, the
returned address isn't aligned too. The only way to get properly
aligned address is to initialize the pool with chunks aligned on the
requested boundary. If want to have an ability to allocate buffers
aligned on different boundaries (for example, 4K, 1MB, ...), the chunk
start address should be aligned on the max possible alignment.
This happens because gen_pool_first_fit_align() looks for properly
aligned memory block without taking into account the chunk start address
alignment.
To fix this, we provide chunk start address to
gen_pool_first_fit_align() and change its implementation such that it
starts looking for properly aligned block with appropriate offset
(exactly as is done in CMA).
Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com
Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.com
Signed-off-by: Alexey Skidanov <alexey.skidanov@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour. It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.
Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.
Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Empty function will be inlined so asmlinkage doesn't do anything.
Link: http://lkml.kernel.org/r/20181124093530.GE10969@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joey Pabalinas <joeypabalinas@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
check_hung_uninterruptible_tasks() is currently calling rcu_lock_break()
for every 1024 threads. But check_hung_task() is very slow if printk()
was called, and is very fast otherwise.
If many threads within some 1024 threads called printk(), the RCU grace
period might be extended enough to trigger RCU stall warnings.
Therefore, calling rcu_lock_break() for every some fixed jiffies will be
safer.
Link: http://lkml.kernel.org/r/1544800658-11423-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on commit 401c636a0e ("kernel/hung_task.c: show all hung tasks
before panic"), we could get the call stack of hung task.
However, if the console loglevel is not high, we still can not see the
useful panic information in practice, and in most cases users don't set
console loglevel to high level.
This patch is to force console verbose before system panic, so that the
real useful information can be seen in the console, instead of being
like the following, which doesn't have hung task information.
INFO: task init:1 blocked for more than 120 seconds.
Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 2 PID: 479 Comm: khungtaskd Tainted: G U W 4.19.0-quilt-2e5dc0ac-g51b6c21d76cc #1
Call Trace:
dump_stack+0x4f/0x65
panic+0xde/0x231
watchdog+0x290/0x410
kthread+0x12c/0x150
ret_from_fork+0x35/0x40
reboot: panic mode set: p,w
Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A6015B675@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The introduction of these dummy BUILD_BUG_ON stubs dates back to commmit
903c0c7cdc ("sparse: define dummy BUILD_BUG_ON definition for
sparse").
At that time, BUILD_BUG_ON() was implemented with the negative array
trick *and* the link-time trick, like this:
extern int __build_bug_on_failed;
#define BUILD_BUG_ON(condition) \
do { \
((void)sizeof(char[1 - 2*!!(condition)])); \
if (condition) __build_bug_on_failed = 1; \
} while(0)
Sparse is more strict about the negative array trick than GCC because
Sparse requires the array length to be really constant.
Here is the simple test code for the macro above:
static const int x = 0;
BUILD_BUG_ON(x);
GCC is absolutely fine with it (-Wvla was enabled only very recently),
but Sparse warns like this:
error: bad constant expression
error: cannot size expression
(If you are using a newer version of Sparse, you will see a different
warning message, "warning: Variable length array is used".)
Anyway, Sparse was producing many false positives, and noisier than it
should be at that time.
With the previous commit, the leftover negative array trick is gone.
Sparse is fine with the current BUILD_BUG_ON(), which is implemented by
using the 'error' attribute.
I am keeping the stub for BUILD_BUG_ON_ZERO(). Otherwise, Sparse would
complain about the following code, which GCC is fine with:
static const int x = 0;
int y = BUILD_BUG_ON_ZERO(x);
Link: http://lkml.kernel.org/r/1542856462-18836-3-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel can only be compiled with an optimization option (-O2, -Os,
or the currently proposed -Og). Hence, __OPTIMIZE__ is always defined
in the kernel source.
The fallback for the -O0 case is just hypothetical and pointless.
Moreover, commit 0bb95f80a3 ("Makefile: Globally enable VLA warning")
enabled -Wvla warning. The use of variable length arrays is banned.
Link: http://lkml.kernel.org/r/1542856462-18836-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
`extern' with function prototypes makes lines longer and creates more
characters on the screen.
Do not bug people with checkpatch.pl warnings for now as fallout can be
devastating.
Link: http://lkml.kernel.org/r/20181101134153.GA29267@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the number of input parameters is less than the total parameters, an
EINVAL error will be returned.
For example, we use proc_doulongvec_minmax to pass up to two parameters
with kern_table:
{
.procname = "monitor_signals",
.data = &monitor_sigs,
.maxlen = 2*sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
},
Reproduce:
When passing two parameters, it's work normal. But passing only one
parameter, an error "Invalid argument"(EINVAL) is returned.
[root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
1 2
[root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
-bash: echo: write error: Invalid argument
[root@cl150 ~]# echo $?
1
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
3 2
[root@cl150 ~]#
The following is the result after apply this patch. No error is
returned when the number of input parameters is less than the total
parameters.
[root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
1 2
[root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
[root@cl150 ~]# echo $?
0
[root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
3 2
[root@cl150 ~]#
There are three processing functions dealing with digital parameters,
__do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.
This patch deals with __do_proc_doulongvec_minmax, just as
__do_proc_dointvec does, adding a check for parameters 'left'. In
__do_proc_douintvec, its code implementation explicitly does not support
multiple inputs.
static int __do_proc_douintvec(...){
...
/*
* Arrays are not supported, keep this simple. *Do not* add
* support for them.
*/
if (vleft != 1) {
*lenp = 0;
return -EINVAL;
}
...
}
So, just __do_proc_doulongvec_minmax has the problem. And most use of
proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
parameter.
Link: http://lkml.kernel.org/r/1544081775-15720-1-git-send-email-cheng.lin130@zte.com.cn
Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Header of /proc/*/limits is a fixed string, so print it directly without
formatting specifiers.
Link: http://lkml.kernel.org/r/20181203164242.GB6904@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
name_to_int() is defined in fs/proc/util.c and declared in
fs/proc/internal.h, but the declaration isn't included at the point of
the definition. Include the header to enforce that the definition
matches the declaration.
This addresses a gcc warning when -Wmissing-prototypes is enabled.
Link: http://lkml.kernel.org/r/20181115001833.49371-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
in its effective capability set, but the current check looks in the root
namespace instead of the process' user namespace. Since a process is
allowed to do other activities controlled by CAP_SYS_NICE inside a
namespace, it should also be able to adjust timerslack_ns.
Link: http://lkml.kernel.org/r/20181030180012.232896-1-bmgordon@google.com
Signed-off-by: Benjamin Gordon <bmgordon@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Colin Cross <ccross@android.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.
But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar. Which makes it very hard to verify that the access has
actually been range-checked.
If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin(). But
nothing really forces the range check.
By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses. We have way too long a history of people
trying to avoid them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When commit fddcd00a49 ("drm/i915: Force the slow path after a
user-write error") unified the error handling for various user access
problems, it didn't do the user_access_end() that is needed for the
unsafe_put_user() case.
It's not a huge deal: a missed user_access_end() will only mean that
SMAP protection isn't active afterwards, and for the error case we'll be
returning to user mode soon enough anyway. But it's wrong, and adding
the proper user_access_end() is trivial enough (and doing it for the
other error cases where it isn't needed doesn't hurt).
I noticed it while doing the same prep-work for changing
user_access_begin() that precipitated the access_ok() changes in commit
96d4f267e4 ("Remove 'type' argument from access_ok() function").
Fixes: fddcd00a49 ("drm/i915: Force the slow path after a user-write error")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@kernel.org # v4.20
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some kernels, like 4.19.13-300.fc29.x86_64 in fedora 29, fail with the
existing probe definition asking for the contents of result->name,
working when we ask for the 'filename' variable instead, so add a
fallback to that.
Now those tests are back working on fedora 29 systems with that kernel:
# perf test vfs_getname
65: Use vfs_getname probe to get syscall args filenames : Ok
66: Add vfs_getname probe to get syscall args filenames : Ok
67: Check open filename arg using perf trace + vfs_getname: Ok
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-klt3n0i58dfqttveti09q3fi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These two architectures actually had an intentional use of the 'type'
argument to access_ok() just to avoid warnings.
I had actually noticed the powerpc one, but forgot to then fix it up.
And I missed the sparc32 case entirely.
This is hopefully all of it.
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 96d4f267e4 ("Remove 'type' argument from access_ok() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of doing an unconditional mkdir, use a dummy Makefile variable
to check if the directory is there and if not, create it.
This is better than what we had and will help with other python bindings
that are in development, like one involved with python backtraces.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-iis6us2nocw3y4uuoon9osd7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Each call to va_copy() should have one, and only one, corresponding call
to va_end(). In strbuf_addv() some code paths result in va_end() getting
called multiple times. Remove the superfluous va_end().
Signed-off-by: Mattias Jacobsson <2pi@mok.nu>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sanskriti Sharma <sansharm@redhat.com>
Link: http://lkml.kernel.org/r/20181229141750.16945-1-2pi@mok.nu
Fixes: ce49d8436c ("perf strbuf: Match va_{add,copy} with va_end")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The symbol__disassemble() function uses shell to launch objdump and
filter its output via grep. Passing filenames by interpolating them into
the command line via "%s" may lead to problems if said filenames contain
special characters.
Instead, pass the filename as a command line argument where it is not
subject to any kind of interpretation, then use quoted shell
interpolation to build the strings we need safely.
Signed-off-by: Ivan Krylov <krylov.r00t@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181014111803.5d83b806@Tarkus
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By calculating the removed loops, we can get the iteration count.
But the iteration count could be reported incorrectly, reporting
impossibly high counts.
That's because previous code uses the number of removed LBR entries for
the iteration count. That's not good. Fix this by increasing the
iteration count when a loop is detected.
When matching the chain, the iteration count would be added up, finally we need
to compute the average value when printing out.
For example,
$ perf report --branch-history --stdio --no-children
Before:
---f2 +0
|
|--33.62%--f1 +9 (cycles:1)
| f1 +0
| main +22 (cycles:1)
| main +17
| main +38 (cycles:1)
| main +27
| f1 +26 (cycles:1)
| f1 +24
| f2 +27 (cycles:7)
| f2 +0
| f1 +19 (cycles:1)
| f1 +14
| f2 +27 (cycles:11)
| f2 +0
| f1 +9 (cycles:1 iter:2968 avg_cycles:3)
| f1 +0
| main +22 (cycles:1 iter:2968 avg_cycles:3)
| main +17
| main +38 (cycles:1 iter:2968 avg_cycles:3)
2968 is an impossible high iteration count and avg_cycles is too small.
After:
---f2 +0
|
|--33.62%--f1 +9 (cycles:1)
| f1 +0
| main +22 (cycles:1)
| main +17
| main +38 (cycles:1)
| main +27
| f1 +26 (cycles:1)
| f1 +24
| f2 +27 (cycles:7)
| f2 +0
| f1 +19 (cycles:1)
| f1 +14
| f2 +27 (cycles:11)
| f2 +0
| f1 +9 (cycles:1 iter:1 avg_cycles:23)
| f1 +0
| main +22 (cycles:1 iter:1 avg_cycles:23)
| main +17
| main +38 (cycles:1 iter:1 avg_cycles:23)
avg_cycles:23 is the average cycles of this iteration.
Fixes: c4ee06251d ("perf report: Calculate the average cycles of iterations")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1546582230-17507-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes from:
a0aea130af ("KVM: x86: Add CPUID support for new instruction WBNOINVD")
20c3a2c33e ("x86/speculation: Add support for STIBP always-on preferred mode")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Robert Hoo <robert.hu@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Link: https://lkml.kernel.org/n/tip-aonti3bu9rhnqe5hlawbidcp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
b7d624ab43 asm-generic: unistd.h: fixup broken macro include.
4e21565b7f asm-generic: add kexec_file_load system call to unistd.h
With this the 'kexec_file_load' syscall will be added to arm64's syscall
table and will appear on the output of 'perf trace' on that platform.
This silences this tools/perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/n/tip-er8j7qhavtdw0kdga3zswynm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes from these csets:
2bc39970e9 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
2a31b9db15 ("kvm: introduce manual dirty log reprotect")
That results in these new KVM IOCTLs being supported in 'perf trace'
when beautifying the cmd ioctl syscall argument:
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
--- before 2019-01-04 11:44:23.506605301 -0300
+++ after 2019-01-04 11:44:36.878730583 -0300
@@ -86,6 +86,8 @@
[0xbd] = "HYPERV_EVENTFD",
[0xbe] = "GET_NESTED_STATE",
[0xbf] = "SET_NESTED_STATE",
+ [0xc0] = "CLEAR_DIRTY_LOG",
+ [0xc1] = "GET_SUPPORTED_HV_CPUID",
[0xe0] = "CREATE_DEVICE",
[0xe1] = "SET_DEVICE_ATTR",
[0xe2] = "GET_DEVICE_ATTR",
$
At some point we should be able to do something:
# perf trace -e ioctl(cmd == KVM_CLEAR_DIRTY_LOG)
And have just those ioctls, optionally with callchains, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lkml.kernel.org/n/tip-konm3iigl2os6ritt7d2bori@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in this cset:
65cab850f0 ("net: Allow class-e address assignment via ifconfig ioctl")
The macros changed in this cset are not used in tools/, so this is just
to silence this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
Cc: Dave Taht <dave.taht@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-smghvyxb3budqd1e70i0ylw1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in these csets:
fe84168647 Revert "drm/i915/perf: add a parameter to control the size of OA buffer"
cd956bfcd0 drm/i915/perf: add a parameter to control the size of OA buffer
4bdafb9ddf drm/i915: Remove i915.enable_ppgtt override
Not one of them result in any changes in tools/perf/, this is just to
silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mdw7ta6qz7d2rl77gf00uqe8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So user could specify outside CFLAGS values.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190103161350.11446-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using -O3 instead of -O1 if it's supported by compiler.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Markus Mayer <mmayer@broadcom.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/20190103161350.11446-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 73aeb2cbcd ("ARM: 8787/1: wire up io_pgetevents syscall")
hooked up the io_pgetevents() system call for 32-bit ARM, so we can
do the same for the compat wrapper on arm64.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The syscall number may have been changed by a tracer, so we should pass
the actual number in from the caller instead of pulling it from the
saved r7 value directly.
Cc: <stable@vger.kernel.org>
Cc: Pi-Hsun Shih <pihsun@chromium.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM Linux kernel handles the EABI syscall numbers as follows:
0 - NR_SYSCALLS-1 : Invoke syscall via syscall table
NR_SYSCALLS - 0xeffff : -ENOSYS (to be allocated in future)
0xf0000 - 0xf07ff : Private syscall or -ENOSYS if not allocated
> 0xf07ff : SIGILL
Our compat code gets this wrong and ends up sending SIGILL in response
to all syscalls greater than NR_SYSCALLS which have a value greater
than 0x7ff in the bottom 16 bits.
Fix this by defining the end of the ARM private syscall region and
checking the syscall number against that directly. Update the comment
while we're at it.
Cc: <stable@vger.kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Reported-by: Pi-Hsun Shih <pihsun@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, <uapi/asm/sigcontext.h> provides common definitions for
describing SVE context structures that are also used by the ptrace
definitions in <uapi/asm/ptrace.h>.
For this reason, a #include of <asm/sigcontext.h> was added in
ptrace.h, but it this turns out that this can interact badly with
userspace code that tries to include ptrace.h on top of the libc
headers (which may provide their own shadow definitions for
sigcontext.h).
To make the headers easier for userspace to consume, this patch
bounces the common definitions into an __SVE_* namespace and moves
them to a backend header <uapi/asm/sve_context.h> that can be
included by the other headers as appropriate. This should allow
ptrace.h to be used alongside libc's sigcontext.h (if any) without
ill effects.
This should make the situation unambiguous: <asm/sigcontext.h> is
the header to include for the sigframe-specific definitions, while
<asm/ptrace.h> is the header to include for ptrace-specific
definitions.
To avoid conflicting with existing usage, <asm/sigcontext.h>
remains the canonical way to get the common definitions for
SVE_VQ_MIN, sve_vq_from_vl() etc., both in userspace and in the
kernel: relying on these being defined as a side effect of
including just <asm/ptrace.h> was never intended to be safe.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
SVE_PT_REGS_OFFSET is supposed to indicate the offset for skipping
over the ptrace NT_ARM_SVE header (struct user_sve_header) to the
start of the SVE register data proper.
However, currently SVE_PT_REGS_OFFSET is defined in terms of struct
sve_context, which is wrong: that structure describes the SVE
header in the signal frame, not in the ptrace regset.
This patch fixes the definition to use the ptrace header structure
struct user_sve_header instead.
By good fortune, the two structures are the same size anyway, so
there is no functional or ABI change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In commit 05a4ab8239 ("powerpc/uaccess: fix warning/error with
access_ok()") an attempt was made to remove a warning by referencing
the variable `type`. However in commit 96d4f267e4 ("Remove 'type'
argument from access_ok() function") the variable `type` has been
removed, breaking the build:
arch/powerpc/include/asm/uaccess.h:66:32: error: ‘type’ undeclared (first use in this function)
This essentially reverts commit 05a4ab8239 ("powerpc/uaccess: fix
warning/error with access_ok()") to fix the error.
Fixes: 96d4f267e4 ("Remove 'type' argument from access_ok() function")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
[mpe: Reword change log slightly.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since the commit 2a4eb7358a "OPP: Don't remove dynamic OPPs from
_dev_pm_opp_remove_table()", dynamically created OPP aren't
automatically removed anymore by dev_pm_opp_cpumask_remove_table(). This
affects the scpi and scmi cpufreq drivers which no longer free OPPs on
failures or on invocations of the policy->exit() callback.
Create a generic OPP helper dev_pm_opp_remove_all_dynamic() which can be
called from these drivers instead of dev_pm_opp_cpumask_remove_table().
In dev_pm_opp_remove_all_dynamic(), we need to make sure that the
opp_list isn't getting accessed simultaneously from other parts of the
OPP core while the helper is freeing dynamic OPPs, i.e. we can't drop
the opp_table->lock while traversing through the OPP list. And to
accomplish that, this patch also creates _opp_kref_release_unlocked()
which can be called from this new helper with the opp_table lock already
held.
Cc: 4.20 <stable@vger.kernel.org> # v4.20
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Fixes: 2a4eb7358a "OPP: Don't remove dynamic OPPs from _dev_pm_opp_remove_table()"
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Update the MAINTAINERS entry for cpuidle by making it clear that it
is not just drivers and adding a documentation record to it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For DDRC PMU, each PMU counter is fixed-purpose. There is a mismatch
between perf list and driver definition on rw_chg event.
# perf list | grep chg
hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event]
hisi_sccl1_ddrc0/rw_chg/ [Kernel PMU event]
But the register offset of rw_chg event is not defined in the driver,
meanwhile bnk_chg register offset is mis-defined, let's fixup it.
Fixes: 904dcf03f0 ("perf: hisi: Add support for HiSilicon SoC DDRC PMU driver")
Cc: stable@vger.kernel.org
Cc: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reported-by: Weijian Huang <huangweijian4@hisilicon.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Use the standard obj-$(CONFIG_...) syntex. The behavior is still the
same.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the slow path DMA API calls are implemented out of line a few
helpers only used by them don't need to be exported anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
This avoids link failures in drivers using the DMA API, when they
are compiled for user mode Linux with CONFIG_COMPILE_TEST=y.
Fixes: 356da6d0cd ("dma-mapping: bypass indirect calls for dma-direct")
Signed-off-by: Christoph Hellwig <hch@lst.de>
dmam_alloc_coherent is just the default no-flags case of
dmam_alloc_attrs, so take advantage of this similar to the non-managed
version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
And also switch the way we implement the unmap side around to stay
consistent. This ensures dma-debug works again because it records which
function we used for mapping to ensure it is also used for unmapping,
and also reduces further code duplication. Last but not least this
also officially allows calling dma_sync_single_* for mappings created
using dma_map_page, which is perfectly fine given that the sync calls
only take a dma_addr_t, but not a virtual address or struct page.
Fixes: 7f0fee242e ("dma-mapping: merge dma_unmap_page_attrs and dma_unmap_single_attrs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Adding a function inside a WARN_ON() didn't close the WARN_ON parathesis.
Link: http://lkml.kernel.org/r/201901020958.28Mzbs0O%fengguang.wu@intel.com
Cc: linux-sh@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: cec8d0e7f0 ("sh: ftrace: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>