Pull timer and time updates from Thomas Gleixner:
"A rather large update of timers, timekeeping & co
- Core timekeeping code is year-2038 safe now for 32bit machines.
Now we just need to fix all in kernel users and the gazillion of
user space interfaces which rely on timespec/timeval :)
- Better cache layout for the timekeeping internal data structures.
- Proper nanosecond based interfaces for in kernel users.
- Tree wide cleanup of code which wants nanoseconds but does hoops
and loops to convert back and forth from timespecs. Some of it
definitely belongs into the ugly code museum.
- Consolidation of the timekeeping interface zoo.
- A fast NMI safe accessor to clock monotonic for tracing. This is a
long standing request to support correlated user/kernel space
traces. With proper NTP frequency correction it's also suitable
for correlation of traces across separate machines.
- Checkpoint/restart support for timerfd.
- A few NOHZ[_FULL] improvements in the [hr]timer code.
- Code move from kernel to kernel/time of all time* related code.
- New clocksource/event drivers from the ARM universe. I'm really
impressed that despite an architected timer in the newer chips, SoC
manufacturers insist on inventing new and differently broken
SoC-specific timers.
[ Ed. "Impressed"? I don't think that word means what you think it means ]
- Another round of code move from arch to drivers. Looks like most
of the legacy mess in ARM regarding timers is sorted out except for
a few obnoxious strongholds.
- The usual updates and fixlets all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
timekeeping: Fixup typo in update_vsyscall_old definition
clocksource: document some basic timekeeping concepts
timekeeping: Use cached ntp_tick_length when accumulating error
timekeeping: Rework frequency adjustments to work better w/ nohz
timekeeping: Minor fixup for timespec64->timespec assignment
ftrace: Provide trace clocks monotonic
timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
seqcount: Add raw_write_seqcount_latch()
seqcount: Provide raw_read_seqcount()
timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
timekeeping: Create struct tk_read_base and use it in struct timekeeper
timekeeping: Restructure the timekeeper some more
clocksource: Get rid of cycle_last
clocksource: Move cycle_last validation to core code
clocksource: Make delta calculation a function
wireless: ath9k: Get rid of timespec conversions
drm: vmwgfx: Use nsec based interfaces
drm: i915: Use nsec based interfaces
timekeeping: Provide ktime_get_raw()
hangcheck-timer: Use ktime_get_ns()
...
Pull irq updates from Thomas Gleixner:
"Nothing spectacular from the irq department this time:
- overhaul of the crossbar chip driver
- overhaul of the spear shirq chip driver
- support for the atmel-aic chip
- code move from arch to drivers
- the usual tiny fixlets
- two reverts worth mentioning, which undo the too-simple attempt at
supporting wakeup interrupts on shared interrupt lines"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
Revert "irq: Warn when shared interrupts do not match on NO_SUSPEND"
Revert "PM / sleep / irq: Do not suspend wakeup interrupts"
irq: Warn when shared interrupts do not match on NO_SUSPEND
irqchip: atmel-aic: Define irq fixups for atmel SoCs
irqchip: atmel-aic: Implement RTC irq fixup
irqchip: atmel-aic: Add irq fixup infrastructure
irqchip: atmel-aic: Add atmel AIC/AIC5 drivers
irqchip: atmel-aic: Move binding doc to interrupt-controller directory
genirq: generic chip: Export irq_map_generic_chip function
PM / sleep / irq: Do not suspend wakeup interrupts
irqchip: or1k-pic: Migrate from arch/openrisc/
irqchip: crossbar: Allow for quirky hardware with direct hardwiring of GIC
documentation: dt: omap: crossbar: Add description for interrupt consumer
irqchip: crossbar: Introduce centralized check for crossbar write
irqchip: crossbar: Introduce ti,max-crossbar-sources to identify valid crossbar mapping
irqchip: crossbar: Add kerneldoc for crossbar_domain_unmap callback
irqchip: crossbar: Set cb pointer to null in case of error
irqchip: crossbar: Change the goto naming
irqchip: crossbar: Return proper error value
irqchip: crossbar: Fix kerneldoc warning
...
Merge tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
"Here's the big pull request for the staging driver tree for 3.17-rc1.
Lots of things in here, over 2000 patches, but the best part is this:
1480 files changed, 39070 insertions(+), 254659 deletions(-)
Thanks to the great work of Kristina Martšenko, 14 different staging
drivers have been removed from the tree as they were obsolete and no
one was willing to work on cleaning them up. Other than the driver
removals, loads of cleanups are in here (comedi, lustre, etc.) as well
as the usual IIO driver updates and additions.
All of this has been in the linux-next tree for a while"
* tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits)
staging: comedi: addi_apci_1564: remove diagnostic interrupt support code
staging: comedi: addi_apci_1564: add subdevice to check diagnostic status
staging: wlan-ng: coding style problem fix
staging: wlan-ng: fixing coding style problems
staging: comedi: ii_pci20kc: request and ioremap memory
staging: lustre: bitwise vs logical typo
staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h
staging: dgnc: rephrase comment
staging: comedi: ni_tio: remove some dead code
staging: rtl8723au: Fix static symbol sparse warning
staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc'
staging: rtl8723au: Do not duplicate kernel provided USB macros
staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown
staging: rtl8723au: Remove two never set variables
staging: rtl8723au: RSSI_test is never set
staging:r8190: coding style: Fixed checkpatch reported Error
staging:r8180: coding style: Fixed too long lines
staging:r8180: coding style: Fixed commenting style
staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer
staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer
...
Pull scheduler updates from Ingo Molnar:
- Move the nohz kick code out of the scheduler tick to a dedicated IPI,
from Frederic Weisbecker.
This necessitated quite some background infrastructure rework,
including:
* Clean up some irq-work internals
* Implement remote irq-work
* Implement nohz kick on top of remote irq-work
* Move full dynticks timer enqueue notification to new kick
* Move multi-task notification to new kick
* Remove unnecessary barriers on multi-task notification
- Remove proliferation of wait_on_bit() action functions and allow
wait_on_bit_action() functions to support a timeout. (Neil Brown)
- Another round of sched/numa improvements, cleanups and fixes. (Rik
van Riel)
- Implement fast idling of CPUs when the system is partially loaded,
for better scalability. (Tim Chen)
- Restructure and fix the CPU hotplug handling code that may leave
cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
cpu. (Kirill Tkhai)
- Robustify the sched topology setup code. (Peter Zijlstra)
- Improve sched_feat() handling wrt. static_keys (Jason Baron)
- Misc fixes.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
sched/fair: Fix 'make xmldocs' warning caused by missing description
sched: Use macro for magic number of -1 for setparam
sched: Robustify topology setup
sched: Fix sched_setparam() policy == -1 logic
sched: Allow wait_on_bit_action() functions to support a timeout
sched: Remove proliferation of wait_on_bit() action functions
sched/numa: Revert "Use effective_load() to balance NUMA loads"
sched: Fix static_key race with sched_feat()
sched: Remove extra static_key*() function indirection
sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
sched: Transform resched_task() into resched_curr()
sched/deadline: Kill task_struct->pi_top_task
sched: Rework check_for_tasks()
sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
sched/fair: Disable runtime_enabled on dying rq
sched/numa: Change scan period code to match intent
sched/numa: Rework best node setting in task_numa_migrate()
sched/numa: Examine a task move when examining a task swap
sched/numa: Simplify task_numa_compare()
sched/numa: Use effective_load() to balance NUMA loads
...
Pull perf changes from Ingo Molnar:
"Kernel side changes:
- Consolidate the PMU interrupt-disabled code amongst architectures
(Vince Weaver)
- misc fixes
Tooling changes (new features, user visible changes):
- Add support for pagefault tracing in 'trace', please see multiple
examples in the changeset messages (Stanislav Fomichev).
- Add pagefault statistics in 'trace' (Stanislav Fomichev)
- Add header for columns in 'top' and 'report' TUI browsers (Jiri
Olsa)
- Add IO mode into timechart command (Stanislav Fomichev)
- Fallback to syscalls:* when raw_syscalls:* is not available in the
perl and python perf scripts. (Daniel Bristot de Oliveira)
- Add --repeat global option to 'perf bench' to be used in benchmarks
such as the existing 'futex' one, that was modified to use it
instead of a local option. (Davidlohr Bueso)
- Fix fd -> pathname resolution in 'trace', be it using /proc or a
vfs_getname probe point. (Arnaldo Carvalho de Melo)
- Add suggestion of how to set perf_event_paranoid sysctl, to help
non-root users trying tools like 'trace' to get a working
environment. (Arnaldo Carvalho de Melo)
- Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
(Steven Rostedt, Jan Kiszka)
- Support S/390 in 'perf kvm stat' (Alexander Yarygin)
Tooling infrastructure changes:
- Allow reserving a row for header purposes in the hists browser
(Arnaldo Carvalho de Melo)
- Various fixes and prep work related to supporting Intel PT (Adrian
Hunter)
- Introduce multiple debug variables control (Jiri Olsa)
- Add callchain and additional sample information for python scripts
(Joseph Schuchart)
- More prep work to support Intel PT: (Adrian Hunter)
- Polishing 'script' BTS output
- 'inject' can specify --kallsyms
- VDSO is per machine, not a global var
- Expose data addr lookup functions previously private to 'script'
- Large mmap fixes in events processing
- Include standard stringify macros in powerpc code (Sukadev
Bhattiprolu)
Tooling cleanups:
- Convert open coded equivalents to asprintf() (Andy Shevchenko)
- Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)
- Cache the is_exit syscall test in 'trace' (Arnaldo Carvalho de
Melo)
- No need to reimplement err() in 'perf bench sched-messaging', drop
barf(). (Davidlohr Bueso).
- Remove ev_name argument from perf_evsel__hists_browse, can be
obtained from the other parameters. (Jiri Olsa)
Tooling fixes:
- Fix memory leak in the 'sched-messaging' perf bench test.
(Davidlohr Bueso)
- The -o and -n 'perf bench mem' options are mutually exclusive, emit
error when both are specified. (Davidlohr Bueso)
- Fix scrollbar refresh row index in the ui browser, problem exposed
now that headers will be added and will be allowed to be switched
on/off. (Jiri Olsa)
- Handle the num array type in python properly (Sebastian Andrzej
Siewior)
- Fix wrong condition for allocation failure (Jiri Olsa)
- Adjust callchain based on DWARF debug info on powerpc (Sukadev
Bhattiprolu)
- Fix a risk for doing free on uninitialized pointer in traceevent
lib (Rickard Strandqvist)
- Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)
- Enable close-on-exec flag on perf file descriptor (Yann Droneaud)
- Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)
- Event ordering fixes (Jiri Olsa)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
Revert "perf tools: Fix jump label always changing during tracing"
perf tools: Fix perf usage string leftover
perf: Check permission only for parent tracepoint event
perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
perf record: Always force PERF_RECORD_FINISHED_ROUND event
perf inject: Add --kallsyms parameter
perf tools: Expose 'addr' functions so they can be reused
perf session: Fix accounting of ordered samples queue
perf powerpc: Include util/util.h and remove stringify macros
perf tools: Fix build on gcc 4.4.7
perf tools: Add thread parameter to vdso__dso_findnew()
perf tools: Add dso__type()
perf tools: Separate the VDSO map name from the VDSO dso name
perf tools: Add vdso__new()
perf machine: Fix the lifetime of the VDSO temporary file
perf tools: Group VDSO global variables into a structure
perf session: Add ability to skip 4GiB or more
perf session: Add ability to 'skip' a non-piped event stream
perf tools: Pass machine to vdso__dso_findnew()
perf tools: Add dso__data_size()
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- big rtmutex and futex cleanup and robustification from Thomas
Gleixner
- mutex optimizations and refinements from Jason Low
- arch_mutex_cpu_relax() removal and related cleanups
- smaller lockdep tweaks"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
arch, locking: Ciao arch_mutex_cpu_relax()
locking/lockdep: Only ask for /proc/lock_stat output when available
locking/mutexes: Optimize mutex trylock slowpath
locking/mutexes: Try to acquire mutex only if it is unlocked
locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
locking/mutexes: Correct documentation on mutex optimistic spinning
rtmutex: Make the rtmutex tester depend on BROKEN
futex: Simplify futex_lock_pi_atomic() and make it more robust
futex: Split out the first waiter attachment from lookup_pi_state()
futex: Split out the waiter check from lookup_pi_state()
futex: Use futex_top_waiter() in lookup_pi_state()
futex: Make unlock_pi more robust
rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
rtmutex: Cleanup deadlock detector debug logic
rtmutex: Confine deadlock logic to futex
rtmutex: Simplify remove_waiter()
rtmutex: Document pi chain walk
rtmutex: Clarify the boost/deboost part
rtmutex: No need to keep task ref for lock owner check
rtmutex: Simplify and document try_to_take_rtmutex()
...
Pull RCU changes from Ingo Molnar:
"The main changes:
- torture-test updates
- callback-offloading changes
- maintainership changes
- update RCU documentation
- miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing
rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp()
rcu: Fix a sparse warning in rcu_initiate_boost()
rcu: Fix __rcu_reclaim() to use true/false for bool
rcu: Remove CONFIG_PROVE_RCU_DELAY
rcu: Use __this_cpu_read() instead of per_cpu_ptr()
rcu: Don't use NMIs to dump other CPUs' stacks
rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
rcu: Simplify priority boosting by putting rt_mutex in rcu_node
rcu: Check both root and current rcu_node when setting up future grace period
rcu: Allow post-unlock reference for rt_mutex
rcu: Loosen __call_rcu()'s rcu_head alignment constraint
rcu: Eliminate read-modify-write ACCESS_ONCE() calls
rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu
rcu: Make rcu node arrays static const char * const
signal: Explain local_irq_save() call
rcu: Handle obsolete references to TINY_PREEMPT_RCU
rcu: Document deadlock-avoidance information for rcu_read_unlock()
scripts: Teach get_maintainer.pl about the new "R:" tag
rcu: Update rcu torture maintainership filename patterns
...
Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing filter cleanups from Steven Rostedt:
"Oleg Nesterov did several clean ups with the tracing filter code. As
he found some small bugs that went into 3.16, and these changes were
based on that, I had to apply his changes to a separate branch than my
main development branch.
This was based on work that was already pulled into 3.16, and is a
separate pull request to keep from having local merges in my pull
request"
* tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Kill "filter_string" arg of replace_preds()
tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
tracing: Kill ftrace_event_call->files
tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
tracing: Kill call_filter_disable()
tracing: Kill destroy_call_preds()
tracing: Kill destroy_preds() and destroy_file_preds()
Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This pull request has a lot of work done. The main thing is the
changes to the ftrace function callback infrastructure. It's
introducing a way to allow different functions to call directly
different trampolines instead of all calling the same "mcount" one.
The only user of this for now is the function graph tracer, which
always had a different trampoline, but the function tracer trampoline
was called and did basically nothing, and then the function graph
tracer trampoline was called. The difference now, is that the
function graph tracer trampoline can be called directly if a function
is only being traced by the function graph trampoline. If function
tracing is also happening on the same function, the old way is still
done.
The accounting for this takes up more memory when function graph
tracing is activated, as it needs to keep track of which functions it
uses. I have a new way that won't take as much memory, but it's not
ready yet for this merge window, and will have to wait for the next
one.
Another big change was the removal of the ftrace_start/stop() calls
that were used by the suspend/resume code that stopped function
tracing when entering into suspend and resume paths. The stop of
ftrace was done because there was some function that would crash the
system if one called smp_processor_id()! The stop/start was a big
hammer to solve the issue at the time, which was when ftrace was first
introduced into Linux. Now ftrace has better infrastructure to debug
such issues, and I found the problem function and labeled it with
"notrace" and function tracing can now safely be activated all the way
down into the guts of suspend and resume.
Other changes include clean ups of uprobe code, clean up of the
trace_seq() code, and other various small fixes and clean ups to
ftrace and tracing"
* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
ftrace: Add warning if tramp hash does not match nr_trampolines
ftrace: Fix trampoline hash update check on rec->flags
ring-buffer: Use rb_page_size() instead of open coded head_page size
ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
tracing: Convert local function_graph functions to static
ftrace: Do not copy old hash when resetting
tracing: let user specify tracing_thresh after selecting function_graph
ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
s390/ftrace: remove check of obsolete variable function_trace_stop
arm64, ftrace: Remove check of obsolete variable function_trace_stop
Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
metag: ftrace: Remove check of obsolete variable function_trace_stop
microblaze: ftrace: Remove check of obsolete variable function_trace_stop
MIPS: ftrace: Remove check of obsolete variable function_trace_stop
parisc: ftrace: Remove check of obsolete variable function_trace_stop
sh: ftrace: Remove check of obsolete variable function_trace_stop
sparc64,ftrace: Remove check of obsolete variable function_trace_stop
tile: ftrace: Remove check of obsolete variable function_trace_stop
ftrace: x86: Remove check of obsolete variable function_trace_stop
...
Pull cgroup changes from Tejun Heo:
"Mostly changes to get the v2 interface ready. The core features are
mostly ready now and I think it's reasonable to expect to drop the
devel mask in one or two devel cycles at least for a subset of
controllers.
- cgroup added a controller dependency mechanism so that block cgroup
can depend on memory cgroup. This will be used to finally support
IO provisioning on the writeback traffic, which is currently being
implemented.
- The v2 interface now uses a separate table so that the interface
files for the new interface are explicitly declared in one place.
Each controller will explicitly review and add the files for the
new interface.
- cpuset is getting ready for the hierarchical behavior which is in
a similar style to other controllers so that an ancestor's
configuration change doesn't change the descendants' configurations
irreversibly and processes aren't silently migrated when a CPU or
node goes down.
All the changes are to the new interface and no behavior changed for
the multiple hierarchies"
* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits)
cpuset: fix the WARN_ON() in update_nodemasks_hier()
cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test
cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core
cgroup: distinguish the default and legacy hierarchies when handling cftypes
cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()
cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes
cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[]
cpuset: export effective masks to userspace
cpuset: allow writing offlined masks to cpuset.cpus/mems
cpuset: enable onlined cpu/node in effective masks
cpuset: refactor cpuset_hotplug_update_tasks()
cpuset: make cs->{cpus, mems}_allowed as user-configured masks
cpuset: apply cs->effective_{cpus,mems}
cpuset: initialize top_cpuset's configured masks at mount
cpuset: use effective cpumask to build sched domains
cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty
cpuset: update cs->effective_{cpus, mems} when config changes
cpuset: update cpuset->effective_{cpus,mems} at hotplug
cpuset: add cs->effective_cpus and cs->effective_mems
cgroup: clean up sane_behavior handling
...
Pull percpu updates from Tejun Heo:
- Major reorganization of percpu header files which I think makes
things a lot more readable and logical than before.
- percpu-refcount is updated so that it requires explicit destruction
and can be reinitialized if necessary. This was pulled into the
block tree to replace the custom percpu refcnting implemented in
blk-mq.
- In the process, percpu and percpu-refcount got cleaned up a bit
* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
percpu-refcount: require percpu_ref to be exited explicitly
percpu-refcount: use unsigned long for pcpu_count pointer
percpu-refcount: add helpers for ->percpu_count accesses
percpu-refcount: one bit is enough for REF_STATUS
percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
workqueue: stronger test in process_one_work()
workqueue: clear POOL_DISASSOCIATED in rebind_workers()
percpu: Use ALIGN macro instead of hand coding alignment calculation
percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
percpu: preffity percpu header files
percpu: use raw_cpu_*() to define __this_cpu_*()
percpu: reorder macros in percpu header files
percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
percpu: reorganize include/linux/percpu-defs.h
percpu: move accessors from include/linux/percpu.h to percpu-defs.h
percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
percpu: introduce arch_raw_cpu_ptr()
...
Pull workqueue updates from Tejun Heo:
"Lai has been doing a lot of cleanups of workqueue and kthread_work.
No significant behavior change. Just a lot of cleanups all over the
place. Some are a bit invasive but overall nothing too dangerous"
* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
kthread_work: remove the unused wait_queue_head
kthread_work: wake up worker only when the worker is idle
workqueue: use nr_node_ids instead of wq_numa_tbl_len
workqueue: remove the misnamed out_unlock label in get_unbound_pool()
workqueue: remove the stale comment in pwq_unbound_release_workfn()
workqueue: move rescuer pool detachment to the end
workqueue: unfold start_worker() into create_worker()
workqueue: remove @wakeup from worker_set_flags()
workqueue: remove an unneeded UNBOUND test before waking up the next worker
workqueue: wake regular worker if need_more_worker() when rescuer leave the pool
workqueue: alloc struct worker on its local node
workqueue: reuse the already calculated pwq in try_to_grab_pending()
workqueue: stronger test in process_one_work()
workqueue: clear POOL_DISASSOCIATED in rebind_workers()
workqueue: sanity check pool->cpu in wq_worker_sleeping()
workqueue: clear leftover flags when detached
workqueue: remove useless WARN_ON_ONCE()
workqueue: use schedule_timeout_interruptible() instead of open code
workqueue: remove the empty check in too_many_workers()
workqueue: use "pool->cpu < 0" to stand for an unbound pool
Pull crypto update from Herbert Xu:
- CTR(AES) optimisation on x86_64 using "by8" AVX.
- arm64 support to ccp
- Intel QAT crypto driver
- Qualcomm crypto engine driver
- x86-64 assembly optimisation for 3DES
- CTR(3DES) speed test
- move FIPS panic from module.c so that it only triggers on crypto
modules
- SP800-90A Deterministic Random Bit Generator (drbg).
- more test vectors for ghash.
- tweak self tests to catch partial block bugs.
- misc fixes.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits)
crypto: drbg - fix failure of generating multiple of 2**16 bytes
crypto: ccp - Do not sign extend input data to CCP
crypto: testmgr - add missing spaces to drbg error strings
crypto: atmel-tdes - Switch to managed version of kzalloc
crypto: atmel-sha - Switch to managed version of kzalloc
crypto: testmgr - use chunks smaller than algo block size in chunk tests
crypto: qat - Fixed SKU1 dev issue
crypto: qat - Use hweight for bit counting
crypto: qat - Updated print outputs
crypto: qat - change ae_num to ae_id
crypto: qat - change slice->regions to slice->region
crypto: qat - use min_t macro
crypto: qat - remove unnecessary parentheses
crypto: qat - remove unneeded header
crypto: qat - checkpatch blank lines
crypto: qat - remove unnecessary return codes
crypto: Resolve shadow warnings
crypto: ccp - Remove "select OF" from Kconfig
crypto: caam - fix DECO RSR polling
crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM
...
Pull timer fixes from Thomas Gleixner:
"Two fixes in the timer area:
- a long-standing lock inversion due to a printk
- suspend-related hrtimer corruption in sched_clock"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
sched_clock: Avoid corrupting hrtimer tree during suspend
This reverts commit 4fae4e7624.
Undo because it breaks working systems.
Requested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit d709f7bcbb.
Undo, because it might break existing functionality.
Requested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no
need to filter PageHuge() pages in such a configuration either, so avoid
exporting the symbol to fix a build error:
In file included from kernel/kexec.c:14:0:
kernel/kexec.c: In function 'crash_save_vmcoreinfo_init':
kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function)
VMCOREINFO_SYMBOL(free_huge_page);
^
Introduced by commit 8f1d26d0e5 ("kexec: export free_huge_page to
VMCOREINFO")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My IBM email addresses haven't worked for years; also map some
old-but-functional forwarding addresses to my canonical address.
Update my GPG key fingerprint; I moved to 4096R a long time ago.
Update description.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe
("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need
another symbol to filter *hugetlbfs* pages.
If a user hope to filter user pages, makedumpfile tries to exclude them by
checking the condition whether the page is anonymous, but hugetlbfs pages
aren't anonymous while they also be user pages.
We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():
	int PageHuge(struct page *page)
	{
		if (!PageCompound(page))
			return 0;

		page = compound_head(page);
		return get_compound_page_dtor(page) == free_huge_page;
	}
For that reason, this patch makes free_huge_page() public so that it can
be exported to VMCOREINFO.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The WARN_ON() is used to check that we don't break the legal hierarchy,
in which the effective mems should be equal to the configured mems.
Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
If the worker is already executing a work item when another is queued,
we can safely skip the wakeup without worrying about stalling the queue,
thus avoiding waking up the busy worker spuriously. Spurious wakeups
should be fine, but they still aren't nice and avoiding them is trivial here.
tj: Updated description.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch fixes the following warning, caused by the missing description
of "overload" in kernel/sched/fair.c:
Warning(.//kernel/sched/fair.c:5906): No description found for
parameter 'overload'
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1406518686-7274-1-git-send-email-standby24x7@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of passing around a magic number -1 for the sched_setparam()
policy, use a more descriptive macro name like SETPARAM_POLICY.
[ based on top of Daniel's sched_setparam() fix ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Daniel Bristot de Oliveira<bristot@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140723112826.6ed6cbce@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We hard assume that higher topology levels are supersets of lower
levels.
Detect, warn and try to fix it up when we encounter a violation.
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Bruno Wolff III <bruno@wolff.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140722094740.GJ12054@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's no need to check a cloned event's permission once the
parent was already checked.
Also, the code is checking the 'current' process's permissions, which
is not the owner process for cloned events, and thus could end up with
a wrong permission check result.
Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The scheduler uses policy == -1 to preserve the current policy state to
implement sched_setparam(). But, as (int) -1 is equal to 0xffffffff,
it matches the if (policy & SCHED_RESET_ON_FORK) check in
_sched_setscheduler(). This match changes the policy value to an
invalid value, breaking the sched_setparam() syscall.
This patch checks for policy == -1 before checking the SCHED_RESET_ON_FORK flag.
The following program shows the bug:
	#include <stdio.h>
	#include <sched.h>

	int main(void)
	{
		struct sched_param param = {
			.sched_priority = 5,
		};

		sched_setscheduler(0, SCHED_FIFO, &param);

		param.sched_priority = 1;
		sched_setparam(0, &param);

		param.sched_priority = 0;
		sched_getparam(0, &param);

		if (param.sched_priority != 1)
			printf("failed priority setting (found %d instead of 1)\n",
			       param.sched_priority);
		else
			printf("priority setting fine\n");

		return 0;
	}
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> # 3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Fixes: 7479f3c9cf "sched: Move SCHED_RESET_ON_FORK into attr::sched_flags"
Link: http://lkml.kernel.org/r/9ebe0566a08dbbb3999759d3f20d6004bb2dbcfa.1406079891.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf fixes from Thomas Gleixner:
"A bunch of fixes for perf and kprobes:
- revert a commit that caused a perf group regression
- silence dmesg spam
- fix kprobe probing errors on ia64 and ppc64
- filter kprobe faults from userspace
- lockdep fix for perf exit path
- prevent perf #GP in KVM guest
- correct perf event and filters"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
kprobes/x86: Don't try to resolve kprobe faults from userspace
perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
perf/x86/intel: Protect LBR and extra_regs against KVM lying
perf: Fix lockdep warning on process exit
perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
perf: Revert ("perf: Always destroy groups on exit")
When suspend_device_irqs() iterates all descriptors, it's pointless if
one has NO_SUSPEND set while another has not.
Validate on request_irq() that the NO_SUSPEND state matches for SHARED
interrupts.
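A minimal sketch of the mismatch being warned about (illustrative only, not
the kernel implementation; the IRQF_* values below are copied from my
reading of include/linux/interrupt.h and should be treated as assumptions):

	#include <stdbool.h>
	#include <stdio.h>

	#define IRQF_SHARED	0x00000080
	#define IRQF_NO_SUSPEND	0x00004000

	/*
	 * For a shared line every action must agree on NO_SUSPEND,
	 * otherwise suspend_device_irqs() cannot decide whether the line
	 * should stay enabled across suspend.
	 */
	static bool no_suspend_mismatch(unsigned long old_flags,
					unsigned long new_flags)
	{
		if (!((old_flags & new_flags) & IRQF_SHARED))
			return false;
		return ((old_flags ^ new_flags) & IRQF_NO_SUSPEND) != 0;
	}

	int main(void)
	{
		/* 1: one NO_SUSPEND action, one regular action on a shared line */
		printf("%d\n", no_suspend_mismatch(IRQF_SHARED | IRQF_NO_SUSPEND,
						   IRQF_SHARED));
		/* 0: both actions agree */
		printf("%d\n", no_suspend_mismatch(IRQF_SHARED, IRQF_SHARED));
		return 0;
	}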
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/20140724133921.GY6758@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After adding all the records to the tramp_hash, add a check that makes
sure that the number of records added matches the number of records
expected to match, and do a WARN_ON and disable ftrace if they do
not match.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
to the ops hash if the rec has only one callback attached and the
ops is connected to the rec. It gives a nasty warning and shuts down
ftrace if the rec doesn't have a trampoline set for it. But this
can happen with the following scenario:
# cd /sys/kernel/debug/tracing
# echo schedule do_IRQ > set_ftrace_filter
# mkdir instances/foo
# echo schedule > instances/foo/set_ftrace_filter
# echo function_graph > current_tracer
# echo function > instances/foo/current_tracer
# echo nop > instances/foo/current_tracer
The above would then trigger the following warning and disable
ftrace:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
Call Trace:
[<ffffffff81502130>] ? dump_stack+0x4a/0x75
[<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
[<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
[<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
[<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
[<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
[<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
[<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
[<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
[<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
[<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
[<ffffffff81130e49>] ? vfs_write+0xac/0x11c
[<ffffffff8113115d>] ? SyS_write+0x60/0x8e
[<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
---[ end trace 938c4415cbc7dc96 ]---
------------[ cut here ]------------
Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer; instead it relies
on the hrtimer layer to do that, and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.
Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.
Fixes: a08ca5d108 "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There's a helper function to get a ring buffer page size (the number
of bytes of data recorded on the page), called rb_page_size().
Use that instead of open coding it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.
This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.
This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.
This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accumulating error.
Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rationale.
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, it's not surprising that it performs poorly when NOHZ is enabled.
Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.
The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors being corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.
Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.
1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.
2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.
This method performs fairly well in the simulator Miroslav created.
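To make the two steps concrete, here is a toy, self-contained simulation of
the idea (illustrative only; the names, the numbers and the "correct half
the remaining frequency error" heuristic are made up and do not mirror the
actual timekeeping_adjust() code):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		int64_t mult = 1000;		/* current cycles->ns multiplier */
		const int64_t target = 1042;	/* multiplier NTP asks for */
		int64_t accumulated_error = 0;	/* ns behind (+) or ahead (-) */

		for (int tick = 0; tick < 10; tick++) {
			int64_t freq_error = target - mult;

			/* Step 1: correct most of a large frequency error
			 * right away so it cannot keep accumulating. */
			if (freq_error > 1 || freq_error < -1)
				mult += freq_error / 2;

			/* Step 2: a single unit adjustment against whatever
			 * error has already accumulated. */
			if (accumulated_error > 0)
				mult++;
			else if (accumulated_error < 0)
				mult--;

			accumulated_error += target - mult;
			printf("tick %d: mult=%lld error=%lld\n", tick,
			       (long long)mult, (long long)accumulated_error);
		}
		return 0;
	}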
Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
we take the tk_xtime() value, which returns a timespec64, and
store it in a timespec.
This luckily is ok, since the only architectures that use
GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
64 bit systems where timespec64 is the same as a timespec.
Even so, for cleanliness reasons, use the conversion function
to assign the proper type.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Expose the new NMI safe accessor to clock monotonic to the tracer.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tracers want a correlated time between the kernel instrumentation and
user space. We really do not want to export sched_clock() to user
space, so we need to provide something sensible for this.
Using separate data structures with a non-blocking sequence count
based update mechanism allows us to do that. The data structure
required for the readout has a sequence counter and two copies of the
timekeeping data.
On the update side:

	smp_wmb();
	tkf->seq++;
	smp_wmb();
	update(tkf->base[0], tk);
	smp_wmb();
	tkf->seq++;
	smp_wmb();
	update(tkf->base[1], tk);

On the reader side:

	do {
		seq = tkf->seq;
		smp_rmb();
		idx = seq & 0x01;
		now = now(tkf->base[idx]);
		smp_rmb();
	} while (seq != tkf->seq);

So if an NMI hits the update of base[0] it will use base[1] which is
still consistent, but this timestamp is not guaranteed to be monotonic
across an update.
The timestamp is calculated by:

	now = base_mono + clock_delta * slope

So if the update lowers the slope, readers who are forced to the
not yet updated second array are still using the old steeper slope.
	tmono
	^
	|    o  n
	|   o  n
	|  u
	| o
	|o
	|12345678---> reader order

	o = old slope
	u = update
	n = new slope
So reader 6 will observe time going backwards versus reader 5.
While other CPUs are likely to be able to observe that, the only way
for a CPU local observation is when an NMI hits in the middle of
the update. Timestamps taken from that NMI context might be ahead
of the following timestamps. Callers need to be aware of that and
deal with it.
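As a self-contained illustration of the scheme (a user-space model of the
protocol described above, not the kernel code; the atomic fences stand in
for smp_wmb()/smp_rmb() and the field names are made up):

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	struct tk_base {
		uint64_t base_ns;	/* base_mono at the last update */
		uint64_t cycle_last;	/* counter value at the last update */
		uint64_t mult;		/* cycles -> ns slope */
	};

	struct tk_fast {
		unsigned int seq;
		struct tk_base base[2];
	};

	/* Update side: bump seq around each copy so that a reader which
	 * samples seq while base[0] is being rewritten is steered to
	 * base[1], and vice versa. */
	static void tk_fast_update(struct tk_fast *tkf, struct tk_base b)
	{
		atomic_thread_fence(memory_order_seq_cst);	/* smp_wmb() */
		tkf->seq++;
		atomic_thread_fence(memory_order_seq_cst);	/* smp_wmb() */
		tkf->base[0] = b;
		atomic_thread_fence(memory_order_seq_cst);	/* smp_wmb() */
		tkf->seq++;
		atomic_thread_fence(memory_order_seq_cst);	/* smp_wmb() */
		tkf->base[1] = b;
	}

	/* Reader side: pick the copy selected by the low bit of seq and
	 * retry if seq changed while it was being read. */
	static uint64_t tk_fast_read_ns(struct tk_fast *tkf, uint64_t cycles)
	{
		unsigned int seq;
		uint64_t now;

		do {
			seq = tkf->seq;
			atomic_thread_fence(memory_order_seq_cst); /* smp_rmb() */
			const struct tk_base *b = &tkf->base[seq & 0x01];

			/* now = base_mono + clock_delta * slope */
			now = b->base_ns + (cycles - b->cycle_last) * b->mult;
			atomic_thread_fence(memory_order_seq_cst); /* smp_rmb() */
		} while (seq != tkf->seq);

		return now;
	}

	int main(void)
	{
		struct tk_fast tkf = { 0 };
		struct tk_base b = { .base_ns = 1000, .cycle_last = 50, .mult = 2 };

		tk_fast_update(&tkf, b);
		/* prints 1020: 1000 + (60 - 50) * 2 */
		printf("%llu\n", (unsigned long long)tk_fast_read_ns(&tkf, 60));
		return 0;
	}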
V2: Got rid of clock monotonic raw and reorganized the data
structures. Folded in the barrier fix from Mathieu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
All the function needs is in the tk_read_base struct. No functional
change for the current code, just a preparatory patch for the NMI safe
accessor to clock monotonic which will use struct tk_read_base as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The members of the new struct are the required ones for the new NMI
safe accessor to clock monotonic. In order to reuse the existing
timekeeping code and to make the update of the fast NMI safe
timekeepers a simple memcpy, use the struct for the timekeeper as well
and convert all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Access to time requires touching two cachelines at minimum:
1) The timekeeper data structure
2) The clocksource data structure
The access to the clocksource data structure can be avoided as almost
all clocksource implementations ignore the argument to the read
callback, which is a pointer to the clocksource.
But the core needs to touch it to access the members @read and @mask.
So we are better off by copying the @read function pointer and the
@mask from the clocksource to the core data structure itself.
For the most used ktime_get() access all required data including the
@read and @mask copies fits together with the sequence counter into a
single 64 byte cacheline.
For the other time access functions, the current code touches three
cachelines in the worst case. But with the clocksource data copies we
can reduce that to two adjacent cachelines, which is more efficient
than disjoint cachelines.
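For reference, a compilable sketch of what such a consolidated read base
looks like (an approximation of the 3.17-era struct tk_read_base from
memory; field names and types here are assumptions, not a verbatim copy):

	#include <stdint.h>

	typedef uint64_t cycle_t;
	struct clocksource;				/* opaque here */

	struct tk_read_base_sketch {
		struct clocksource *clock;		/* backing clocksource */
		cycle_t (*read)(struct clocksource *cs); /* copy of clock->read */
		cycle_t mask;				/* copy of clock->mask */
		cycle_t cycle_last;			/* counter value at last update */
		uint32_t mult;				/* cycles -> shifted ns */
		uint32_t shift;
		uint64_t xtime_nsec;			/* shifted ns accumulated this tick */
		int64_t base_mono;			/* CLOCK_MONOTONIC base (ktime_t) */
	};

	/* With @read and @mask duplicated here, ktime_get() resolves a
	 * timestamp without touching the struct clocksource cacheline. */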
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
cycle_last was added to the clocksource to support the TSC
validation. We moved that to the core code, so we can get rid of the
extra copy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The only user of the cycle_last validation is the x86 TSC. In order to
provide NMI safe accessor functions for clock monotonic and
monotonic_raw we need to do that in the core.
We can't do the TSC specific

	if (now < cycle_last)
		now = cycle_last;

for the other wrapping around clocksources, but TSC has
CLOCKSOURCE_MASK(64) which actually does not mask out anything so if
now is less than cycle_last the subtraction will give a negative
result. So we can check for that in clocksource_delta() and return 0
for that case.
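A sketch of the resulting helper, consistent with the description above
(the exact in-kernel form and the config gating around it may differ):

	#include <stdint.h>

	typedef uint64_t cycle_t;

	/*
	 * CLOCKSOURCE_MASK(64) masks out nothing, so a TSC read that went
	 * backwards shows up as a "negative" delta when interpreted as a
	 * signed value; clamp that case to 0 instead of returning a huge
	 * bogus delta.
	 */
	static inline cycle_t clocksource_delta(cycle_t now, cycle_t last,
						cycle_t mask)
	{
		cycle_t ret = (now - last) & mask;

		return (int64_t)ret > 0 ? ret : 0;
	}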
Implement and enable it for x86.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
We want to move the TSC sanity check into core code to make NMI safe
accessors to clock monotonic[_raw] possible. For this we need to
sanity check the delta calculation. Create a helper function and
convert all sites to use it.
[ Build fix from jstultz ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Provide a ktime_t based interface for raw monotonic time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
timekeeping_clocktai() is not used in fast paths, so the extra
timespec conversion is not problematic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Subtracting plain nsec values and converting to timespec is simpler
than the whole timespec math. Not really fastpath code, so the
division is not an issue.
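For illustration, the pattern being adopted is simply (a user-space sketch
with a hand-rolled conversion; the kernel uses its own timespec helpers):

	#include <stdio.h>
	#include <stdint.h>

	#define NSEC_PER_SEC 1000000000LL

	struct ts { int64_t tv_sec; long tv_nsec; };

	/* Subtract in plain nanoseconds and convert once at the end --
	 * no per-field borrow handling required. */
	static struct ts ns_to_ts(int64_t ns)
	{
		struct ts t = {
			.tv_sec  = ns / NSEC_PER_SEC,
			.tv_nsec = (long)(ns % NSEC_PER_SEC),
		};
		return t;
	}

	int main(void)
	{
		int64_t a = 5 * NSEC_PER_SEC + 100;	  /* 5.000000100s */
		int64_t b = 2 * NSEC_PER_SEC + 900000000; /* 2.900000000s */
		struct ts d = ns_to_ts(a - b);

		printf("%lld.%09ld\n", (long long)d.tv_sec, d.tv_nsec);
		/* prints 2.100000100 */
		return 0;
	}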
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>