linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Magnus Damm	a25cbd045a	clocksource: setup mult_orig in clocksource_enable() Setup clocksource mult_orig in clocksource_enable(). Clocksource drivers can save power by using keeping the device clock disabled while the clocksource is unused. In practice this means that the enable() and disable() callbacks perform clk_enable() and clk_disable(). The enable() callback may also use clk_get_rate() to get the clock rate from the clock framework. This information can then be used to calculate the shift and mult variables. Currently the mult_orig variable is setup from mult at registration time only. This is conflicting with the above case since the clock is disabled and the mult variable is not yet calculated at the time of registration. Moving the mult_orig setup code to clocksource_enable() allows us to both handle the common case with no enable() callback and the mult-changed-after-enable() case. [ Impact: allow dynamic clock source usage ] Signed-off-by: Magnus Damm <damm@igel.co.jp> LKML-Reference: <20090501054546.8193.10688.sendpatchset@rx1.opensource.se> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-05-02 11:45:15 +02:00
Dmitri Vorobiev	a52f5c5620	clockevents: tick_broadcast_device can become static The variable tick_broadcast_device is not used outside of the file where it is defined, so let's make it static. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-05-02 10:31:14 +02:00
john stultz	74a03b69d1	clockevents: prevent endless loop in tick_handle_periodic() tick_handle_periodic() can lock up hard when a one shot clock event device is used in combination with jiffies clocksource. Avoid an endless loop issue by requiring that a highres valid clocksource be installed before we call tick_periodic() in a loop when using ONESHOT mode. The result is we will only increment jiffies once per interrupt until a continuous hardware clocksource is available. Without this, we can run into a endless loop, where each cycle through the loop, jiffies is updated which increments time by tick_period or more (due to clock steering), which can cause the event programming to think the next event was before the newly incremented time and fail causing tick_periodic() to be called again and the whole process loops forever. [ Impact: prevent hard lock up ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org	2009-05-02 10:22:27 +02:00
Magnus Damm	4614e6adaf	clocksource: add enable() and disable() callbacks Add enable() and disable() callbacks for clocksources. This allows us to put unused clocksources in power save mode. The functions clocksource_enable() and clocksource_disable() wrap the callbacks and are inserted in the timekeeping code to enable before use and disable after switching to a new clocksource. Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-21 13:41:47 -07:00
Magnus Damm	8e19608e8b	clocksource: pass clocksource to read() callback Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-21 13:41:47 -07:00
Linus Torvalds	6671de344c	Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits) posix timers: fix RLIMIT_CPU && fork() time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix time: ntp: clean up second_overflow() time: ntp: simplify ntp_tick_adj calculations time: ntp: make 64-bit constants more robust time: ntp: refactor do_adjtimex() some more time: ntp: refactor do_adjtimex() time: ntp: fix bug in ntp_update_offset() & do_adjtimex() time: ntp: micro-optimize ntp_update_offset() time: ntp: simplify ntp_update_offset_fll() time: ntp: refactor and clean up ntp_update_offset() time: ntp: refactor up ntp_update_frequency() time: ntp: clean up ntp_update_frequency() time: ntp: simplify the MAX_TICKADJ_SCALED definition time: ntp: simplify the second_overflow() code flow time: ntp: clean up kernel/time/ntp.c x86: hpet: stop HPET_COUNTER when programming periodic mode x86: hpet: provide separate functions to stop and start the counter x86: hpet: print HPET registers during setup (if hpet=verbose is used) time: apply NTP frequency/tick changes immediately ...	2009-03-26 16:05:42 -07:00
Ingo Molnar	7c526e1fef	Merge branches 'timers/new-apis', 'timers/ntp' and 'timers/urgent' into timers/core	2009-03-26 15:45:52 +01:00
John Stultz	a2a5ac8650	time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fix The time_status conditional was accidentally placed right after we clear the checked time_status bits, which causes us to take the conditional every time through. This fixes it by moving the conditional to before we clear the time_status bits. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 19:39:47 +01:00
Ingo Molnar	39854fe8c1	time: ntp: clean up second_overflow() Impact: cleanup, no functionality changed The 'time_adj' local variable is named in a very confusing way because it almost shadows the 'time_adjust' global variable - which is used in this same function. Rename it to 'delta' - to make them stand apart more clearly. kernel/time/ntp.o: text data bss dec hex filename 2545 114 144 2803 af3 ntp.o.before 2545 114 144 2803 af3 ntp.o.after md5: 1bf0b3be564512279ba7cee299d1d2be ntp.o.before.asm 1bf0b3be564512279ba7cee299d1d2be ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:17 +01:00
Ingo Molnar	069569e025	time: ntp: simplify ntp_tick_adj calculations Impact: micro-optimization Convert the (internal) ntp_tick_adj value we store from unscaled units to scaled units. This is a constant that we never modify, so scaling it up once during bootup is enough - we dont have to do it for every adjustment step. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:16 +01:00
Ingo Molnar	2b9d1496e7	time: ntp: make 64-bit constants more robust Impact: cleanup, no functionality changed - make PPM_SCALE an explicit s64 constant, to remove (s64) casts from usage sites. kernel/time/ntp.o: text data bss dec hex filename 2536 114 136 2786 ae2 ntp.o.before 2536 114 136 2786 ae2 ntp.o.after md5: 40a7728d1188aa18e83e21a81fa7b150 ntp.o.before.asm 40a7728d1188aa18e83e21a81fa7b150 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:15 +01:00
Ingo Molnar	e96291653b	time: ntp: refactor do_adjtimex() some more Impact: cleanup, no functionality changed Further simplify do_adjtimex(): - introduce the ntp_start_leap_timer() helper function - eliminate the goto adj_done complication Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:15 +01:00
Ingo Molnar	80f2257116	time: ntp: refactor do_adjtimex() Impact: cleanup, no functionality changed do_adjtimex() is currently a monster function with a maze of branches. Refactor the txc->modes setting aspects of it into two new helper functions: process_adj_status() process_adjtimex_modes() kernel/time/ntp.o: text data bss dec hex filename 2512 114 136 2762 aca ntp.o.before 2512 114 136 2762 aca ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:14 +01:00
Ingo Molnar	10dd31a7a1	time: ntp: fix bug in ntp_update_offset() & do_adjtimex() Impact: change (fix) the way the NTP PLL seconds offset is initialized/tracked Fix a bug and do a micro-optimization: When PLL is enabled we do not reset time_reftime. If the PLL was off for a long time (for example after bootup), this is arguably the wrong thing to do. We already had a hack for the common boot-time case in ntp_update_offset(), in form of: if (unlikely(time_status & STA_FREQHOLD \|\| time_reftime == 0)) secs = 0; But the update delta should be reset later on too - not just when the PLL is enabled for the first time after bootup. So do it on !STA_PLL -> STA_PLL transitions. This changes behavior, as previously if ntpd was disabled for a long time and we restarted it, we'd run from that last update, with a very large delta. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:13 +01:00
Ingo Molnar	c7986acba2	time: ntp: micro-optimize ntp_update_offset() Impact: cleanup, no functionality changed The time_reftime update in ntp_update_offset() to xtime.tv_sec is a convoluted way of saying that we want to freeze the frequency and want the 'secs' delta to be 0. Also make this branch unlikely. This shaves off 8 bytes from the code size: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2496 114 136 2746 aba ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:12 +01:00
Ingo Molnar	478b7aab16	time: ntp: simplify ntp_update_offset_fll() Impact: cleanup, no functionality changed Change ntp_update_offset_fll() to delta logic instead of absolute value logic. This eliminates 'freq_adj' from the function. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:12 +01:00
Ingo Molnar	f939890b66	time: ntp: refactor and clean up ntp_update_offset() Impact: cleanup, no functionality changed - introduce the ntp_update_offset_fll() helper - clean up the flow and variable naming kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.before.asm 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:11 +01:00
Ingo Molnar	bc26c31d44	time: ntp: refactor up ntp_update_frequency() Impact: cleanup, no functionality changed Change ntp_update_frequency() from a hard to follow code flow that uses global variables as temporaries, to a clean input+output flow. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:10 +01:00
Ingo Molnar	9ce616aaef	time: ntp: clean up ntp_update_frequency() Impact: cleanup, no functionality changed Prepare a refactoring of ntp_update_frequency(). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:09 +01:00
Ingo Molnar	bbd1267690	time: ntp: simplify the MAX_TICKADJ_SCALED definition Impact: cleanup, no functionality changed There's an ugly u64 typecase in the MAX_TICKADJ_SCALED definition, this can be eliminated by making the MAX_TICKADJ constant's type 64-bit (signed). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:08 +01:00
Ingo Molnar	3c972c2444	time: ntp: simplify the second_overflow() code flow Impact: cleanup, no functionality changed Instead of a hierarchy of conditions, transform them to clean gradual conditions and return's. This makes the flow easier to read and makes the purpose of the function easier to understand. kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:07 +01:00
Ingo Molnar	53bbfa9e94	time: ntp: clean up kernel/time/ntp.c Impact: cleanup, no functionality changed Make this file a bit more readable by applying a consistent coding style. No code changed: kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 18:38:06 +01:00
john stultz	fdcedf7b75	time: apply NTP frequency/tick changes immediately Since the GENERIC_TIME changes landed, the adjtimex behavior changed for struct timex.tick and .freq changed. When the tick or freq value is set, we adjust the tick_length_base in ntp_update_frequency(). However, this new value doesn't get applied to tick_length until the next second (via second_overflow). This means some applications that do quick time tweaking do not see the requested change made as quickly as expected. I've run a few tests with this change, and ntpd still functions fine. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-19 10:10:08 +01:00
Patrick Ohly	a75244c3d5	timecompare: generic infrastructure to map between two time bases Mapping from a struct timecounter to a time returned by functions like ktime_get_real() is implemented. This is sufficient to use this code in a network device driver which wants to support hardware time stamping and transformation of hardware time stamps to system time. The interface could have been made more versatile by not depending on a time counter, but this wasn't done to avoid writing glue code elsewhere. The method implemented here is the one used and analyzed under the name "assisted PTP" in the LCI PTP paper: http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-15 22:43:32 -08:00
Patrick Ohly	a038a353c3	clocksource: allow usage independent of timekeeping.c So far struct clocksource acted as the interface between time/timekeeping.c and hardware. This patch generalizes the concept so that a similar interface can also be used in other contexts. For that it introduces new structures and related functions without touching the existing struct clocksource. The reasons for adding these new structures to clocksource.[ch] are * the APIs are clearly related * struct clocksource could be cleaned up to use the new structs * avoids proliferation of files with similar names (timesource.h? timecounter.h?) As outlined in the discussion with John Stultz, this patch adds * struct cyclecounter: stateless API to hardware which counts clock cycles * struct timecounter: stateful utility code built on a cyclecounter which provides a nanosecond counter * only the function to read the nanosecond counter; deltas are used internally and not exposed to users of timecounter The code does no locking of the shared state. It must be called at least as often as the cycle counter wraps around to detect these wrap arounds. Both is the responsibility of the timecounter user. Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-15 22:43:31 -08:00
Ingo Molnar	5ba1ae92b6	Merge branches 'timers/clockevents', 'timers/hpet', 'timers/hrtimers' and 'timers/urgent' into timers/core	2009-02-08 20:14:11 +01:00
Sebastien Dugue	94df7de028	hrtimers: allow the hot-unplugging of all cpus Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unpluging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 22:35:29 +01:00
Magnus Damm	2d68259db2	clockevents: let set_mode() setup delta information Allow the set_mode() clockevent callback to decide and fill in delta details such as shift, mult, max_delta_ns and min_delta_ns. With this change the clockevent can be registered without delta details which allows us to keep the parent clock disabled until the clockevent gets setup using set_mode(). Letting set_mode() fill in or update delta details allows us to save power by disabling the parent clock while the clockevent is unused. This may however make the parent clock rate change, so next time the clockevent gets enabled we need let set_mode() to update the detla details accordingly. Doing it at registration time is not enough. Furthermore, the delta details seem unused in the case of periodic-only clockevent drivers, so this change also allows registration of such drivers without the delta details filled in. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-01-16 12:27:39 +01:00
Jaswinder Singh Rajput	934d96eafa	time-sched.c: tick_nohz_update_jiffies should be static Impact: cleanup, reduce kernel size a bit, avoid sparse warning Fixes sparse warning: kernel/time/tick-sched.c:137:6: warning: symbol 'tick_nohz_update_jiffies' was not declared. Should it be static? Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-15 12:06:56 +01:00
Ingo Molnar	e3ee1e1231	Merge commit 'v2.6.29-rc1' into timers/hrtimers Conflicts: kernel/time/tick-common.c	2009-01-12 11:32:03 +01:00
Linus Torvalds	57c44c5f6f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) trivial: chack -> check typo fix in main Makefile trivial: Add a space (and a comma) to a printk in 8250 driver trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx trivial: Fix misspelling of "firmware" in powerpc Makefile trivial: Fix misspelling of "firmware" in usb.c trivial: Fix misspelling of "firmware" in qla1280.c trivial: Fix misspelling of "firmware" in a100u2w.c trivial: Fix misspelling of "firmware" in megaraid.c trivial: Fix misspelling of "firmware" in ql4_mbx.c trivial: Fix misspelling of "firmware" in acpi_memhotplug.c trivial: Fix misspelling of "firmware" in ipw2100.c trivial: Fix misspelling of "firmware" in atmel.c trivial: Fix misspelled firmware in Kconfig trivial: fix an -> a typos in documentation and comments trivial: fix then -> than typos in comments and documentation trivial: update Jesper Juhl CREDITS entry with new email trivial: fix singal -> signal typo trivial: Fix incorrect use of "loose" in event.c trivial: printk: fix indentation of new_text_line declaration trivial: rtc-stk17ta8: fix sparse warning ...	2009-01-07 11:31:52 -08:00
Frederik Schwarzer	025dfdafe7	trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:06 +01:00
Ingo Molnar	d9be28ea91	Merge branches 'sched/clock', 'sched/cleanups' and 'linus' into sched/urgent	2009-01-06 09:33:57 +01:00
Linus Torvalds	7d3b56ba37	Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits) x86: setup_per_cpu_areas() cleanup cpumask: fix compile error when CONFIG_NR_CPUS is not defined cpumask: use alloc_cpumask_var_node where appropriate cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t x86: use cpumask_var_t in acpi/boot.c x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids sched: put back some stack hog changes that were undone in kernel/sched.c x86: enable cpus display of kernel_max and offlined cpus ia64: cpumask fix for is_affinity_mask_valid() cpumask: convert RCU implementations, fix xtensa: define __fls mn10300: define __fls m32r: define __fls h8300: define __fls frv: define __fls cris: define __fls cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS cpumask: zero extra bits in alloc_cpumask_var_node cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ cpumask: convert mm/ ...	2009-01-03 12:04:39 -08:00
Linus Torvalds	61420f59a5	Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting	2009-01-03 11:56:24 -08:00
Mike Travis	7eb1955336	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 18:53:31 +01:00
Linus Torvalds	b840d79631	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually	2009-01-02 11:44:09 -08:00
Rusty Russell	5db0e1e9e0	cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ Impact: cleanup Simple replacement, now the _nr is redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@redhat.com>	2009-01-01 10:12:29 +10:30
Rusty Russell	6b954823c2	cpumask: convert kernel time functions Impact: Use new APIs Convert kernel/time functions to use struct cpumask *. Note the ugly bitmap declarations in tick-broadcast.c. These should be cpumask_var_t, but there was no obvious initialization function to put the alloc_cpumask_var() calls in. This was safe. (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK, so we use a bitmap here to show we really mean it). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-01-01 10:12:25 +10:30
Martin Schwidefsky	79741dd357	[PATCH] idle cputime accounting The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Martin Schwidefsky	457533a7d3	[PATCH] fix scaled & unscaled cputime accounting The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Rusty Russell	2ca1a61583	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/x86/kernel/io_apic.c	2008-12-31 23:05:57 +10:30
Thomas Gleixner	1c5745aa38	sched_clock: prevent scd->clock from moving backwards, take #2 Redo: 5b7dba4: sched_clock: prevent scd->clock from moving backwards which had to be reverted due to s2ram hangs: ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards" ... this time with resume restoring GTOD later in the sequence taken into account as well. The "timekeeping_suspended" flag is not very nice but we cannot call into GTOD before it has been properly resumed and the scheduler will run very early in the resume sequence. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-31 09:53:21 +01:00
Sebastien Dugue	5762ba1873	hrtimers: allow the hot-unplugging of all cpus Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unpluging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 07:37:19 +01:00
Ingo Molnar	32e8d18683	Merge branches 'timers/clocksource', 'timers/hpet', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/rtc' into timers/core	2008-12-25 18:02:25 +01:00
Rusty Russell	968ea6d80e	Merge ../linux-2.6-x86 Conflicts: arch/x86/kernel/io_apic.c kernel/sched.c kernel/sched_stats.h	2008-12-13 21:55:51 +10:30
Rusty Russell	320ab2b0b1	cpumask: convert struct clock_event_device to cpumask pointers. Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>	2008-12-13 21:20:26 +10:30
Rusty Russell	0de26520c7	cpumask: make irq_set_affinity() take a const struct cpumask Impact: change existing irq_chip API Not much point with gentle transition here: the struct irq_chip's setaffinity method signature needs to change. Fortunately, not widely used code, but hits a few architectures. Note: In irq_select_affinity() I save a temporary in by mangling irq_desc[irq].affinity directly. Ingo, does this break anything? (Folded in fix from KOSAKI Motohiro) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: jeremy@xensource.com Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>	2008-12-13 21:20:26 +10:30
Woodruff, Richard	001474491f	nohz: suppress needless timer reprogramming In my device I get many interrupts from a high speed USB device in a very short period of time. The system spends a lot of time reprogramming the hardware timer which is in a slower timing domain as compared to the CPU. This results in the CPU spending a huge amount of time waiting for the timer posting to be done. All of this reprogramming is useless as the wake up time has not changed. As measured using ETM trace this drops my reprogramming penalty from almost 60% CPU load down to 15% during high interrupt rate. I can send traces to show this. Suppress setting of duplicate timer event when timer already stopped. Timer programming can be very costly and can result in long cpu stall/wait times. [akpm@linux-foundation.org: coding-style fixes] [tglx@linutronix.de: move the check to the right place and avoid raising the softirq for nothing] Signed-off-by: Richard Woodruff <r-woodruff2@ti.com> Cc: johnstul@us.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 16:55:31 +01:00
Ingo Molnar	45ab6b0c76	Merge branch 'sched/core' into cpus4096 Conflicts: include/linux/ftrace.h kernel/sched.c	2008-12-12 13:48:57 +01:00
Heiko Carstens	fa116ea35e	nohz: no softirq pending warnings for offline cpus Impact: remove false positive warning After a cpu was taken down during cpu hotplug (read: disabled for interrupts) it still might have pending softirqs. However take_cpu_down makes sure that the idle task will run next instead of ksoftirqd on the taken down cpu. The idle task will call tick_nohz_stop_sched_tick which might warn about pending softirqs just before the cpu kills itself completely. However the pending softirqs on the dead cpu aren't a problem because they will be moved to an online cpu during CPU_DEAD handling. So make sure we warn only for online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 07:27:01 +01:00
john stultz	6c9bacb41c	time: catch xtime_nsec underflows and fix them Impact: fix time warp bug Alex Shi, along with Yanmin Zhang have been noticing occasional time inconsistencies recently. Through their great diagnosis, they found that the xtime_nsec value used in update_wall_time was occasionally going negative. After looking through the code for awhile, I realized we have the possibility for an underflow when three conditions are met in update_wall_time(): 1) We have accumulated a second's worth of nanoseconds, so we incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This doesn't cause xtime_nsec to go negative, but it can cause it to be small). 2) The remaining offset value is large, but just slightly less then cycle_interval. 3) clocksource_adjust() is speeding up the clock, causing a corrective amount (compensating for the increase in the multiplier being multiplied against the unaccumulated offset value) to be subtracted from xtime_nsec. This can cause xtime_nsec to underflow. Unfortunately, since we notify the NTP subsystem via second_overflow() whenever we accumulate a full second, and this effects the error accumulation that has already occured, we cannot simply revert the accumulated second from xtime nor move the second accumulation to after the clocksource_adjust call without a change in behavior. This leaves us with (at least) two options: 1) Simply return from clocksource_adjust() without making a change if we notice the adjustment would cause xtime_nsec to go negative. This would work, but I'm concerned that if a large adjustment was needed (due to the error being large), it may be possible to get stuck with an ever increasing error that becomes too large to correct (since it may always force xtime_nsec negative). This may just be paranoia on my part. 2) Catch xtime_nsec if it is negative, then add back the amount its negative to both xtime_nsec and the error. This second method is consistent with how we've handled earlier rounding issues, and also has the benefit that the error being added is always in the oposite direction also always equal or smaller then the correction being applied. So the risk of a corner case where things get out of control is lessened. This patch fixes bug 11970, as tested by Yanmin Zhang http://bugzilla.kernel.org/show_bug.cgi?id=11970 Reported-by: alex.shi@intel.com Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-04 08:43:02 +01:00
Peter Zijlstra	ca109491f6	hrtimer: removing all ur callback modes Impact: cleanup, move all hrtimer processing into hardirq context This is an attempt at removing some of the hrtimer complexity by reducing the number of callback modes to 1. This means that all hrtimer callback functions will be ran from HARD-irq context. I went through all the 30 odd hrtimer callback functions in the kernel and saw only one that I'm not quite sure of, which is the one in net/can/bcm.c - hence I'm CC-ing the folks responsible for that code. Furthermore, the hrtimer core now calls callbacks directly with IRQs disabled in case you try to enqueue an expired timer. If this timer is a periodic timer (which should use hrtimer_forward() to advance its time) then it might be possible to end up in an inf. recursive loop due to the fact that hrtimer_forward() doesn't round up to the next timer granularity, and therefore keeps on calling the callback - obviously this needs a fix. Aside from that, this seems to compile and actually boot on my dual core test box - although I'm sure there are some bugs in, me not hitting any makes me certain :-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 15:45:46 +01:00
Rusty Russell	6a7b3dc344	sched: convert nohz_cpu_mask to cpumask_var_t. Impact: (future) size reduction for large NR_CPUS. Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-24 17:51:10 +01:00
Thomas Gleixner	ae99286b4f	nohz: disable tick_nohz_kick_tick() for now Impact: nohz powersavings and wakeup regression commit `fb02fbc14d` (NOHZ: restart tick device from irq_enter()) causes a serious wakeup regression. While the patch is correct it does not take into account that spurious wakeups happen on x86. A fix for this issue is available, but we just revert to the .27 behaviour and let long running softirqs screw themself. Disable it for now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-11-10 22:39:27 +01:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Thomas Gleixner	c4bd822e7b	NOHZ: fix thinko in the timer restart code path commit `fb02fbc14d` (NOHZ: restart tick device from irq_enter()) solves the problem of stale jiffies when long running softirqs happen in a long idle sleep period, but it has a major thinko in it: When the interrupt which came in _is_ the timer interrupt which should expire ts->sched_timer then we cancel and rearm the timer _before_ it gets expired in hrtimer_interrupt() to the next period. That means the call back function is not called. This game can go on for ever :( Prevent this by making sure to only rearm the timer when the expiry time is more than one tick_period away. Otherwise keep it running as it is either already expired or will expiry at the right point to update jiffies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Venkatesch Pallipadi <venkatesh.pallipadi@intel.com>	2008-10-21 20:53:24 +02:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Thomas Gleixner	870e2a2845	timer_list: add base address to clock base The base address of a (per cpu) clock base is a useful debug info. Add it and bump the version number of timer_lists. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Thomas Gleixner	c5b77a3d3a	timer_list: print cpu number of clockevents device The per cpu clock events device output of timer_list lacks an association of the device to the cpu which is annoying when looking at the output of /proc/timer_list from a 128 way system. Add the CPU number info and mark the broadcast device in the device list printout. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Thomas Gleixner	e67ef25a35	timer_list: print real timer address The current timer_list output prints the address of the on stack copy of the active hrtimer instead of the hrtimer itself. Print the address of the real timer instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Arjan van de Ven	651dab4264	Merge commit 'linus/master' into merge-linus Conflicts: arch/x86/kvm/i8254.c	2008-10-17 09:20:26 -07:00
Thomas Gleixner	fb02fbc14d	NOHZ: restart tick device from irq_enter() We did not restart the tick device from irq_enter() to avoid double reprogramming and extra events in the return immediate to idle case. But long lasting softirqs can lead to a situation where jiffies become stale: idle() tick stopped (reprogrammed to next pending timer) halt() interrupt jiffies updated from irq_enter() interrupt handler softirq function 1 runs 20ms softirq function 2 arms a 10ms timer with a stale jiffies value jiffies updated from irq_exit() timer wheel has now an already expired timer (the one added in function 2) timer fires and timer softirq runs This was discovered when debugging a timer problem which happend only when the ath5k driver is active. The debugging proved that there is a softirq function running for more than 20ms, which is a bug by itself. To solve this we restart the tick timer right from irq_enter(), but do not go through the other functions which are necessary to return from idle when need_resched() is set. Reported-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Elias Oltmanns <eo@nebensachen.de>	2008-10-17 18:13:38 +02:00
Thomas Gleixner	c34bec5a44	NOHZ: split tick_nohz_restart_sched_tick() Split out the clock event device reprogramming. Preparatory patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-17 18:13:38 +02:00
Thomas Gleixner	719254faa1	NOHZ: unify the nohz function calls in irq_enter() We have two separate nohz function calls in irq_enter() for no good reason. Just call a single NOHZ function from irq_enter() and call the bits in the tick code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-17 18:13:38 +02:00
Linus Torvalds	e533b22705	Merge branch 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails softirq, warning fix: correct a format to avoid a warning softirqs, debug: preemption check x86, pci-hotplug, calgary / rio: fix EBDA ioremap() IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description dmi scan: warn about too early calls to dmi_check_system() generic: redefine resource_size_t as phys_addr_t generic: make PFN_PHYS explicitly return phys_addr_t generic: add phys_addr_t for holding physical addresses softirq: allocate less vectors IO resources: fix/remove printk printk: robustify printk, update comment printk: robustify printk, fix #2 printk: robustify printk, fix printk: robustify printk Fixed up conflicts in: arch/powerpc/include/asm/types.h arch/powerpc/platforms/Kconfig.cputype manually.	2008-10-16 15:17:40 -07:00
Jan Beulich	9ba16087d9	Kconfig: eliminate "def_bool n" constructs Using "def_bool n" is pointless, simply using bool here appears more appropriate. Further, retaining such options that don't have a prompt and aren't selected by anything seems also at least questionable. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:31 -07:00
Ingo Molnar	6b2ada8210	Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus	2008-10-15 12:48:44 +02:00
venkatesh.pallipadi@intel.com	8083e4ad97	[CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor export get_cpu_idle_time_us() for it to be used in ondemand governor. Last update time can be current time when the CPU is currently non-idle, accounting for the busy time since last idle. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>	2008-10-09 13:52:44 -04:00
Thomas Gleixner	07454bfff1	clockevents: check broadcast tick device not the clock events device Impact: jiffies increment too fast. Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get incremented too fast. The reason is a wrong check in the broadcast enter/exit code, which keeps the local apic timer in periodic mode when the switch happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-04 10:51:07 +02:00
Thomas Gleixner	ccc7dadf73	hrtimer: prevent migration of per CPU hrtimers Impact: per CPU hrtimers can be migrated from a dead CPU The hrtimer code has no knowledge about per CPU timers, but we need to prevent the migration of such timers and warn when such a timer is active at migration time. Explicitely mark the timers as per CPU and use a more understandable mode descriptor for the interrupts safe unlocked callback mode, which is used by hrtimer_sleeper and the scheduler code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:14 +02:00
Roman Zippel	d40e944c25	ntp: improve adjtimex frequency rounding Change PPM_SCALE_INV_SHIFT so that it doesn't throw away any input bits (19 is the amount of the factor 2 in PPM_SCALE), the output frequency can then be calculated back to its input value, as the inverse divide produce a slightly larger value, which is then correctly rounded by the final shift. Reported-by: Martin Ziegler <ziegler@uni-freiburg.de> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:13 +02:00
Roman Zippel	5cd1c9c5cf	timekeeping: fix rounding problem during clock update Due to a rounding problem during a clock update it's possible for readers to observe the clock jumping back by 1nsec. The following simplified example demonstrates the problem: cycle xtime 0 0 1000 999999.6 2000 1999999.2 3000 2999998.8 ... 1500 = 1499999.4 = 0.0 + 1499999.4 = 999999.6 + 499999.8 When reading the clock only the full nanosecond part is used, while timekeeping internally keeps nanosecond fractions. If the clock is now updated at cycle 1500 here, a nanosecond is missing due to the truncation. The simple fix is to round up the xtime value during the update, this also changes the distance to the reference time, but the adjustment will automatically take care that it stays under control. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:13 +02:00
Maciej W. Rozycki	eb3f938fd6	ntp: let update_persistent_clock() sleep This is a change that makes the 11-minute RTC update be run in the process context. This is so that update_persistent_clock() can sleep, which may be required for certain types of RTC hardware -- most notably I2C devices. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Rik van Riel <riel@redhat.com> Cc: David Brownell <david-b@pacbell.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:12 +02:00
Ingo Molnar	f8e256c687	timers: fix build error in !oneshot case kernel/time/tick-common.c: In function ‘tick_setup_periodic’: kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’ Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-23 12:57:00 +02:00
Thomas Gleixner	27ce4cb4a0	clockevents: prevent mode mismatch on cpu online Impact: timer hang on CPU online observed on AMD C1E systems When a CPU is brought online then the broadcast machinery can be in the one shot state already. Check this and setup the timer device of the new CPU in one shot mode so the broadcast code can pick up the next_event value correctly. Another AMD C1E oddity, as we switch to broadcast immediately and not after the full bring up via the ACPI cpu idle code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	302745699c	clockevents: check broadcast device not tick device Impact: Possible hang on CPU online observed on AMD C1E machines. The broadcast setup code looks at the mode of the tick device to determine whether it needs to be shut down or setup. This is wrong when the broadcast mode is set to one shot already. This can happen when a CPU is brought online as it goes through the periodic setup first. The problem went unnoticed as sane systems do not call into that code before the switch to one shot for the clock event device happens. The AMD C1E idle routine switches over immediately and thereby shuts down the just setup device before the first interrupt happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	49d670fb8d	clockevents: prevent stale tick_next_period for onlining CPUs Impact: possible hang on CPU onlining in timer one shot mode. The tick_next_period variable is only used during boot on nohz/highres enabled systems, but for CPU onlining it needs to be maintained when the per cpu clock events device operates in one shot mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	6441402b1f	clockevents: prevent cpu online to interfere with nohz Impact: rare hang which can be triggered on CPU online. tick_do_timer_cpu keeps track of the CPU which updates jiffies via do_timer. The value -1 is used to signal, that currently no CPU is doing this. There are two cases, where the variable can have this state: boot: necessary for systems where the boot cpu id can be != 0 nohz long idle sleep: When the CPU which did the jiffies update last goes into a long idle sleep it drops the update jiffies duty so another CPU which is not idle can pick it up and keep jiffies going. Using the same value for both situations is wrong, as the CPU online code can see the -1 state when the timer of the newly onlined CPU is setup. The setup for a newly onlined CPU goes through periodic mode and can pick up the do_timer duty without being aware of the nohz / highres mode of the already running system. Use two separate states and make them constants to avoid magic numbers confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:52 +02:00
Thomas Gleixner	2344abbcbd	clockevents: make device shutdown robust The device shut down does not cleanup the next_event variable of the clock event device. So when the device is reactivated the possible stale next_event value can prevent the device to be reprogrammed as it claims to wait on a event already. This is the root cause of the resurfacing suspend/resume problem, where systems need key press to come back to life. Fix this by setting next_event to KTIME_MAX when the device is shut down. Use a separate function for shutdown which takes care of that and only keep the direct set mode call in the broadcast code, where we can not touch the next_event value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-16 13:47:02 -07:00
Thomas Gleixner	61c22c34c6	clockevents: remove WARN_ON which was used to gather information The issue of the endless reprogramming loop due to a too small min_delta_ns was fixed with the previous updates of the clock events code, but we had no information about the spread of this problem. I added a WARN_ON to get automated information via kerneloops.org and to get some direct reports, which allowed me to analyse the affected machines. The WARN_ON has served its purpose and would be annoying for a release kernel. Remove it and just keep the information about the increase of the min_delta_ns value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-09 22:20:01 +02:00
Arjan van de Ven	704af52bd1	hrtimer: show the timer ranges in /proc/timer_list to help debugging and visibility of timer ranges, show them in the existing timer list in /proc/timer_list Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-07 16:10:20 -07:00
Linus Torvalds	f532522565	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource, acpi_pm.c: check for monotonicity clocksource, acpi_pm.c: use proper read function also in errata mode ntp: fix calculation of the next jiffie to trigger RTC sync x86: HPET: read back compare register before reading counter x86: HPET fix moronic 32/64bit thinko clockevents: broadcast fixup possible waiters HPET: make minimum reprogramming delta useful clockevents: prevent endless loop lockup clockevents: prevent multiple init/shutdown clockevents: enforce reprogram in oneshot setup clockevents: prevent endless loop in periodic broadcast handler clockevents: prevent clockevent event_handler ending up handler_noop	2008-09-06 19:33:26 -07:00
Maciej W. Rozycki	4ff4b9e19a	ntp: fix calculation of the next jiffie to trigger RTC sync We have a bug in the calculation of the next jiffie to trigger the RTC synchronisation. The aim here is to run sync_cmos_clock() as close as possible to the middle of a second. Which means we want this function to be called less than or equal to half a jiffie away from when now.tv_nsec equals 5e8 (500000000). If this is not the case for a given call to the function, for this purpose instead of updating the RTC we calculate the offset in nanoseconds to the next point in time where now.tv_nsec will be equal 5e8. The calculated offset is then converted to jiffies as these are the unit used by the timer. Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode, where the resulting value is rounded up. As a result the range of now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2. As a result if for example sync_cmos_clock() happens to be called at the time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 + TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the same range of now.tv_nsec again. Similarly for cases offsetted by an integer multiple of TICK_NSEC. This change addresses the problem by subtracting TICK_NSEC / 2 from the nanosecond offset to the next point in time where now.tv_nsec will be equal 5e8, effectively shifting the following rounding in timespec_to_jiffies() so that it produces a rounded-to-nearest result. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 15:31:48 +02:00
Ingo Molnar	77dd3b3bd2	Merge branch 'linus' into timers/ntp	2008-09-06 15:31:03 +02:00
Thomas Gleixner	7300711e8c	clockevents: broadcast fixup possible waiters Until the C1E patches arrived there where no users of periodic broadcast before switching to oneshot mode. Now we need to trigger a possible waiter for a periodic broadcast when switching to oneshot mode. Otherwise we can starve them for ever. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-06 07:21:17 +02:00
Arjan van de Ven	cc584b213f	hrtimer: convert kernel/* to the new hrtimer apis In order to be able to do range hrtimers we need to use accessor functions to the "expire" member of the hrtimer struct. This patch converts kernel/* to these accessors. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:13 -07:00
Peter Zijlstra	56c7426b39	sched_clock: fix NOHZ interaction If HLT stops the TSC, we'll fail to account idle time, thereby inflating the actual process times. Fix this by re-calibrating the clock against GTOD when leaving nohz mode. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 18:14:08 +02:00
Thomas Gleixner	1fb9b7d29d	clockevents: prevent endless loop lockup The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a too small value of the HPET minumum delta for programming an event. The clockevents code needs to enforce an interrupt event on the clock event device in some cases. The enforcement code was stupid and naive, as it just added the minimum delta to the current time and tried to reprogram the device. When the minimum delta is too small, then this loops forever. Add a sanity check. Allow reprogramming to fail 3 times, then print a warning and double the minimum delta value to make sure, that this does not happen again. Use the same function for both tick-oneshot and tick-broadcast code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:53 +02:00
Thomas Gleixner	9c17bcda99	clockevents: prevent multiple init/shutdown While chasing the C1E/HPET bugreports I went through the clock events code inch by inch and found that the broadcast device can be initialized and shutdown multiple times. Multiple shutdowns are not critical, but useless waste of time. Multiple initializations are simply broken. Another CPU might have the device in use already after the first initialization and the second init could just render it unusable again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	7205656ab4	clockevents: enforce reprogram in oneshot setup In tick_oneshot_setup we program the device to the given next_event, but we do not check the return value. We need to make sure that the device is programmed enforced so the interrupt handler engine starts working. Split out the reprogramming function from tick_program_event() and call it with the device, which was handed in to tick_setup_oneshot(). Set the force argument, so the devices is firing an interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	d4496b3955	clockevents: prevent endless loop in periodic broadcast handler The reprogramming of the periodic broadcast handler was broken, when the first programming returned -ETIME. The clockevents code stores the new expiry value in the clock events device next_event field only when the programming time has not been elapsed yet. The loop in question calculates the new expiry value from the next_event value and therefor never increases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Venkatesh Pallipadi	7c1e768974	clockevents: prevent clockevent event_handler ending up handler_noop There is a ordering related problem with clockevents code, due to which clockevents_register_device() called after tickless/highres switch will not work. The new clockevent ends up with clockevents_handle_noop as event handler, resulting in no timer activity. The problematic path seems to be * old device already has hrtimer_interrupt as the event_handler * new clockevent device registers with a higher rating * tick_check_new_device() is called * clockevents_exchange_device() gets called * old->event_handler is set to clockevents_handle_noop * tick_setup_device() is called for the new device * which sets new->event_handler using the old->event_handler which is noop. Change the ordering so that new device inherits the proper handler. This does not have any issue in normal case as most likely all the clockevent devices are setup before the highres switch. But, can potentially be affecting some corner case where HPET force detect happens after the highres switch. This was a problem with HPET in MSI mode code that we have been experimenting with. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Roman Zippel	916c7a8551	ntp: fix ADJ_OFFSET_SS_READ bug and do_adjtimex() cleanup Thanks to the review by Michael Kerrisk a bug in the recent ADJ_OFFSET_SS_READ option was discovered, where the ntp time_offset was inadvertently set by it. This fixes this by making the adjtime code more separate from the ntp_adjtime code (both of which really want to be separate syscalls). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 06:40:18 +02:00
Miao Xie	3c4fbe5e01	nohz: fix wrong event handler after online an offlined cpu On the tickless system(CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after I made an offlined cpu online, I found this cpu's event handler was tick_handle_periodic, not tick_nohz_handler. After debuging, I found this bug was caused by the wrong tick mode. the tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu is offline. This patch fixes this bug. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:54:06 +02:00
John Stultz	2d42244ae7	clocksource: introduce CLOCK_MONOTONIC_RAW In talking with Josip Loncaric, and his work on clock synchronization (see btime.sf.net), he mentioned that for really close synchronization, it is useful to have access to "hardware time", that is a notion of time that is not in any way adjusted by the clock slewing done to keep close time sync. Part of the issue is if we are using the kernel's ntp adjusted representation of time in order to measure how we should correct time, we can run into what Paul McKenney aptly described as "Painting a road using the lines we're painting as the guide". I had been thinking of a similar problem, and was trying to come up with a way to give users access to a purely hardware based time representation that avoided users having to know the underlying frequency and mask values needed to deal with the wide variety of possible underlying hardware counters. My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a nanosecond based time value, that increments starting at bootup and has no frequency adjustments made to it what so ever. The time is accessed from userspace via the posix_clock_gettime() syscall, passing CLOCK_MONOTONIC_RAW as the clock_id. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:24 +02:00
Roman Zippel	9a055117d3	clocksource: introduce clocksource_forward_now() To keep the raw monotonic patch simple first introduce clocksource_forward_now(), which takes care of the offset since the last update_wall_time() call and adds it to the clock, so there is no need anymore to deal with it explicitly at various places, which need to make significant changes to the clock. This is also gets rid of the timekeeping_suspend_nsecs, instead of waiting until resume, the value is accumulated during suspend. In the end there is only a single user of __get_nsec_offset() left, so I integrated it back to getnstimeofday(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:24 +02:00
John Stultz	1aa5dfb751	clocksource: keep track of original clocksource frequency The clocksource frequency is represented by clocksource->mult/2^(clocksource->shift). Currently, when NTP makes adjustments to the clock frequency, they are made directly to the mult value. This has the drawback that once changed, we cannot know what the orignal mult value was, or how much adjustment has been applied. This property causes problems in calculating proper ntp intervals when switching back and forth between clocksources. This patch separates the current mult value into a mult and mult_orig pair. The mult_orig value stays constant, while the ntp clocksource adjustments are done only to the mult value. This allows for correct ntp interval calculation and additionally lays the groundwork for a new notion of time, what I'm calling the monotonic-raw time, which is introduced in a following patch. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:23 +02:00
Peter Zijlstra	b845b517b5	printk: robustify printk Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd wakeup by polling from the timer tick. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-11 13:46:53 +02:00
Ingo Molnar	e4e4e534fa	sched clock: revert various sched_clock() changes Found an interactivity problem on a quad core test-system - simple CPU loops would occasionally delay the system un an unacceptable way. After much debugging with Peter Zijlstra it turned out that the problem is caused by the string of sched_clock() changes - they caused the CPU clock to jump backwards a bit - which confuses the scheduler arithmetics. (which is unsigned for performance reasons) So revert: # c300ba2: sched_clock: and multiplier for TSC to gtod drift # c0c8773: sched_clock: only update deltas with local reads. # af52a90: sched_clock: stop maximum check on NO HZ # f7cce27: sched_clock: widen the max and min time This solves the interactivity problems. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>	2008-07-31 17:20:29 +02:00

1 2 3 4 5 ...

328 Commits