OpenCloudOS-Kernel

Author	SHA1	Message	Date
Linus Torvalds	7d3b56ba37	Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits) x86: setup_per_cpu_areas() cleanup cpumask: fix compile error when CONFIG_NR_CPUS is not defined cpumask: use alloc_cpumask_var_node where appropriate cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t x86: use cpumask_var_t in acpi/boot.c x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids sched: put back some stack hog changes that were undone in kernel/sched.c x86: enable cpus display of kernel_max and offlined cpus ia64: cpumask fix for is_affinity_mask_valid() cpumask: convert RCU implementations, fix xtensa: define __fls mn10300: define __fls m32r: define __fls h8300: define __fls frv: define __fls cris: define __fls cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS cpumask: zero extra bits in alloc_cpumask_var_node cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ cpumask: convert mm/ ...	2009-01-03 12:04:39 -08:00
Linus Torvalds	61420f59a5	Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting	2009-01-03 11:56:24 -08:00
Mike Travis	7eb1955336	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-03 18:53:31 +01:00
Linus Torvalds	b840d79631	Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually	2009-01-02 11:44:09 -08:00
Rusty Russell	5db0e1e9e0	cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ Impact: cleanup Simple replacement, now the _nr is redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@redhat.com>	2009-01-01 10:12:29 +10:30
Rusty Russell	6b954823c2	cpumask: convert kernel time functions Impact: Use new APIs Convert kernel/time functions to use struct cpumask *. Note the ugly bitmap declarations in tick-broadcast.c. These should be cpumask_var_t, but there was no obvious initialization function to put the alloc_cpumask_var() calls in. This was safe. (Eventually 'struct cpumask' will be undefined for CONFIG_CPUMASK_OFFSTACK, so we use a bitmap here to show we really mean it). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>	2009-01-01 10:12:25 +10:30
Martin Schwidefsky	79741dd357	[PATCH] idle cputime accounting The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Martin Schwidefsky	457533a7d3	[PATCH] fix scaled & unscaled cputime accounting The utimescaled / stimescaled fields in the task structure and the global cpustat should be set on all architectures. On s390 the calls to account_user_time_scaled and account_system_time_scaled never have been added. In addition system time that is accounted as guest time to the user time of a process is accounted to the scaled system time instead of the scaled user time. To fix the bugs and to prevent future forgetfulness this patch merges account_system_time_scaled into account_system_time and account_user_time_scaled into account_user_time. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Michael Neuling <mikey@neuling.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-12-31 15:11:46 +01:00
Rusty Russell	2ca1a61583	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: arch/x86/kernel/io_apic.c	2008-12-31 23:05:57 +10:30
Thomas Gleixner	1c5745aa38	sched_clock: prevent scd->clock from moving backwards, take #2 Redo: 5b7dba4: sched_clock: prevent scd->clock from moving backwards which had to be reverted due to s2ram hangs: ca7e716: Revert "sched_clock: prevent scd->clock from moving backwards" ... this time with resume restoring GTOD later in the sequence taken into account as well. The "timekeeping_suspended" flag is not very nice but we cannot call into GTOD before it has been properly resumed and the scheduler will run very early in the resume sequence. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-31 09:53:21 +01:00
Sebastien Dugue	5762ba1873	hrtimers: allow the hot-unplugging of all cpus Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unpluging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-30 07:37:19 +01:00
Ingo Molnar	32e8d18683	Merge branches 'timers/clocksource', 'timers/hpet', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/rtc' into timers/core	2008-12-25 18:02:25 +01:00
Rusty Russell	968ea6d80e	Merge ../linux-2.6-x86 Conflicts: arch/x86/kernel/io_apic.c kernel/sched.c kernel/sched_stats.h	2008-12-13 21:55:51 +10:30
Rusty Russell	320ab2b0b1	cpumask: convert struct clock_event_device to cpumask pointers. Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>	2008-12-13 21:20:26 +10:30
Rusty Russell	0de26520c7	cpumask: make irq_set_affinity() take a const struct cpumask Impact: change existing irq_chip API Not much point with gentle transition here: the struct irq_chip's setaffinity method signature needs to change. Fortunately, not widely used code, but hits a few architectures. Note: In irq_select_affinity() I save a temporary in by mangling irq_desc[irq].affinity directly. Ingo, does this break anything? (Folded in fix from KOSAKI Motohiro) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: jeremy@xensource.com Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>	2008-12-13 21:20:26 +10:30
Woodruff, Richard	001474491f	nohz: suppress needless timer reprogramming In my device I get many interrupts from a high speed USB device in a very short period of time. The system spends a lot of time reprogramming the hardware timer which is in a slower timing domain as compared to the CPU. This results in the CPU spending a huge amount of time waiting for the timer posting to be done. All of this reprogramming is useless as the wake up time has not changed. As measured using ETM trace this drops my reprogramming penalty from almost 60% CPU load down to 15% during high interrupt rate. I can send traces to show this. Suppress setting of duplicate timer event when timer already stopped. Timer programming can be very costly and can result in long cpu stall/wait times. [akpm@linux-foundation.org: coding-style fixes] [tglx@linutronix.de: move the check to the right place and avoid raising the softirq for nothing] Signed-off-by: Richard Woodruff <r-woodruff2@ti.com> Cc: johnstul@us.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 16:55:31 +01:00
Ingo Molnar	45ab6b0c76	Merge branch 'sched/core' into cpus4096 Conflicts: include/linux/ftrace.h kernel/sched.c	2008-12-12 13:48:57 +01:00
Heiko Carstens	fa116ea35e	nohz: no softirq pending warnings for offline cpus Impact: remove false positive warning After a cpu was taken down during cpu hotplug (read: disabled for interrupts) it still might have pending softirqs. However take_cpu_down makes sure that the idle task will run next instead of ksoftirqd on the taken down cpu. The idle task will call tick_nohz_stop_sched_tick which might warn about pending softirqs just before the cpu kills itself completely. However the pending softirqs on the dead cpu aren't a problem because they will be moved to an online cpu during CPU_DEAD handling. So make sure we warn only for online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 07:27:01 +01:00
john stultz	6c9bacb41c	time: catch xtime_nsec underflows and fix them Impact: fix time warp bug Alex Shi, along with Yanmin Zhang have been noticing occasional time inconsistencies recently. Through their great diagnosis, they found that the xtime_nsec value used in update_wall_time was occasionally going negative. After looking through the code for awhile, I realized we have the possibility for an underflow when three conditions are met in update_wall_time(): 1) We have accumulated a second's worth of nanoseconds, so we incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This doesn't cause xtime_nsec to go negative, but it can cause it to be small). 2) The remaining offset value is large, but just slightly less then cycle_interval. 3) clocksource_adjust() is speeding up the clock, causing a corrective amount (compensating for the increase in the multiplier being multiplied against the unaccumulated offset value) to be subtracted from xtime_nsec. This can cause xtime_nsec to underflow. Unfortunately, since we notify the NTP subsystem via second_overflow() whenever we accumulate a full second, and this effects the error accumulation that has already occured, we cannot simply revert the accumulated second from xtime nor move the second accumulation to after the clocksource_adjust call without a change in behavior. This leaves us with (at least) two options: 1) Simply return from clocksource_adjust() without making a change if we notice the adjustment would cause xtime_nsec to go negative. This would work, but I'm concerned that if a large adjustment was needed (due to the error being large), it may be possible to get stuck with an ever increasing error that becomes too large to correct (since it may always force xtime_nsec negative). This may just be paranoia on my part. 2) Catch xtime_nsec if it is negative, then add back the amount its negative to both xtime_nsec and the error. This second method is consistent with how we've handled earlier rounding issues, and also has the benefit that the error being added is always in the oposite direction also always equal or smaller then the correction being applied. So the risk of a corner case where things get out of control is lessened. This patch fixes bug 11970, as tested by Yanmin Zhang http://bugzilla.kernel.org/show_bug.cgi?id=11970 Reported-by: alex.shi@intel.com Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-04 08:43:02 +01:00
Peter Zijlstra	ca109491f6	hrtimer: removing all ur callback modes Impact: cleanup, move all hrtimer processing into hardirq context This is an attempt at removing some of the hrtimer complexity by reducing the number of callback modes to 1. This means that all hrtimer callback functions will be ran from HARD-irq context. I went through all the 30 odd hrtimer callback functions in the kernel and saw only one that I'm not quite sure of, which is the one in net/can/bcm.c - hence I'm CC-ing the folks responsible for that code. Furthermore, the hrtimer core now calls callbacks directly with IRQs disabled in case you try to enqueue an expired timer. If this timer is a periodic timer (which should use hrtimer_forward() to advance its time) then it might be possible to end up in an inf. recursive loop due to the fact that hrtimer_forward() doesn't round up to the next timer granularity, and therefore keeps on calling the callback - obviously this needs a fix. Aside from that, this seems to compile and actually boot on my dual core test box - although I'm sure there are some bugs in, me not hitting any makes me certain :-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-25 15:45:46 +01:00
Rusty Russell	6a7b3dc344	sched: convert nohz_cpu_mask to cpumask_var_t. Impact: (future) size reduction for large NR_CPUS. Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-24 17:51:10 +01:00
Thomas Gleixner	ae99286b4f	nohz: disable tick_nohz_kick_tick() for now Impact: nohz powersavings and wakeup regression commit `fb02fbc14d` (NOHZ: restart tick device from irq_enter()) causes a serious wakeup regression. While the patch is correct it does not take into account that spurious wakeups happen on x86. A fix for this issue is available, but we just revert to the .27 behaviour and let long running softirqs screw themself. Disable it for now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-11-10 22:39:27 +01:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Thomas Gleixner	c4bd822e7b	NOHZ: fix thinko in the timer restart code path commit `fb02fbc14d` (NOHZ: restart tick device from irq_enter()) solves the problem of stale jiffies when long running softirqs happen in a long idle sleep period, but it has a major thinko in it: When the interrupt which came in _is_ the timer interrupt which should expire ts->sched_timer then we cancel and rearm the timer _before_ it gets expired in hrtimer_interrupt() to the next period. That means the call back function is not called. This game can go on for ever :( Prevent this by making sure to only rearm the timer when the expiry time is more than one tick_period away. Otherwise keep it running as it is either already expired or will expiry at the right point to update jiffies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Venkatesch Pallipadi <venkatesh.pallipadi@intel.com>	2008-10-21 20:53:24 +02:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Thomas Gleixner	870e2a2845	timer_list: add base address to clock base The base address of a (per cpu) clock base is a useful debug info. Add it and bump the version number of timer_lists. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Thomas Gleixner	c5b77a3d3a	timer_list: print cpu number of clockevents device The per cpu clock events device output of timer_list lacks an association of the device to the cpu which is annoying when looking at the output of /proc/timer_list from a 128 way system. Add the CPU number info and mark the broadcast device in the device list printout. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Thomas Gleixner	e67ef25a35	timer_list: print real timer address The current timer_list output prints the address of the on stack copy of the active hrtimer instead of the hrtimer itself. Print the address of the real timer instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 11:51:30 +02:00
Arjan van de Ven	651dab4264	Merge commit 'linus/master' into merge-linus Conflicts: arch/x86/kvm/i8254.c	2008-10-17 09:20:26 -07:00
Thomas Gleixner	fb02fbc14d	NOHZ: restart tick device from irq_enter() We did not restart the tick device from irq_enter() to avoid double reprogramming and extra events in the return immediate to idle case. But long lasting softirqs can lead to a situation where jiffies become stale: idle() tick stopped (reprogrammed to next pending timer) halt() interrupt jiffies updated from irq_enter() interrupt handler softirq function 1 runs 20ms softirq function 2 arms a 10ms timer with a stale jiffies value jiffies updated from irq_exit() timer wheel has now an already expired timer (the one added in function 2) timer fires and timer softirq runs This was discovered when debugging a timer problem which happend only when the ath5k driver is active. The debugging proved that there is a softirq function running for more than 20ms, which is a bug by itself. To solve this we restart the tick timer right from irq_enter(), but do not go through the other functions which are necessary to return from idle when need_resched() is set. Reported-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Elias Oltmanns <eo@nebensachen.de>	2008-10-17 18:13:38 +02:00
Thomas Gleixner	c34bec5a44	NOHZ: split tick_nohz_restart_sched_tick() Split out the clock event device reprogramming. Preparatory patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-17 18:13:38 +02:00
Thomas Gleixner	719254faa1	NOHZ: unify the nohz function calls in irq_enter() We have two separate nohz function calls in irq_enter() for no good reason. Just call a single NOHZ function from irq_enter() and call the bits in the tick code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-17 18:13:38 +02:00
Linus Torvalds	e533b22705	Merge branch 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails softirq, warning fix: correct a format to avoid a warning softirqs, debug: preemption check x86, pci-hotplug, calgary / rio: fix EBDA ioremap() IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description dmi scan: warn about too early calls to dmi_check_system() generic: redefine resource_size_t as phys_addr_t generic: make PFN_PHYS explicitly return phys_addr_t generic: add phys_addr_t for holding physical addresses softirq: allocate less vectors IO resources: fix/remove printk printk: robustify printk, update comment printk: robustify printk, fix #2 printk: robustify printk, fix printk: robustify printk Fixed up conflicts in: arch/powerpc/include/asm/types.h arch/powerpc/platforms/Kconfig.cputype manually.	2008-10-16 15:17:40 -07:00
Jan Beulich	9ba16087d9	Kconfig: eliminate "def_bool n" constructs Using "def_bool n" is pointless, simply using bool here appears more appropriate. Further, retaining such options that don't have a prompt and aren't selected by anything seems also at least questionable. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:31 -07:00
Ingo Molnar	6b2ada8210	Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus	2008-10-15 12:48:44 +02:00
venkatesh.pallipadi@intel.com	8083e4ad97	[CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor export get_cpu_idle_time_us() for it to be used in ondemand governor. Last update time can be current time when the CPU is currently non-idle, accounting for the busy time since last idle. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>	2008-10-09 13:52:44 -04:00
Thomas Gleixner	07454bfff1	clockevents: check broadcast tick device not the clock events device Impact: jiffies increment too fast. Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get incremented too fast. The reason is a wrong check in the broadcast enter/exit code, which keeps the local apic timer in periodic mode when the switch happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-04 10:51:07 +02:00
Thomas Gleixner	ccc7dadf73	hrtimer: prevent migration of per CPU hrtimers Impact: per CPU hrtimers can be migrated from a dead CPU The hrtimer code has no knowledge about per CPU timers, but we need to prevent the migration of such timers and warn when such a timer is active at migration time. Explicitely mark the timers as per CPU and use a more understandable mode descriptor for the interrupts safe unlocked callback mode, which is used by hrtimer_sleeper and the scheduler code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-29 17:09:14 +02:00
Roman Zippel	d40e944c25	ntp: improve adjtimex frequency rounding Change PPM_SCALE_INV_SHIFT so that it doesn't throw away any input bits (19 is the amount of the factor 2 in PPM_SCALE), the output frequency can then be calculated back to its input value, as the inverse divide produce a slightly larger value, which is then correctly rounded by the final shift. Reported-by: Martin Ziegler <ziegler@uni-freiburg.de> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:13 +02:00
Roman Zippel	5cd1c9c5cf	timekeeping: fix rounding problem during clock update Due to a rounding problem during a clock update it's possible for readers to observe the clock jumping back by 1nsec. The following simplified example demonstrates the problem: cycle xtime 0 0 1000 999999.6 2000 1999999.2 3000 2999998.8 ... 1500 = 1499999.4 = 0.0 + 1499999.4 = 999999.6 + 499999.8 When reading the clock only the full nanosecond part is used, while timekeeping internally keeps nanosecond fractions. If the clock is now updated at cycle 1500 here, a nanosecond is missing due to the truncation. The simple fix is to round up the xtime value during the update, this also changes the distance to the reference time, but the adjustment will automatically take care that it stays under control. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:13 +02:00
Maciej W. Rozycki	eb3f938fd6	ntp: let update_persistent_clock() sleep This is a change that makes the 11-minute RTC update be run in the process context. This is so that update_persistent_clock() can sleep, which may be required for certain types of RTC hardware -- most notably I2C devices. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Rik van Riel <riel@redhat.com> Cc: David Brownell <david-b@pacbell.net> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-24 17:33:12 +02:00
Ingo Molnar	f8e256c687	timers: fix build error in !oneshot case kernel/time/tick-common.c: In function ‘tick_setup_periodic’: kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’ Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-23 12:57:00 +02:00
Thomas Gleixner	27ce4cb4a0	clockevents: prevent mode mismatch on cpu online Impact: timer hang on CPU online observed on AMD C1E systems When a CPU is brought online then the broadcast machinery can be in the one shot state already. Check this and setup the timer device of the new CPU in one shot mode so the broadcast code can pick up the next_event value correctly. Another AMD C1E oddity, as we switch to broadcast immediately and not after the full bring up via the ACPI cpu idle code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	302745699c	clockevents: check broadcast device not tick device Impact: Possible hang on CPU online observed on AMD C1E machines. The broadcast setup code looks at the mode of the tick device to determine whether it needs to be shut down or setup. This is wrong when the broadcast mode is set to one shot already. This can happen when a CPU is brought online as it goes through the periodic setup first. The problem went unnoticed as sane systems do not call into that code before the switch to one shot for the clock event device happens. The AMD C1E idle routine switches over immediately and thereby shuts down the just setup device before the first interrupt happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	49d670fb8d	clockevents: prevent stale tick_next_period for onlining CPUs Impact: possible hang on CPU onlining in timer one shot mode. The tick_next_period variable is only used during boot on nohz/highres enabled systems, but for CPU onlining it needs to be maintained when the per cpu clock events device operates in one shot mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:53 +02:00
Thomas Gleixner	6441402b1f	clockevents: prevent cpu online to interfere with nohz Impact: rare hang which can be triggered on CPU online. tick_do_timer_cpu keeps track of the CPU which updates jiffies via do_timer. The value -1 is used to signal, that currently no CPU is doing this. There are two cases, where the variable can have this state: boot: necessary for systems where the boot cpu id can be != 0 nohz long idle sleep: When the CPU which did the jiffies update last goes into a long idle sleep it drops the update jiffies duty so another CPU which is not idle can pick it up and keep jiffies going. Using the same value for both situations is wrong, as the CPU online code can see the -1 state when the timer of the newly onlined CPU is setup. The setup for a newly onlined CPU goes through periodic mode and can pick up the do_timer duty without being aware of the nohz / highres mode of the already running system. Use two separate states and make them constants to avoid magic numbers confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-23 11:38:52 +02:00
Thomas Gleixner	2344abbcbd	clockevents: make device shutdown robust The device shut down does not cleanup the next_event variable of the clock event device. So when the device is reactivated the possible stale next_event value can prevent the device to be reprogrammed as it claims to wait on a event already. This is the root cause of the resurfacing suspend/resume problem, where systems need key press to come back to life. Fix this by setting next_event to KTIME_MAX when the device is shut down. Use a separate function for shutdown which takes care of that and only keep the direct set mode call in the broadcast code, where we can not touch the next_event value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-16 13:47:02 -07:00
Thomas Gleixner	61c22c34c6	clockevents: remove WARN_ON which was used to gather information The issue of the endless reprogramming loop due to a too small min_delta_ns was fixed with the previous updates of the clock events code, but we had no information about the spread of this problem. I added a WARN_ON to get automated information via kerneloops.org and to get some direct reports, which allowed me to analyse the affected machines. The WARN_ON has served its purpose and would be annoying for a release kernel. Remove it and just keep the information about the increase of the min_delta_ns value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-09 22:20:01 +02:00
Arjan van de Ven	704af52bd1	hrtimer: show the timer ranges in /proc/timer_list to help debugging and visibility of timer ranges, show them in the existing timer list in /proc/timer_list Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-07 16:10:20 -07:00
Linus Torvalds	f532522565	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource, acpi_pm.c: check for monotonicity clocksource, acpi_pm.c: use proper read function also in errata mode ntp: fix calculation of the next jiffie to trigger RTC sync x86: HPET: read back compare register before reading counter x86: HPET fix moronic 32/64bit thinko clockevents: broadcast fixup possible waiters HPET: make minimum reprogramming delta useful clockevents: prevent endless loop lockup clockevents: prevent multiple init/shutdown clockevents: enforce reprogram in oneshot setup clockevents: prevent endless loop in periodic broadcast handler clockevents: prevent clockevent event_handler ending up handler_noop	2008-09-06 19:33:26 -07:00
Maciej W. Rozycki	4ff4b9e19a	ntp: fix calculation of the next jiffie to trigger RTC sync We have a bug in the calculation of the next jiffie to trigger the RTC synchronisation. The aim here is to run sync_cmos_clock() as close as possible to the middle of a second, which means we want this function to be called no more than half a jiffie away from when now.tv_nsec equals 5e8 (500000000). If this is not the case for a given call to the function, then instead of updating the RTC we calculate the offset in nanoseconds to the next point in time where now.tv_nsec will be equal to 5e8. The calculated offset is then converted to jiffies, as these are the unit used by the timer. However, timespec_to_jiffies() uses a ceil()-type rounding mode, where the resulting value is rounded up. As a result, the range of now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2. So if, for example, sync_cmos_clock() happens to be called when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 + TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the same range of now.tv_nsec again. The same applies to cases offset by an integer multiple of TICK_NSEC. This change addresses the problem by subtracting TICK_NSEC / 2 from the nanosecond offset to the next point in time where now.tv_nsec will be equal to 5e8, effectively shifting the following rounding in timespec_to_jiffies() so that it produces a rounded-to-nearest result. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 15:31:48 +02:00
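The rounding problem can be reproduced with a small numeric model. This sketch assumes HZ = 1000 (TICK_NSEC = 1000000 ns) and a timer that fires exactly on a tick boundary; the function names are hypothetical, and only the ceil()-style conversion mirrors what the message says about timespec_to_jiffies().

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000LL
#define TICK_NSEC    1000000LL          /* assumption: HZ = 1000 */
#define TARGET_NSEC  (NSEC_PER_SEC / 2) /* aim for the middle of a second */

/* ceil()-style nanoseconds-to-jiffies conversion (rounds up). */
static long long ns_to_jiffies_ceil(long long ns)
{
    assert(ns >= 0);
    return (ns + TICK_NSEC - 1) / TICK_NSEC;
}

/* Offset from now_nsec to the next point where tv_nsec == 5e8.
 * fixed != 0 applies the -TICK_NSEC / 2 bias described in the commit. */
static long long next_offset_ns(long long now_nsec, int fixed)
{
    long long off = TARGET_NSEC - now_nsec;

    if (fixed)
        off -= TICK_NSEC / 2;
    if (off <= 0)
        off += NSEC_PER_SEC;
    return off;
}

/* tv_nsec at which the rescheduled timer fires. */
static long long fire_nsec(long long now_nsec, int fixed)
{
    long long j = ns_to_jiffies_ceil(next_offset_ns(now_nsec, fixed));

    return (now_nsec + j * TICK_NSEC) % NSEC_PER_SEC;
}
```

With now.tv_nsec = 700000, the unbiased offset rounds up to 500 jiffies and fires at 500700000 ns, outside the TICK_NSEC / 2 window around 5e8, so the same phase repeats every second. The biased offset rounds to 499 jiffies and fires at 499700000 ns, inside the window.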
Ingo Molnar	77dd3b3bd2	Merge branch 'linus' into timers/ntp	2008-09-06 15:31:03 +02:00
Thomas Gleixner	7300711e8c	clockevents: broadcast fixup possible waiters Until the C1E patches arrived there were no users of periodic broadcast before switching to oneshot mode. Now we need to trigger a possible waiter for a periodic broadcast when switching to oneshot mode. Otherwise we can starve them forever. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-06 07:21:17 +02:00
Arjan van de Ven	cc584b213f	hrtimer: convert kernel/* to the new hrtimer apis In order to be able to do range hrtimers we need to use accessor functions to the "expire" member of the hrtimer struct. This patch converts kernel/* to these accessors. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:13 -07:00
Peter Zijlstra	56c7426b39	sched_clock: fix NOHZ interaction If HLT stops the TSC, we'll fail to account idle time, thereby inflating the actual process times. Fix this by re-calibrating the clock against GTOD when leaving nohz mode. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 18:14:08 +02:00
Thomas Gleixner	1fb9b7d29d	clockevents: prevent endless loop lockup The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a too small value of the HPET minimum delta for programming an event. The clockevents code needs to enforce an interrupt event on the clock event device in some cases. The enforcement code was stupid and naive, as it just added the minimum delta to the current time and tried to reprogram the device. When the minimum delta is too small, this loops forever. Add a sanity check. Allow reprogramming to fail 3 times, then print a warning and double the minimum delta value to make sure that this does not happen again. Use the same function for both tick-oneshot and tick-broadcast code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:53 +02:00
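A hedged model of the retry policy described above, written against a hypothetical fake device rather than the real struct clock_event_device. Programming is treated as failing whenever the requested delta is shorter than a hidden hardware minimum; after three consecutive failures min_delta_ns is doubled with a warning, so the loop is guaranteed to terminate. The real code works on absolute ktime values; this sketch only tracks deltas.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a clock event device. */
struct fake_dev {
    uint64_t min_delta_ns; /* minimum the driver claims */
    uint64_t real_min_ns;  /* minimum the hardware actually needs */
};

/* 0 on success, -1 if the event would already be in the past
 * (modelled as: requested delta shorter than the real minimum). */
static int program_event(const struct fake_dev *dev, uint64_t delta_ns)
{
    return delta_ns >= dev->real_min_ns ? 0 : -1;
}

/* Forced reprogramming with the sanity check: after 3 failed
 * attempts, warn and double min_delta_ns, then keep trying. */
static int force_reprogram(struct fake_dev *dev)
{
    int fails = 0;

    assert(dev->min_delta_ns > 0); /* doubling 0 would never grow */
    for (;;) {
        if (program_event(dev, dev->min_delta_ns) == 0)
            return 0;
        if (++fails >= 3) {
            dev->min_delta_ns *= 2;
            fprintf(stderr, "min_delta_ns raised to %llu\n",
                    (unsigned long long)dev->min_delta_ns);
            fails = 0;
        }
    }
}
```

Starting from a claimed minimum of 1000 ns against a real minimum of 5000 ns, the sketch settles on 8000 ns after three doublings; the naive version (retry with the same delta forever) would never return.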
Thomas Gleixner	9c17bcda99	clockevents: prevent multiple init/shutdown While chasing the C1E/HPET bugreports I went through the clock events code inch by inch and found that the broadcast device can be initialized and shut down multiple times. Multiple shutdowns are not critical, but a useless waste of time. Multiple initializations are simply broken. Another CPU might have the device in use already after the first initialization and the second init could just render it unusable again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	7205656ab4	clockevents: enforce reprogram in oneshot setup In tick_oneshot_setup we program the device to the given next_event, but we do not check the return value. We need to make sure that the device is force-programmed so that the interrupt handler engine starts working. Split out the reprogramming function from tick_program_event() and call it with the device which was handed in to tick_setup_oneshot(). Set the force argument, so the device fires an interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	d4496b3955	clockevents: prevent endless loop in periodic broadcast handler The reprogramming of the periodic broadcast handler was broken when the first programming returned -ETIME. The clockevents code stores the new expiry value in the clock events device next_event field only when the programming time has not elapsed yet. The loop in question calculates the new expiry value from the next_event value, which therefore never increases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Venkatesh Pallipadi	7c1e768974	clockevents: prevent clockevent event_handler ending up handler_noop There is an ordering-related problem with the clockevents code, due to which clockevents_register_device() called after the tickless/highres switch will not work. The new clockevent ends up with clockevents_handle_noop as event handler, resulting in no timer activity. The problematic path seems to be: * old device already has hrtimer_interrupt as the event_handler * new clockevent device registers with a higher rating * tick_check_new_device() is called * clockevents_exchange_device() gets called * old->event_handler is set to clockevents_handle_noop * tick_setup_device() is called for the new device * which sets new->event_handler using the old->event_handler, which is noop. Change the ordering so that the new device inherits the proper handler. This does not cause any issue in the normal case, as most likely all the clockevent devices are set up before the highres switch. But it can potentially affect some corner cases where HPET force detect happens after the highres switch. This was a problem with the HPET in MSI mode code that we have been experimenting with. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Roman Zippel	916c7a8551	ntp: fix ADJ_OFFSET_SS_READ bug and do_adjtimex() cleanup Thanks to the review by Michael Kerrisk a bug in the recent ADJ_OFFSET_SS_READ option was discovered, where the ntp time_offset was inadvertently set by it. This fixes this by making the adjtime code more separate from the ntp_adjtime code (both of which really want to be separate syscalls). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 06:40:18 +02:00
Miao Xie	3c4fbe5e01	nohz: fix wrong event handler after online an offlined cpu On a tickless system (CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after I brought an offlined cpu back online, I found this cpu's event handler was tick_handle_periodic, not tick_nohz_handler. After debugging, I found this bug was caused by the wrong tick mode: the tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu goes offline. This patch fixes this bug. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:54:06 +02:00
John Stultz	2d42244ae7	clocksource: introduce CLOCK_MONOTONIC_RAW In talking with Josip Loncaric, and his work on clock synchronization (see btime.sf.net), he mentioned that for really close synchronization, it is useful to have access to "hardware time", that is a notion of time that is not in any way adjusted by the clock slewing done to keep close time sync. Part of the issue is if we are using the kernel's ntp adjusted representation of time in order to measure how we should correct time, we can run into what Paul McKenney aptly described as "Painting a road using the lines we're painting as the guide". I had been thinking of a similar problem, and was trying to come up with a way to give users access to a purely hardware based time representation that avoided users having to know the underlying frequency and mask values needed to deal with the wide variety of possible underlying hardware counters. My solution is to introduce CLOCK_MONOTONIC_RAW. This exposes a nanosecond based time value that increments starting at bootup and has no frequency adjustments made to it whatsoever. The time is accessed from userspace via the posix_clock_gettime() syscall, passing CLOCK_MONOTONIC_RAW as the clock_id. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:24 +02:00
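From userspace the new clock is read through clock_gettime() with CLOCK_MONOTONIC_RAW as the clock id (the message calls the entry point posix_clock_gettime(); the libc function is clock_gettime()). A minimal sketch, assuming Linux 2.6.28 or later and a libc whose <time.h> defines the constant; read_ns() is a hypothetical helper, not a kernel or libc API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes CLOCK_MONOTONIC_RAW with older glibc headers */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX clock as nanoseconds; returns -1 on failure. */
static int64_t read_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Calling read_ns(CLOCK_MONOTONIC_RAW) twice always yields non-decreasing values. Comparing the elapsed time it reports with read_ns(CLOCK_MONOTONIC) over a long interval is one way to observe the NTP slewing that the raw clock is free of.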
Roman Zippel	9a055117d3	clocksource: introduce clocksource_forward_now() To keep the raw monotonic patch simple, first introduce clocksource_forward_now(), which takes care of the offset since the last update_wall_time() call and adds it to the clock, so there is no longer any need to deal with it explicitly at the various places which need to make significant changes to the clock. This also gets rid of timekeeping_suspend_nsecs: instead of waiting until resume, the value is accumulated during suspend. In the end there is only a single user of __get_nsec_offset() left, so I integrated it back into getnstimeofday(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:24 +02:00
John Stultz	1aa5dfb751	clocksource: keep track of original clocksource frequency The clocksource frequency is represented by clocksource->mult/2^(clocksource->shift). Currently, when NTP makes adjustments to the clock frequency, they are made directly to the mult value. This has the drawback that once changed, we cannot know what the original mult value was, or how much adjustment has been applied. This property causes problems in calculating proper ntp intervals when switching back and forth between clocksources. This patch separates the current mult value into a mult and mult_orig pair. The mult_orig value stays constant, while the ntp clocksource adjustments are done only to the mult value. This allows for correct ntp interval calculation and additionally lays the groundwork for a new notion of time, what I'm calling the monotonic-raw time, which is introduced in a following patch. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:50:23 +02:00
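The mult/shift representation and the mult/mult_orig split can be illustrated with a simplified model. The struct below is hypothetical and keeps only the three fields the message mentions; the sketch assumes a 1 MHz counter (1000 ns per cycle) with shift = 20, and ntp_adjust() is an illustrative stand-in for the real NTP code. Conversion follows ns = (cycles * mult) >> shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced view of the fields the commit discusses. */
struct cs_model {
    uint32_t mult;      /* NTP-adjusted multiplier */
    uint32_t mult_orig; /* original multiplier, never touched by NTP */
    uint32_t shift;     /* ns per cycle = mult / 2^shift */
};

/* Cycles to nanoseconds; caller keeps cycles small enough that
 * cycles * mult fits in 64 bits. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* NTP slews only mult; mult_orig keeps the original calibration. */
static void ntp_adjust(struct cs_model *cs, int32_t delta)
{
    int64_t m = (int64_t)cs->mult + delta;

    assert(m > 0 && m <= (int64_t)UINT32_MAX);
    cs->mult = (uint32_t)m;
}
```

After a +1048 adjustment (roughly 1 ppm with these numbers), one second's worth of cycles converts to 1000000999 ns through mult but still to exactly 1000000000 ns through mult_orig, the unadjusted view that the message says lays the groundwork for monotonic-raw time.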
Peter Zijlstra	b845b517b5	printk: robustify printk Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd wakeup by polling from the timer tick. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-11 13:46:53 +02:00
Ingo Molnar	e4e4e534fa	sched clock: revert various sched_clock() changes Found an interactivity problem on a quad core test-system: simple CPU loops would occasionally delay the system in an unacceptable way. After much debugging with Peter Zijlstra it turned out that the problem is caused by the string of sched_clock() changes: they caused the CPU clock to jump backwards a bit, which confuses the scheduler arithmetic (which is unsigned for performance reasons). So revert: # c300ba2: sched_clock: and multiplier for TSC to gtod drift # c0c8773: sched_clock: only update deltas with local reads. # af52a90: sched_clock: stop maximum check on NO HZ # f7cce27: sched_clock: widen the max and min time This solves the interactivity problems. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>	2008-07-31 17:20:29 +02:00
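Why a clock that steps backwards confuses unsigned arithmetic can be shown in a few lines. The helper names are hypothetical; the clamped variant is one common defensive pattern and is not what this commit did (it reverted the offending changes instead).

```c
#include <stdint.h>

/* Naive elapsed time, as unsigned scheduler arithmetic computes it. */
static uint64_t naive_delta(uint64_t prev, uint64_t now)
{
    return now - prev; /* wraps to a huge value if now < prev */
}

/* Defensive variant: treat a backwards step as zero elapsed time. */
static uint64_t clamped_delta(uint64_t prev, uint64_t now)
{
    return now > prev ? now - prev : 0;
}
```

A 10 ns backwards step turns naive_delta(1000, 990) into UINT64_MAX - 9, which a scheduler would read as an enormous runtime for the current task.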
Mike Travis	0bc3cc03fa	cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu * Replace previous instances of the cpumask_of_cpu_ptr* macros with the new (lvalue capable) generic cpumask_of_cpu(). Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 16:40:33 +02:00
Linus Torvalds	ecc8b655b3	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well nohz: prevent tick stop outside of the idle loop	2008-07-24 12:55:01 -07:00
Linus Torvalds	26dcce0fab	Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually	2008-07-23 18:37:44 -07:00
Linus Torvalds	d7b6de14a0	Merge branch 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softlockup: fix invalid proc_handler for softlockup_panic softlockup: fix watchdog task wakeup frequency softlockup: fix watchdog task wakeup frequency softlockup: show irqtrace softlockup: print a module list on being stuck softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds softlockup: fix softlockup_thresh fix softlockup: fix softlockup_thresh unaligned access and disable detection at runtime softlockup: allow panic on lockup	2008-07-23 18:34:13 -07:00
Andi Kleen	4a0b2b4dbe	sysdev: Pass the attribute to the low level sysdev show/store function This allows attributes to be generated dynamically and show/store functions to be shared between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:55:02 -07:00
Mike Travis	c18a41fbbc	cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c * Optimize various places where a pointer to the cpumask_of_cpu value will result in reducing stack pressure. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:59 +02:00
Ingo Molnar	9b610fda0d	Merge branch 'linus' into timers/nohz	2008-07-18 19:53:16 +02:00
Thomas Gleixner	b8f8c3cf0a	nohz: prevent tick stop outside of the idle loop Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremely hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-07-18 18:10:28 +02:00
Ingo Molnar	82638844d9	Merge branch 'linus' into cpus4096 Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 00:29:07 +02:00
Ingo Molnar	1e09481365	Merge branch 'linus' into core/softlockup Conflicts: kernel/softlockup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 23:12:58 +02:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Linus Torvalds	da6e88f496	Merge branch 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: add PCI ID for 6300ESB force hpet x86: add another PCI ID for ICH6 force-hpet kernel-paramaters: document pmtmr= command line option acpi_pm clccksource: fix printk format warning nohz: don't stop idle tick if softirqs are pending. pmtmr: allow command line override of ioport nohz: reduce jiffies polling overhead hrtimer: Remove unused variables in ktime_divns() hrtimer: remove warning in hres_timers_resume posix-timers: print RT watchdog message	2008-07-15 10:39:57 -07:00
Linus Torvalds	17489c058e	Merge branch 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits) sched_clock: and multiplier for TSC to gtod drift sched_clock: record TSC after gtod sched_clock: only update deltas with local reads. sched_clock: fix calculation of other CPU sched_clock: stop maximum check on NO HZ sched_clock: widen the max and min time sched_clock: record from last tick sched: fix accounting in task delay accounting & migration sched: add avg-overlap support to RT tasks sched: terminate newidle balancing once at least one task has moved over sched: fix warning sched: build fix sched: sched_clock_cpu() based cpu_clock(), lockdep fix sched: export cpu_clock sched: make sched_{rt,fair}.c ifdefs more readable sched: bias effective_load() error towards failing wake_affine(). sched: incremental effective_load() sched: correct wakeup weight calculations sched: fix mult overflow sched: update shares on wakeup ...	2008-07-14 13:54:49 -07:00
Steven Rostedt	af52a90a14	sched_clock: stop maximum check on NO HZ Working with ftrace, I would get large jumps of 11 millisecs or more with the clock tracer. This killed the latency timings of ftrace and also caused the irqoff self tests to fail. What was happening is that with NO_HZ the idle loop would stop the jiffy counter, and before the jiffy counter was updated the sched_clock would have a bad delta jiffies to compare with the gtod with the maximum. The jiffies would stop and the last sched_tick would record the last gtod. On wakeup, the sched clock update would take the gtod + delta jiffies (which would be zero) and compare it to the TSC. The TSC would have correctly (with a stable TSC) moved forward several jiffies. But because jiffies had not been updated yet, the clock would be prevented from moving forward because it would appear that the TSC jumped too far ahead. The clock would then virtually stop until the jiffies are updated. Then the next sched clock update would see that the clock was very much behind since the delta jiffies is now correct. This would then jump the clock forward by several jiffies. This caused ftrace to report several milliseconds of interrupts off latency at every resume from NO_HZ idle. This patch adds hooks into the nohz code to disable the checking of the maximum clock update when nohz is in effect. It resumes the max check when nohz has updated the jiffies again. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-11 15:53:26 +02:00
Heiko Carstens	857f3fd7a4	nohz: don't stop idle tick if softirqs are pending. In case a cpu goes idle but softirqs are pending only an error message is printed to the console. It may take a very long time until the pending softirqs will finally be executed. Worst case would be a hanging system. With this patch the timer tick just continues and the softirqs will be executed after the next interrupt. Still a delay but better than a hanging system. Currently we have at least two device drivers on s390 which under certain circumstances schedule a tasklet from process context. This is a reason why we can end up with pending softirqs when going idle. Fixing these drivers seems to be non-trivial. However there is no question that the drivers should be fixed. This patch shouldn't be considered as a bug fix. It just is intended to keep a system running even if device drivers are buggy. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Glauber <jan.glauber@de.ibm.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-11 11:17:04 +02:00
Thomas Gleixner	aa276e1caf	x86, clockevents: add C1E aware idle function C1E on AMD machines is like C3 but without control from the OS. Up to now we disabled the local apic timer for those machines as it stops when the CPU goes into C1E. This excludes those machines from high resolution timers / dynamic ticks, which hurts especially X2 based laptops. The current boot time C1E detection has another, more serious flaw as well: some BIOSes do not enable C1E until the ACPI processor module is loaded. This causes systems to stop working after that point. To work nicely with C1E enabled machines we use a separate idle function, which checks on idle entry whether C1E was enabled in the Interrupt Pending Message MSR. This allows us to do timer broadcasting for C1E and covers the late enablement of C1E as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-08 07:47:18 +02:00
Jens Axboe	8691e5a8f6	smp_call_function: get rid of the unused nonatomic/retry argument It's never used and the comments refer to nonatomic and retry interchangeably. So get rid of it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:35 +02:00
Ingo Molnar	7a14ce1d8c	nohz: reduce jiffies polling overhead Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-05-30 14:16:10 +02:00
Ingo Molnar	02ff375590	softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds Fix (probably theoretical only) rq->clock update bug: in tick_nohz_update_jiffies() [which is called on all irq entry on all cpus where the irq entry hits an idle cpu] we call touch_softlockup_watchdog() before we update jiffies. That works fine most of the time when idle timeouts are within 60 seconds. But when an idle timeout is beyond 60 seconds, jiffies is updated with a jump of more than 60 seconds, which causes a jump in cpu-clock of more than 60 seconds, triggering a false positive. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-05-30 14:15:02 +02:00
Mike Travis	cad0e458d1	clocksource/events: use performance variant for_each_cpu_mask_nr Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-23 18:39:06 +02:00
Heiko Carstens	4f95f81a48	clocksource: allow read access to available/current_clocksource There is no harm, when users can read the info and we ask often enough during debugging for this kind of information. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-03 18:11:48 +02:00
Heiko Carstens	4359a023a8	clocksource: Fix permissions for available_clocksource File permissions for /sys/devices/system/clocksource/clocksource0/available_clocksource are 600 which allows write access. But this is in fact a read only file. So change permissions to 400. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-03 18:11:48 +02:00
Roman Zippel	7dffa3c673	ntp: handle leap second via timer Remove the leap second handling from second_overflow(), which doesn't have to check for it every second anymore. With CONFIG_NO_HZ this also makes sure the leap second is handled close to the full second. Additionally this makes it possible to abort a leap second properly by resetting the STA_INS/STA_DEL status bits. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:59 -07:00
Roman Zippel	8383c42399	ntp: remove current_tick_length() current_tick_length used to do a little more, but now it just returns tick_length, which we can also access directly at the few places, where it's needed. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:59 -07:00
Roman Zippel	7fc5c78409	ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFT As TICK_LENGTH_SHIFT is used for more than just the tick length, the name isn't quite appropriate anymore, so this renames it to NTP_SCALE_SHIFT. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:59 -07:00
Roman Zippel	153b5d054a	ntp: support for TAI This adds support for setting the TAI value (International Atomic Time). The value is reported back to userspace via timex (as we don't have a ntp_gettime() syscall). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:59 -07:00
Roman Zippel	9f14f669d1	ntp: increase time_offset resolution time_offset is already a 64bit value but its resolution is barely used, so this makes better use of it by replacing SHIFT_UPDATE with TICK_LENGTH_SHIFT. Side note: the SHIFT_HZ in SHIFT_UPDATE was incorrect for CONFIG_NO_HZ, and avoiding that overflow was the primary reason for changing time_offset to 64bit. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Roman Zippel	074b3b8794	ntp: increase time_freq resolution This changes time_freq to a 64bit value and makes it static (the only outside user had no real need to modify it). Intermediate values were already 64bit, so the change isn't that big, but it saves a little in shifts by replacing SHIFT_NSEC with TICK_LENGTH_SHIFT. PPM_SCALE is then used to convert between user space and kernel space representation. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Roman Zippel	eea83d896e	ntp: NTP4 user space bits update This adds a few more things from the ntp nanokernel related to user space. It's now possible to select the resolution of some values via STA_NANO, and the kernel reports in which mode it works (pll/fll). If some values for adjtimex() are outside the acceptable range, they are now simply normalized instead of letting the syscall fail. I removed MOD_CLKA/MOD_CLKB as the mapping didn't really make any sense; the kernel doesn't support setting the clock. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Roman Zippel	ee9851b218	ntp: cleanup ntp.c This is mostly a style cleanup of ntp.c and extracts part of do_adjtimex as ntp_update_offset(). Otherwise the functionality is still the same as before. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Roman Zippel	f8bd2258e2	remove div_long_long_rem x86 is the only arch right now which provides an optimized div_long_long_rem, and it has the downside that one has to be very careful that the divide doesn't overflow. The API is a little awkward, as the arguments for the unsigned divide are signed. The signed version also doesn't handle a negative divisor and produces worse code on 64bit archs. There is little incentive to keep this API alive, so this converts the few users to the new API. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
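The signed case mentioned above can be handled by dividing the magnitudes with an unsigned 64/32 divide and then fixing up the signs. The sketch below does that in userspace; div_s64_rem_sketch() is a hypothetical name loosely modeled on a div_s64_rem()-style helper, not the kernel's implementation. It follows C's truncate-toward-zero rules, so a negative divisor is handled and the remainder takes the sign of the dividend.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch: signed 64/32 division with remainder,
 * built on an unsigned divide of the magnitudes. Mirrors C semantics:
 * the quotient truncates toward zero and the remainder takes the sign
 * of the dividend. For brevity, INT64_MIN as the dividend is rejected.
 */
static int64_t div_s64_rem_sketch(int64_t dividend, int32_t divisor,
				  int32_t *remainder)
{
	assert(divisor != 0 && dividend != INT64_MIN);

	/* Unsigned negation is well defined, even for INT32_MIN. */
	uint64_t udividend = dividend < 0 ? -(uint64_t)dividend : (uint64_t)dividend;
	uint32_t udivisor = divisor < 0 ? -(uint32_t)divisor : (uint32_t)divisor;

	uint64_t uquot = udividend / udivisor;
	uint32_t urem = (uint32_t)(udividend % udivisor);

	/* The remainder follows the sign of the dividend... */
	*remainder = dividend < 0 ? -(int32_t)urem : (int32_t)urem;
	/* ...and the quotient is negative when the operand signs differ. */
	return ((dividend < 0) != (divisor < 0)) ? -(int64_t)uquot : (int64_t)uquot;
}
```

For example, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, the same result as C's / and % operators.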
Roman Zippel	71abb3af62	convert a few do_div users This converts a few users of do_div to div_[su]64 and this demonstrates nicely how it can reduce some expressions to one-liners. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
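A minimal sketch of the pattern this commit describes, using hypothetical userspace stand-ins for do_div() and div_u64(). do_div() divides its 64-bit argument in place and returns the 32-bit remainder, so callers that only want the quotient need a temporary; div_u64() returns the quotient directly. The *_sketch functions and the ns_to_us_*() helpers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for do_div(n, base): divides *n in place by a
 * 32-bit base and returns the remainder. (The real macro takes the
 * lvalue n directly, not a pointer.) */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	assert(base != 0);
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Hypothetical stand-in for div_u64(): returns the quotient and
 * leaves its argument untouched. */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Old style: copy into a temporary because do_div clobbers its argument. */
static uint64_t ns_to_us_old(uint64_t ns)
{
	uint64_t tmp = ns;
	do_div_sketch(&tmp, 1000);
	return tmp;
}

/* New style: the same expression as a one-liner. */
static uint64_t ns_to_us_new(uint64_t ns)
{
	return div_u64_sketch(ns, 1000);
}
```

Both helpers return 1234 for an input of 1234567; the second form avoids the temporary and the discarded remainder.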
Denis V. Lunev	c33fff0afb	kernel: use non-racy method for proc entries creation Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data are set up before gluing the PDE to the main tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:22 -07:00

295 Commits