Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.
In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.
For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.
If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.
This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.
The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.
Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count. I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case. Unfortunately this means you must special case the
rcu access case.
When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e. tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.
What that means is that the proc code that looks up a task and later
wants to sleep can now do:
rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
get_task_struct(task);
}
rcu_read_unlock();
The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.
Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called. Tasks can only
be looked up via rcu before release_task is called. All rcu protected
members of task_struct are freed by release_task.
Therefore we can move call_rcu from put_task_struct into release_task. And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.
The end result:
- get_task_struct is safe in an rcu context where we have just looked
up the task.
- put_task_struct() simplifies into its old pre rcu self.
This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
proc_check_chroot() does the check in a very unintuitive way (keeping a
copy of the argument, then modifying the argument), and has uncommented
sideeffects.
Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just got nuked in mainline. Bring it back because Eric's patches use it.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The core problem: setsid fails if it is called by init. The effect in 2.6.16
and the earlier kernels that have this problem is that if you do a "ps -j 1 or
ps -ej 1" you will see that init and several of it's children have process
group and session == 0. Instead of process group == session == 1. Despite
init calling setsid.
The reason it fails is that daemonize calls set_special_pids(1,1) on kernel
threads that are launched before /sbin/init is called.
The only remaining effect in that current->signal->leader == 0 for init
instead of 1. And the setsid call fails. No one has noticed because
/sbin/init does not check the return value of setsid.
In 2.4 where we don't have the pidhash table, and daemonize doesn't exist
setsid actually works for init.
I care a lot about pid == 1 not being a special case that we leave broken,
because of the container/jail work that I am doing.
- Carefully allow init (pid == 1) to call setsid despite the kernel using
its session.
- Use find_task_by_pid instead of find_pid because find_pid taking a
pidtype is going away.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The futex timeval is not checked for correctness. The change does not
break existing applications as the timeval is supplied by glibc (and glibc
always passes a correct value), but the glibc-internal tests for this
functionality fail.
Signed-off-by: Thomas Gleixner <tglx@tglx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To increase the strength of SCHED_BATCH as a scheduling hint we can
activate batch tasks on the expired array since by definition they are
latency insensitive tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On runqueue time is used to elevate priority in schedule().
In the code it currently requeues tasks even if their priority is not
elevated, which would end up placing them at the end of their runqueue
array effectively delaying them instead of improving their priority.
Bug spotted by Mike Galbraith <efault@gmx.de>
This patch removes this requeueing.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tasks waiting in SLEEP_NONINTERACTIVE state can now get to best priority so
they need to be included in the idle detection code.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We watch for tasks that sleep extended periods and don't allow one single
prolonged sleep period from elevating priority to maximum bonus to prevent cpu
bound tasks from getting high priority with single long sleeps. There is a
bug in the current code that also penalises tasks that already have high
priority. Correct that bug.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alterations to the pipe code in the kernel made it possible for relative
starvation to occur with tasks that slept waiting on a pipe getting unfair
priority bonuses even if they were otherwise fully cpu bound so the
TASK_NONINTERACTIVE flag was introduced which prevented any change to
sleep_avg while sleeping waiting on a pipe. This change also leads to the
converse though, preventing any priority boost from occurring in truly
interactive tasks that wait on pipes.
Convert the TASK_NONINTERACTIVE flag to set sleep_type to SLEEP_NONINTERACTIVE
which will allow a linear bonus to priority based on sleep time thus allowing
interactive tasks to get high priority if they sleep enough.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The activated flag in task_struct is used to track different sleep types and
its usage is somewhat obfuscated. Convert the variable to an enum with more
descriptive names without altering the function.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, count_active_tasks() calls both nr_running() &
nr_interruptible(). Each of these functions does a "for_each_cpu" & reads
values from the runqueue of each cpu. Although this is not a lot of
instructions, each runqueue may be located on different node. Depending on
the architecture, a unique TLB entry may be required to access each
runqueue.
Since there may be more runqueues than cpu TLB entries, a scan of all
runqueues can trash the TLB. Each memory reference incurs a TLB miss &
refill.
In addition, the runqueue cacheline that contains nr_running &
nr_uninterruptible may be evicted from the cache between the two passes.
This causes unnecessary cache misses.
Combining nr_running() & nr_interruptible() into a single function
substantially reduces the TLB & cache misses on large systems. This should
have no measureable effect on smaller systems.
On a 128p IA64 system running a memory stress workload, the new function
reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It seems that run_hrtimer_queue() is calling get_softirq_time() more
often than it needs to.
With this patch, it only calls get_softirq_time() if there's a
pending timer.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The removal of the data field in the hrtimer structure enforces the
embedding of the timer into another data structure. nanosleep now uses a
private implementation of the most common used timer callback function
(simple task wakeup).
In order to avoid the reimplentation of such functionality all over the
place a generic hrtimer_sleeper functionality is created.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an LED trigger for IDE disk activity to the ide-disk driver.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ensure ide-taskfile.c calls any driver specific end_request function if
present.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds LED drivers for LEDs found on the Sharp Zaurus c6000 model (tosa).
Signed-off-by: Dirk Opfer <dirk@opfer-online.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NEW_LEDS support for ixp4xx boards where LEDs are connected to the GPIO lines.
This includes a new generic ixp4xx driver (leds-ixp4xx-gpio.c name
"IXP4XX-GPIO-LED")
Signed-off-by: John Bowler <jbowler@acm.org>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds an LED driver for LEDs exported by the Sharp LOCOMO chip as found on some
models of Sharp Zaurus.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds LED drivers for LEDs found on the Sharp Zaurus c7x0 (corgi, shepherd,
husky) and cxx00 (akita, spitz, borzoi) models.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an LED trigger for the charger status as found on the Sharp Zaurus series
of devices.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an example of a complex LED trigger in the form of a generic timer which
triggers the LED its attached to at a user specified frequency and duty cycle.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for LED triggers to the LED subsystem. "Triggers" are events
which change the state of an LED. Two kinds of trigger are available, simple
ones which can be added to exising code with minimum disruption and complex
ones for implementing new or more complex functionality.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the foundations of a new LEDs subsystem. This patch adds a class which
presents LED devices within sysfs and allows their brightness to be
controlled.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The LED class/subsystem takes John Lenz's work and extends and alters it to
give what I think should be a fairly universal LED implementation.
The series consists of several logical units:
* LED Core + Class implementation
* LED Trigger Core implementation
* LED timer trigger (example of a complex trigger)
* LED device drivers for corgi, spitz and tosa Zaurus models
* LED device driver for locomo LEDs
* LED device driver for ARM ixp4xx LEDs
* Zaurus charging LED trigger
* IDE disk activity LED trigger
* NAND MTD activity LED trigger
Why?
====
LEDs are really simple devices usually amounting to a GPIO that can be turned
on and off so why do we need all this code? On handheld or embedded devices
they're an important part of an often limited user interface. Both users and
developers want to be able to control and configure what the LED does and the
number of different things they'd potentially want the LED to show is large.
A subsystem is needed to try and provide all this different functionality in
an architecture independent, simple but complete, generic and scalable manner.
The alternative is for everyone to implement just what they need hidden away
in different corners of the kernel source tree and to provide an inconsistent
interface to userspace.
Other Implementations
=====================
I'm aware of the existing arm led implementation. Currently the new subsystem
and the arm code can coexist quite happily. Its up to the arm community to
decide whether this new interface is acceptable to them. As far as I can see,
the new interface can do everything the existing arm implementation can with
the advantage that the new code is architecture independent and much more
generic, configurable and scalable.
I'm prepared to make the conversion to the LED subsystem (or assist with it)
if appropriate.
Implementation Details
======================
I've stripped a lot of code out of John's original LED class. Colours were
removed as LED colour is now part of the device name. Multiple colours are to
be handled as multiple led devices. This means you get full control over each
colour. I also removed the LED hardware timer code as the generic timer isn't
going to add much overhead and is just as useful. I also decided to have the
LED core track the current LED status (to ease suspend/resume handling)
removing the need for brightness_get implementations in the LED drivers.
An underlying design philosophy is simplicity. The aim is to keep a small
amount of code giving as much functionality as possible.
The major new idea is the led "trigger". A trigger is a source of led events.
Triggers can either be simple or complex. A simple trigger isn't
configurable and is designed to slot into existing subsystems with minimal
additional code. Examples are the ide-disk, nand-disk and zaurus-charging
triggers. With leds disabled, the code optimises away. Examples are
nand-disk and ide-disk.
Complex triggers whilst available to all LEDs have LED specific parameters and
work on a per LED basis. The timer trigger is an example.
You can change triggers in a similar manner to the way an IO scheduler is
chosen (via /sys/class/leds/somedevice/trigger).
So far there are only a handful of examples but it should easy to add further
LED triggers without too much interference into other subsystems.
Known Issues
============
The LED Trigger core cannot be a module as the simple trigger functions would
cause nightmare dependency issues. I see this as a minor issue compared to
the benefits the simple trigger functionality brings. The rest of the LED
subsystem can be modular.
Some leds can be programmed to flash in hardware. As this isn't a generic LED
device property, I think this should be exported as a device specific sysfs
attribute rather than part of the class if this functionality is required (eg.
to keep the led flashing whilst the device is suspended).
Future Development
==================
At the moment, a trigger can't be created specifically for a single LED.
There are a number of cases where a trigger might only be mappable to a
particular LED. The addition of triggers provided by the LED driver should
cover this option and be possible to add without breaking the current
interface.
A CPU activity trigger similar to that found in the arm led implementation
should be trivial to add.
This patch:
Add some brief documentation of the design decisions behind the LED class and
how it appears to users.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One of the LEDs driver files wants to use this.
Probably drivers/mtd/maps/ipaq-flash.c wants to convert as well - right now
it'll be tainting the kernel.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Bowler <jbowler@acm.org>
Cc: "'Richard Purdie'" <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add TIOCL_GETKMSGREDIRECT needed by the userland suspend tool to get the
current value of kmsg_redirect from the kernel so that it can save it and
restore it after resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Delete two useless kmalloc wrappers and use kmalloc/kzalloc. Some weird
NULL checks are also simplified.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_flock() currently has a race which can result in a double free in the
multi-thread case.
Thread 1 Thread 2
sys_flock(file, LOCK_EX)
sys_flock(file, LOCK_UN)
If Thread 2 removes the lock from inode->i_lock before Thread 1 tests for
list_empty(&lock->fl_link) at the end of sys_flock, then both threads will
end up calling locks_free_lock for the same lock.
Fix is to make flock_lock_file() do the same as posix_lock_file(), namely
to make a copy of the request, so that the caller can always free the lock.
This also has the side-effect of fixing up a reference problem in the
lockd handling of flock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IN_DELETE events are no longer generated for the removal of a file from a
watched directory.
This seems to be a result of clearing DCACHE_INOTIFY_PARENT_WATCHED in
d_delete() directly before calling fsnotify_nameremove().
Assuming the flag doesn't need to be cleared before dentry_iput(), this
should do the trick.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since these names on old MSDOS is used as device, so, current fat driver
doesn't allow a user to create those names. But many OSes and even Windows
can create those names actually, now.
This patch removes the reserved name check.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix memory migration so that it works regardless of what cpuset the invoking
task is in.
If a task invoked a memory migration, by doing one of:
1) writing a different nodemask to a cpuset 'mems' file, or
2) writing a tasks pid to a different cpuset's 'tasks' file,
where the cpuset had its 'memory_migrate' option turned on, then the
allocation of the new pages for the migrated task(s) was constrained
by the invoking tasks cpuset.
If this task wasn't in a cpuset that allowed the requested memory nodes, the
memory migration would happen to some other nodes that were in that invoking
tasks cpuset. This was usually surprising and puzzling behaviour: Why didn't
the pages move? Why did the pages move -there-?
To fix this, temporarilly change the invoking tasks 'mems_allowed' task_struct
field to the nodes the migrating tasks is moving to, so that new pages can be
allocated there.
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix unsafe reference to a tasks mm struct, by moving the reference inside of a
convenient nearby properly guarded code block.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix cpuset comment involving case of a tasks cpuset pointer being NULL.
Thanks to "the_top_cpuset_hack", this code no longer sees NULL task->cpuset
pointers.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
local_t's were defined to be unsigned. This increases confusion because
atomic_t's are signed. The patch goes through and changes all implementations
to use signed longs throughout.
Also, x86-64 was using 32-bit quantities for the value passed into local_add()
and local_sub(). Fixed.
All (actually, both) existing users have been audited.
(Also s/__inline__/inline/ in x86_64/local.h)
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The "3c59x: use mii_check_media" patch introduced a netif_carrier_off in
vortex_up. 10base2 stoped working because of this. This is removed.
Tx/Rx reset is back in vortex_up because the 3c900B-Combo stops working after
changing from half duplex to full duplex when Tx/Rx reset is done with
vortex_timer.
Also brought back some mii stuff to be sure that it does not break something
else.
Thanks to Pete Clements <clem@clem.clem-digital.net> for reporting and testing.
Signed-off-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pre-2.6.16 patch "3c59x collision statistics fix" accidentally caused
vortex_error() to not run iowrite16(TxEnable, ioaddr + EL3_CMD) if we got a
maxCollisions interrupt but MAX_COLLISION_RESET is not set.
Thanks to Pete Clements <clem@clem.clem-digital.net> for reporting and testing.
Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The help text says that if you select CONFIG_LBD, then it will automatically
select CONFIG_LFS. That isn't currently the case, so update the text.
- Get rid of the cruft in the help text mentioning CONFIG_LBD
- Tell unsure users to select CONFIG_LFS.
- Remove the `default n'.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed a bug on the process accounting facility. In multi-threading
process, some data would be recorded incorrectly when the group_leader dies
earlier than one or more threads. The attached patch fixes this problem.
See below. 'bugacct' is a test program that create a worker thread after 4
seconds sleeping, then the group_leader dies soon. The worker thread
consume CPU/Memory for 6 seconds, then exit. We can estimate 10 seconds as
etime and 6 seconds as stime + utime. This is a sample program which the
group_leader dies earlier than other threads.
The results of same binary execution on different kernel are below.
-- accounted records --------------------
| btime | utime | stime | etime | minflt | majflt | comm |
original | 13:16:40 | 0.00 | 0.00 | 6.10 | 171 | 0 | bugacct |
patched | 13:20:21 | 5.83 | 0.18 | 10.03 | 32776 | 0 | bugacct |
(*) bugacct allocates 128MB memory, thus 128MB / 4KB = 32768 of minflt is
appropriate.
-- Test results in original kernel ------
$ date; time -p ./bugacct
Tue Mar 28 13:16:36 JST 2006 <- But pacct said btime is 13:16:40
real 10.11 <- But pacct said etime is 6.10
user 5.96 <- But pacct said utime is 0.00
sys 0.14 <- But pacct said stime is 0.00
$
-- Test results in patched kernel -------
$ date; time -p ./bugacct
Tue Mar 28 13:20:21 JST 2006
real 10.04
user 5.83
sys 0.19
$
In the original 2.6.16 kernel, pacct records btime, utime, stime, etime and
minflt incorrectly. In my opinion, this problem is caused by an assumption
that group_leader dies last.
The following section calculates process running time for etime and btime.
But it means running time of the thread that dies last, not process. The
start_time of the first thread in the process (group_leader) should be
reduced from uptime to calculate etime and btime correctly.
---- do_acct_process() in kernel/acct.c:
/* calculate run_time in nsec*/
do_posix_clock_monotonic_gettime(&uptime);
run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
+ current->start_time.tv_nsec;
----
The following section calculates stime and utime of the process.
But it might count the utime and stime of the group_leader duplicatly
and ignore the utime and stime of the thread dies last, when one or
more threads remain after group_leader dead.
The ac_utime should be calculated as the sum of the signal->utime
and utime of the thread dies last. The ac_stime should be done also.
---- do_acct_process() in kernel/acct.c:
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
current->signal->utime));
ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
current->signal->stime));
ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
----
The part of the minflt/majflt calculation has same problem.
This patch solves those problems, I think.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the remaining semaphores to mutexes in the IPMI driver. The
watchdog was using a semaphore as a real semaphore (for IPC), so the
conversion there required adding a completion.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tidy up various coding standard things, mostly removing the space after !,
but also break some long lines and fix a few other spacing inconsistencies.
Also fixes some bad error reporting when deleting an IPMI user.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Domsch noticed a startup race with the IPMI kernel thread, it was
possible (though extraordinarly unlikely) that a message could come in
before the upper layer was ready to handle it. This patch splits the
startup processing of an IPMI interface into two parts, one to get ready
and one to actually start the processes to receive messages from the
interface.
[akpm@osdl.org: cleanups]
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan sayeth "Based on Linus original comments about _GPL we should export
tty_insert_flip_char as EXPORT_SYMBOL because it used to be EXPORT_SYMBOL
equivalent (trivial inline). The other features are new extensions are were
not available to drivers before so need not be provided except as _GPL
functionality as far as I can see."
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=6294
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Philippe Vouters <Philippe.Vouters@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
EDAC_752X uses pci_scan_single_device(), which is only available if
CONFIG_HOTPLUG is enabled, so limit this driver with HOTPLUG.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Dave Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The boot cmdline is parsed in parse_early_param() and
parse_args(,unknown_bootoption).
And __setup() is used in obsolete_checksetup().
start_kernel()
-> parse_args()
-> unknown_bootoption()
-> obsolete_checksetup()
If __setup()'s callback (->setup_func()) returns 1 in
obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
handled.
If ->setup_func() returns 0, obsolete_checksetup() tries other
->setup_func(). If all ->setup_func() that matched a parameter returns 0,
a parameter is seted to argv_init[].
Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
If the app doesn't ignore those arguments, it will warning and exit.
This patch fixes a wrong usage of it, however fixes obvious one only.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>