cwq->nr_active is used to keep track of how many work items are active
for the cpu workqueue, where 'active' is defined as either pending on
global worklist or executing. This is used to implement the
max_active limit and workqueue freezing. If a work item is queued
after nr_active has already reached max_active, the work item doesn't
increment nr_active and is put on the delayed queue and gets activated
later as previous active work items retire.
try_to_grab_pending() which is used in the cancellation path
unconditionally decremented nr_active whether the work item being
cancelled is currently active or delayed, so cancelling a delayed work
item makes nr_active underflow. This breaks max_active enforcement
and triggers BUG_ON() in destroy_workqueue() later on.
This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which
is set while a work item in on the delayed list and making
try_to_grab_pending() decrement nr_active iff the work item is
currently active.
The addition of the flag enlarges cwq alignment to 256 bytes which is
getting a bit too large. It's scheduled to be reduced back to 128
bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
devel cycle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
It is possible that the BIOS will not grant control of all _OSC
features requested via acpi_pci_osc_control_set(), so it is
recommended to negotiate the final set of _OSC features with the
query flag set before calling _OSC to request control of these
features.
To implement it, rework acpi_pci_osc_control_set() so that the caller
can specify the mask of _OSC control bits to negotiate and the mask
of _OSC control bits that are absolutely necessary to it. Then,
acpi_pci_osc_control_set() will run _OSC queries in a loop until
the mask of _OSC control bits returned by the BIOS is equal to the
mask passed to it. Also, before running the _OSC request
acpi_pci_osc_control_set() will check if the caller's required
control bits are present in the final mask.
Using this mechanism we will be able to avoid situations in which the
BIOS doesn't grant control of certain _OSC features, because they
depend on some other _OSC features that have not been requested.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pa-risc and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the worklist is global, having works pending after wq
destruction can easily lead to oops and destroy_workqueue() have
several BUG_ON()s to catch these cases. Unfortunately, BUG_ON()
doesn't tell much about how the work became pending after the final
flush_workqueue().
This patch adds WQ_DYING which is set before the final flush begins.
If a work is requested to be queued on a dying workqueue,
WARN_ON_ONCE() is triggered and the request is ignored. This clearly
indicates which caller is trying to queue a work on a dying workqueue
and keeps the system working in most cases.
Locking rule comment is updated such that the 'I' rule includes
modifying the field from destruction path.
Signed-off-by: Tejun Heo <tj@kernel.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
kobject_uevent: fix typo in comments
firmware_class: fix typo in error path
kobject: Break the kobject namespace defs into their own header
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (29 commits)
ARM: imx: fix build failure concerning otg/ulpi
USB: ftdi_sio: add product ID for Lenz LI-USB
USB: adutux: fix misuse of return value of copy_to_user()
USB: iowarrior: fix misuse of return value of copy_to_user()
USB: xHCI: update ring dequeue pointer when process missed tds
USB: xhci: Remove buggy assignment in next_trb()
USB: ftdi_sio: Add ID for Ionics PlugComputer
USB: serial: io_ti.c: don't return 0 if writing the download record failed
USB: otg: twl4030: fix wrong assumption of starting state
USB: gadget: Return -ENOMEM on memory allocation failure
USB: gadget: fix composite kernel-doc warnings
USB: ssu100: set tty_flags in ssu100_process_packet
USB: ssu100: add disconnect function for ssu100
USB: serial: export symbol usb_serial_generic_disconnect
USB: ssu100: rework logic for TIOCMIWAIT
USB: ssu100: add register parameter to ssu100_setregister
USB: ssu100: remove duplicate #defines in ssu100
USB: ssu100: refine process_packet in ssu100
USB: ssu100: add locking for port private data in ssu100
USB: r8a66597-udc: return -ENOMEM if kzalloc() fails
...
Warning(include/linux/usb/composite.h:284): No description found for parameter 'disconnect'
Warning(drivers/usb/gadget/composite.c:744): No description found for parameter 'c'
Warning(drivers/usb/gadget/composite.c:744): Excess function parameter 'cdev' description in 'usb_string_ids_n'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
netfilter: fix CONFIG_COMPAT support
isdn/avm: fix build when PCMCIA is not enabled
header: fix broken headers for user space
e1000e: don't check for alternate MAC addr on parts that don't support it
e1000e: disable ASPM L1 on 82573
ll_temac: Fix poll implementation
netxen: fix a race in netxen_nic_get_stats()
qlnic: fix a race in qlcnic_get_stats()
irda: fix a race in irlan_eth_xmit()
net: sh_eth: remove unused variable
netxen: update version 4.0.74
netxen: fix inconsistent lock state
vlan: Match underlying dev carrier on vlan add
ibmveth: Fix opps during MTU change on an active device
ehea: Fix synchronization between HW and SW send queue
bnx2x: Update bnx2x version to 1.52.53-4
bnx2x: Fix PHY locking problem
rds: fix a leak of kernel memory
netlink: fix compat recvmsg
netfilter: fix userspace header warning
...
Break the kobject namespace defs into their own header to avoid a header file
inclusion ordering problem between linux/sysfs.h and linux/kobject.h.
This fixes the build breakage on older versions of gcc.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently drivers must do an elevator_exit() + elevator_init()
to switch IO schedulers. There are a few problems with this:
- Since commit 1abec4fdbb,
elevator_init() requires a zeroed out q->elevator
pointer. The two existing in-kernel users don't do that.
- It will only work at initialization time, since using the
above two-staged construct does not properly quisce the queue.
So add elevator_change() which takes care of this, and convert
the elv_iosched_store() sysfs interface to use this helper as well.
Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reported-by: Kevin Vigor <kevin@vigor.nu>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
__packed is only defined in kernel space, so we should use
__attribute__((packed)) for the code shared between kernel and user space.
Two __attribute() annotations are replaced with __attribute__() too.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permissions response it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is syncronized.
Thus deadlock. The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.
Reported-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent modprobe and udev versions allow to create device nodes
for modules which are not loaded. Only the first access will cause
the in-kernel module loader to pull-in the module. Systems which
never access the device node will not needlessly load the module,
and no longer need init scripts or other facilities to unconditionally
load it.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Since handle_sysrq() does not take tty as argument anymore we can
drop it from usb_serial_handle_sysrq_char() as well.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Sysrq operations do not accept tty argument anymore so no need to pass
it to us.
[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
caused by sysrq using bool but not including linux/types.h]
[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr
driver]
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
kfifo_skip() is currently broken, due to the missing of the internal
helper function. Add it.
Signed-off-by: Andrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because list_empty() does not dereference any RCU-protected pointers, and
further does not pass such pointers to the caller (so that the caller
does not dereference them either), it is safe to use list_empty() on
RCU-protected lists. There is no need for a list_empty_rcu(). This
commit adds a comment stating this explicitly.
Requested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The CONFIG_PREEMPT_RCU kernel configuration parameter was recently
re-introduced, but as an indication of the type of RCU (preemptible
vs. non-preemptible) instead of as selecting a given implementation.
This commit uses CONFIG_PREEMPT_RCU to combine duplicate code
from include/linux/rcutiny.h and include/linux/rcutree.h into
include/linux/rcupdate.h. This commit also combines a few other pieces
of duplicate code that have accumulated.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It is illegal to wait for an SRCU grace period while within the
corresponding flavor of SRCU read-side critical section. Therefore,
this commit updates the srcu_read_lock() docbook accordingly.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Combine the duplicate definitions of ULONG_CMP_GE(), ULONG_CMP_LT(),
and rcu_preempt_depth() into include/linux/rcupdate.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When using a kernel debugger, a long sojourn in the debugger can get
you lots of RCU CPU stall warnings once you resume. This might not be
helpful, especially if you are using the system console. This patch
therefore allows RCU CPU stall warnings to be suppressed, but only for
the duration of the current set of grace periods.
This differs from Jason's original patch in that it adds support for
tiny RCU and preemptible RCU, and uses a slightly different method for
suppressing the RCU CPU stall warning messages.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Jason Wessel <jason.wessel@windriver.com>
The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU. This commit provides a fix.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.
The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.
Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).
As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
workloads, CONFIG_TINY_RCU is even better.
CONFIG_TREE_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
6170 825 28 7023 kernel/rcutree.o
----
7026 Total
CONFIG_TINY_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
2081 81 8 2170 kernel/rcutiny.o
----
2183 Total
CONFIG_TINY_RCU (non-preemptible)
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
719 25 0 744 kernel/rcutiny.o
---
757 Total
Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Noone is using tty argument so let's get rid of it.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This adds annotations for RCU operations in core kernel components
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Make it explicit that new RCU read-side critical sections that start
after call_rcu() and synchronize_rcu() start might still be running
after the end of the relevant grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/pid.c:386 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by rc.sysinit/1102:
#0: (tasklist_lock){.+.+..}, at: [<c1048340>] sys_setpgid+0x40/0x160
stack backtrace:
Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
Call Trace:
[<c105e714>] lockdep_rcu_dereference+0x94/0xb0
[<c104b4cd>] find_task_by_pid_ns+0x6d/0x70
[<c104b4e8>] find_task_by_vpid+0x18/0x20
[<c1048347>] sys_setpgid+0x47/0x160
[<c1002b50>] sysenter_do_call+0x12/0x36
Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.
We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers. If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain. To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option. Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.
There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU hold the update-side lock.
This patch also updates a number of docbook comments that were showing
their age.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christopher Li <sparse@chrisli.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
"make headers_check" issued the following warning:
CHECK include/linux/netfilter (64 files)
usr/include/linux/netfilter/xt_ipvs.h:19: found __[us]{8,16,32,64} type without #include <linux/types.h>
Fix this by as suggested including linux/types.h.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of hardcoding the number of contexts for the recursions
barriers, define a cpp constant to make the code more
self-explanatory.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Now that software events don't have interrupt disabled anymore in
the event path, callchains can nest on any context. So seperating
nmi and others contexts in two buffers has become racy.
Fix this by providing one buffer per nesting level. Given the size
of the callchain entries (2040 bytes * 4), we now need to allocate
them dynamically.
v2: Fixed put_callchain_entry call after recursion.
Fix the type of the recursion, it must be an array.
v3: Use a manual pr cpu allocation (temporary solution until NMIs
can safely access vmalloc'ed memory).
Do a better separation between callchain reference tracking and
allocation. Make the "put" path lockless for non-release cases.
v4: Protect the callchain buffers with rcu.
v5: Do the cpu buffers allocations node affine.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Miller <davem@davemloft.net>
Cc: Borislav Petkov <bp@amd64.org>
- Most archs use one callchain buffer per cpu, except x86 that needs
to deal with NMIs. Provide a default perf_callchain_buffer()
implementation that x86 overrides.
- Centralize all the kernel/user regs handling and invoke new arch
handlers from there: perf_callchain_user() / perf_callchain_kernel()
That avoid all the user_mode(), current->mm checks and so...
- Invert some parameters in perf_callchain_*() helpers: entry to the
left, regs to the right, following the traditional (dst, src).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
callchain_store() is the same on every archs, inline it in
perf_event.h and rename it to perf_callchain_store() to avoid
any collision.
This removes repetitive code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
spi.h: missing kernel-doc notation, please fix
of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
of: Fix missing includes
ata: update for of_device to platform_device replacement
microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
microblaze: Fix of/address: Merge all of the bus translation code
booting-without-of: Remove nonexistent chapters from TOC, fix numbering
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).
One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.
However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.
A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.
Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.
Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.
Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9
us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75
So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.
Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.
Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
lglock: introduce special lglock and brlock spin locks
This patch introduces "local-global" locks (lglocks). These can be used to:
- Provide fast exclusive access to per-CPU data, with exclusive access to
another CPU's data allowed but possibly subject to contention, and to provide
very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
very slow exclusive serialisation of data (not necessarily per-CPU data).
Brlocks are also implemented as a short-hand notation for the latter use
case.
Thanks to Paul for local/global naming convention.
Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: fs_struct rwlock to spinlock
struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.
Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.
Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers. Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.
In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Added comments in kernel-doc notation for previously added struct fields.
Signed-off-by: Ernst Schwab <eschwab@online.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* master.kernel.org:/home/rmk/linux-2.6-arm:
VIDEO: amba clcd: don't disable an already disabled clock
ARM: Tighten check for allowable CPSR values
ARM: 6329/1: wire up sys_accept4() on ARM
ARM: 6328/1: Build with -fno-dwarf2-cfi-asm
ARM: 6326/1: kgdb: fix GDB_MAX_REGS no longer used
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:
arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().
do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.
Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the clock enable/disable tracking in the AMBA CLCD driver so
that the driver doesn't try to disable an already disabled clock,
thereby causing the clock (if shared) to become unbalanced.
This resolves a problem with CLCD on LPC32xx ARM platforms.
Reported-by: Kevin Wells <wellsk40@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There is no longer any functional difference between
__debug_show_held_locks() and debug_show_held_locks(),
so remove the former.
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281021054-4228-1-git-send-email-jkacur@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
gcc-4.6: ACPI: fix unused but set variables in ACPI
ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS
ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS
ACPI processor: remove deprecated ACPI procfs I/F
ACPI power_resource: remove unused procfs I/F
ACPI: remove deprecated ACPI procfs I/F
ACPI: introduce drivers/acpi/sysfs.c
ACPI: introduce module parameter acpi.aml_debug_output
ACPI: introduce drivers/acpi/debugfs.c
ACPI, APEI, ERST debug support
ACPI, APEI, Manage GHES as platform devices
ACPI, APEI, Rename CPER and GHES severity constants
ACPI, APEI, Fix a typo of error path of apei_resources_request
ACPI / ACPICA: Fix reference counting problems with GPE handlers
ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device
ACPI / Sleep: Drop acpi_suspend_finish()
ACPI / Sleep: Consolidate suspend and hibernation routines
ACPI / Wakeup: Simplify enabling of wakeup devices
ACPI / Sleep: Rework enabling wakeup devices
ACPI / Sleep: Free NVS copy if suspending of devices fails
Fixed up totally buggered "ACPI: fix unused but set variables in ACPI"
patch that doesn't even compile in the merge.
Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the
breakage before I even pulled. And a big "Grrr.." at Len for not even
bothering to compile the tree before asking me to pull.
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/cleanup:
defconfig reduction
kbuild: drop unifdef-y support
archs: replace unifdef-y with header-y
include: replace unifdef-y with header-y
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: (22 commits)
hwmon: (via-cputemp) Remove bogus "SHOW" global variable
hwmon: jc42 depends on I2C
hwmon: (pc87427) Add a maintainer
hwmon: (pc87427) Move sysfs file removal to a separate function
hwmon: (pc87427) Add temperature monitoring support
hwmon: (pc87427) Add support for the second logical device
hwmon: (pc87427) Add support for manual fan speed control
hwmon: (pc87427) Minor style cleanups
hwmon: (pc87427) Handle disabled fan inputs properly
hwmon: (w83627ehf) Add support for W83667HG-B
hwmon: (w83627ehf) Driver cleanup
hwmon: Add driver for SMSC EMC2103 temperature monitor and fan controller
hwmon: Remove in[0-*]_fault from sysfs-interface
hwmon: Add 4 current alarm/beep attributes to sysfs-interface
hwmon: Add 3 critical limit attributes to sysfs-interface
hwmon: (asc7621) Clean up and improve detect function
hwmon: (it87) Export labels for internal sensors
hwmon: (lm75) Add suspend/resume feature
hwmon: (emc1403) Add power support
hwmon: (ltc4245) Expose all GPIO pins as analog voltages
...
unifdef-y and header-y has same semantic.
So there is no need to have both.
Drop the unifdef-y variant and sort all lines again
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Add support for exposing all GPIO pins as analog voltages. Though this is
not an ideal use of the chip, some hardware engineers may decide that the
LTC4245 meets their design requirements when studying the datasheet.
The GPIO pins are sampled in round-robin fashion, meaning that a slow
reader will see stale data. A userspace application can detect this,
because it will get -EAGAIN when reading from a sysfs file which contains
stale data.
Users can choose to use this feature on a per-chip basis by using either
platform data or the OF device tree (where applicable).
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6:
spi/amba_pl022: Fix probe and remove hook section annotations.
spi/mpc5121: change annotations for probe and remove functions
spi/bitbang: reinitialize transfer parameters for every message
spi/spi-gpio: add support for controllers without MISO or MOSI pin
spi/bitbang: add support for SPI_MASTER_NO_{TX, RX} modes
SPI100k: Fix 8-bit and RX-only transfers
spi/mmc_spi: mmc_spi adaptations for SPI bus locking API
spi/mmc_spi: SPI bus locking API, using mutex
Fix trivial conflict in drivers/spi/mpc512x_psc_spi.c due to 'struct
of_device' => 'struct platform_device' rename and __init/__exit to
__devinit/__devexit fix.
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:
(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.
(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allows the new PCI domain aware DRM code to compile on m68k.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The last user is gone, so we can safely remove this
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
This reverts commit 3bcf3860a4 (and the
accompanying commit c1e5c95402 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).
The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.
Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (33 commits)
dm mpath: support discard
dm stripe: support discards
dm: split discard requests on target boundaries
dm stripe: optimize sector division
dm stripe: move sector translation to a function
dm: error return error for discards
dm delay: support discard
dm: zero silently drop discards
dm: use dm_target_offset macro
dm: factor out max_io_len_target_boundary
dm: use common __issue_target_request for flush and discard support
dm: linear support discard
dm crypt: simplify crypt_ctr
dm crypt: simplify crypt_config destruction logic
dm: allow autoloading of dm mod
dm: rename map_info flush_request to target_request_nr
dm ioctl: refactor dm_table_complete
dm snapshot: implement merge
dm: do not initialise full request queue when bio based
dm ioctl: make bio or request based device type immutable
...
* 'for-linus' of git://neil.brown.name/md:
Further tidyup of raid6 naming in lib/raid6
Make lib/raid6/test build correctly.
Rename raid6 files now they're in a 'raid6' directory.
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c: I2C bus multiplexer driver pca954x
i2c: Multiplexed I2C bus core support
i2c: Use a separate mutex for userspace client lists
i2c: Make i2c_default_probe self-sufficient
i2c: Drop dummy variable
i2c: Move adapter locking helpers to i2c-core
V4L/DVB: Use custom I2C probing function mechanism
i2c: Add support for custom probe function
i2c-dev: Use memdup_user
i2c-dev: Remove unnecessary kmalloc casts
* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
param: don't deref arg in __same_type() checks
param: update drivers/acpi/debug.c to new scheme
param: use module_param in drivers/message/fusion/mptbase.c
ide: use module_param_named rather than module_param_call
param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
param: lock myri10ge_fw_name against sysfs changes.
param: simple locking for sysfs-writable charp parameters
param: remove unnecessary writable charp
param: add kerneldoc to moduleparam.h
param: locking for kernel parameters
param: make param sections const.
param: use free hook for charp (fix leak of charp parameters)
param: add a free hook to kernel_param_ops.
param: silence .init.text references from param ops
Add param ops struct for hvc_iucv driver.
nfs: update for module_param_named API change
AppArmor: update for module_param_named API change
param: use ops in struct kernel_param, rather than get and set fns directly
param: move the EXPORT_SYMBOL to after the definitions.
...
The current computation, introduced with f12a15be63, of FSEC_PER_SEC using
the multiplication of (FSEC_PER_NSEC * NSEC_PER_SEC) is performed only
with 32bit integers on small machines, resulting in an overflow and a
*very* short intervals being programmed. An interrupt storm follows.
Note that we also have to specify FSEC_PER_SEC as being long long to
overcome the same limitations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (55 commits)
io-mapping: move asm include inside the config option
vgaarb: drop vga.h include
drm/radeon: Add probing of clocks from device-tree
drm/radeon: drop old and broken mesa warning
drm/radeon: Fix pci_map_page() error checking
drm: Remove count_lock for calling lastclose() after 58474713 (v2)
drm/radeon/kms: allow FG_ALPHA_VALUE on r5xx
drm/radeon/kms: another r6xx/r7xx CS checker fix
DRM: Replace kmalloc/memset combos with kzalloc
drm: expand gamma_set
drm/edid: Split mode lists out to their own header for readability
drm/edid: Rewrite mode parse to use the generic detailed block walk
drm/edid: Add detailed block walk for VTB extensions
drm/edid: Add detailed block walk for CEA extensions
drm: Remove unused fields from drm_display_info
drm: Use ENOENT consistently for the error return for an unmatched handle.
drm/radeon/kms: mark 3D power states as performance
drm: Only set DPMS once on the CRTC not after every encoder.
drm/radeon/kms: add additional quirk for Acer rv620 laptop
drm: Propagate error code from fb_create()
...
Fix up trivial conflicts in drivers/gpu/drm/drm_edid.c
* 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
x86: Detect whether we should use Xen SWIOTLB.
pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
xen/mmu: inhibit vmap aliases rather than trying to clear them out
vmap: add flag to allow lazy unmap to be disabled at runtime
xen: Add xen_create_contiguous_region
xen: Rename the balloon lock
xen: Allow unprivileged Xen domains to create iomap pages
xen: use _PAGE_IOMAP in ioremap to do machine mappings
Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
include/xen/xen-ops.h
Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SD/MMC cards tend to support an erase operation. In addition, eMMC v4.4
cards can support secure erase, trim and secure trim operations that are
all variants of the basic erase command.
SD/MMC device attributes "erase_size" and "preferred_erase_size" have been
added.
"erase_size" is the minimum size, in bytes, of an erase operation. For
MMC, "erase_size" is the erase group size reported by the card. Note that
"erase_size" does not apply to trim or secure trim operations where the
minimum size is always one 512 byte sector. For SD, "erase_size" is 512
if the card is block-addressed, 0 otherwise.
SD/MMC cards can erase an arbitrarily large area up to and
including the whole card. When erasing a large area it may
be desirable to do it in smaller chunks for three reasons:
1. A single erase command will make all other I/O on the card
wait. This is not a problem if the whole card is being erased, but
erasing one partition will make I/O for another partition on the
same card wait for the duration of the erase - which could be a
several minutes.
2. To be able to inform the user of erase progress.
3. The erase timeout becomes too large to be very useful.
Because the erase timeout contains a margin which is multiplied by
the size of the erase area, the value can end up being several
minutes for large areas.
"erase_size" is not the most efficient unit to erase (especially for SD
where it is just one sector), hence "preferred_erase_size" provides a good
chunk size for erasing large areas.
For MMC, "preferred_erase_size" is the high-capacity erase size if a card
specifies one, otherwise it is based on the capacity of the card.
For SD, "preferred_erase_size" is the allocation unit size specified by
the card.
"preferred_erase_size" is in bytes.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 83ba7b071f ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished. Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false. This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().
This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.
NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued. But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
that the latter can be avoided when under global dirty background
threshold (which is the normal state for most systems).
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add mfd core driver for TPS6586x PMICs family.
The driver provides I/O access for the sub-device drivers and performs
regstration of the sub-devices based on the platform requirements.
In addition it implements GPIOlib interface for the chip GPIOs.
TODO:
- add interrupt support
- add platform data for PWM, backlight leds and charger
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Mike Rapoport <mike.rapoport@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This adds all remaining definitions that are used by the core driver
to the .c file.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is needed for the mc13783-adc driver to decide if a touch screen is
connected. If so some channels are not available as generic hwmon inputs.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some STMPE devices support entering sleep mode automatically on a
specified timeout of inactivity on the I2C bus with the host system.
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Sundar R Iyer <sundar.iyer@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Later revisions of the WM8994 add some more GPIO functions, define them
in the header file.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This patch adds a MFD driver for the JZ4740 ADC unit. The driver is used to
demultiplex IRQs and synchronize access to shared registers between the
battery, hwmon and (future) touchscreen driver.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Add support for the STMPE family of I/O Expanders from
STMicroelectronics. These devices include upto 24 gpios and a varying
selection of blocks, including PWM, keypad, and touchscreen controllers.
This patch adds the MFD core.
[l.fu@pengutronix.de: fix stmpe811 enable hook]
[l.fu@pengutronix.de: add touchscreen platform data]
Acked-by: Luotao Fu <l.fu@pengutronix.de>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Split max_io_len_target_boundary out of max_io_len so that the discard
support can make use of it without duplicating max_io_len code.
Avoiding max_io_len's split_io logic enables DM's discard support to
submit the entire discard request to a target. But discards must still
be split on target boundaries.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Allow discards to be passed through to linear mappings if at least one
underlying device supports it. Discards will be forwarded only to
devices that support them.
A target that supports discards should set num_discard_requests to
indicate how many times each discard request must be submitted to it.
Verify table's underlying devices support discards prior to setting the
associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add devname:mapper/control and MAPPER_CTRL_MINOR module alias
to support dm-mod module autoloading.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
'target_request_nr' is a more generic name that reflects the fact that
it will be used for both flush and discard support.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Determine whether a mapped device is bio-based or request-based when
loading its first (inactive) table and don't allow that to be changed
later.
This patch performs different device initialisation in each of the two
cases. (We don't think it's necessary to add code to support changing
between the two types.)
Allowed md->type transitions:
DM_TYPE_NONE to DM_TYPE_BIO_BASED
DM_TYPE_NONE to DM_TYPE_REQUEST_BASED
We now prevent table_load from replacing the inactive table with a
conflicting type of table even after an explicit table_clear.
Introduce 'type_lock' into the struct mapped_device to protect md->type
and to prepare for the next patch that will change the queue
initialization and allocate memory while md->type_lock is held.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
drivers/md/dm-ioctl.c | 15 +++++++++++++++
drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++-------
drivers/md/dm.h | 5 +++++
include/linux/dm-ioctl.h | 4 ++--
4 files changed, 52 insertions(+), 9 deletions(-)
nouveau starting using these APIs, the first on non-x86 hw, and this
include isn't required on anything with real amounts of vmalloc space.
this fixes a build problem on powerpc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't actually need this include on any platform.
built on powerpc + x86, reported on m68k.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
isofs: Fix lseek() to position beyond 4 GB
vfs: remove unused MNT_STRICTATIME
vfs: show unreachable paths in getcwd and proc
vfs: only add " (deleted)" where necessary
vfs: add prepend_path() helper
vfs: __d_path: dont prepend the name of the root dentry
ia64: perfmon: add d_dname method
vfs: add helpers to get root and pwd
cachefiles: use path_get instead of lone dget
fs/sysv/super.c: add support for non-PDP11 v7 filesystems
V7: Adjust sanity checks for some volumes
Add v7 alias
v9fs: fixup for inode_setattr being removed
Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.
Add multiplexed bus core support. I2C multiplexer and switches
like pca954x get instantiated as new adapters per port.
Signed-off-by: Michael Lawnick <ml.lawnick@gmx.de>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Moving userspace-instantiated clients to separate lists wasn't nearly
enough to avoid deadlocks in multiplexed bus cases. We also want to
have a dedicated mutex to protect each list.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Michael Lawnick <ml.lawnick@gmx.de>
Uninline i2c adapter locking helper functions, move them to i2c-core,
and use them in i2c-core itself. The functions are still exported for
external users. This makes future updates to the locking model (which
will be needed for multiplexing support) possible and transparent.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Michael Lawnick <ml.lawnick@gmx.de>
Now that i2c-core offers the possibility to provide custom probing
function for I2C devices, let's make use of it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The probe method used by i2c_new_probed_device() may not be suitable
for all cases. Let the caller provide its own, optional probe
function.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (226 commits)
ARM: 6323/1: cam60: don't use __init for cam60_spi_{flash_platform_data,partitions}
ARM: 6324/1: cam60: move cam60_spi_devices to .init.data
ARM: 6322/1: imx/pca100: Fix name of spi platform data
ARM: 6321/1: fix syntax error in main Kconfig file
ARM: 6297/1: move U300 timer to dynamic clock lookup
ARM: 6296/1: clock U300 intcon and timer properly
ARM: 6295/1: fix U300 apb_pclk split
ARM: 6306/1: fix inverted MMC card detect in U300
ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID
ARM: 6294/1: etm: do a dummy read from OSSRR during initialization
ARM: 6292/1: coresight: add ETM management registers
ARM: 6288/1: ftrace: document mcount formats
ARM: 6287/1: ftrace: clean up mcount assembly indentation
ARM: 6286/1: fix Thumb-2 decompressor broken by "Auto calculate ZRELADDR"
ARM: 6281/1: video/imxfb.c: allow usage without BACKLIGHT_CLASS_DEVICE
ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h>
ARM: S5PV210: Fix on missing s3c-sdhci card detection method for hsmmc3
ARM: S5P: Fix on missing S5P_DEV_FIMC in plat-s5p/Kconfig
ARM: S5PV210: Override FIMC driver name on Aquila board
ARM: S5PC100: enable FIMC on SMDKC100
...
Fix up conflicts in arch/arm/mach-{s5pc100,s5pv210}/cpu.c due to
different subsystem 'setname' calls, and trivial port types in
include/linux/serial_core.h
Simply replace the whole kfifo.c and kfifo.h files with the new generic
version and fix the kerneldoc API template file.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the new version of the kfifo API files kfifo.c and kfifo.h.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For consistency with other kfifo routines, return bool, not int.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly mkfs-ed filesystems from Seventh Edition have last modification time
set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.
This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.
The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.
Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_get_cache_alignment returns the minimum DMA alignment. Architectures
defines it as ARCH_DMA_MINALIGN (formally ARCH_KMALLOC_MINALIGN). So we
can unify dma_get_cache_alignment implementations.
Note that some architectures implement dma_get_cache_alignment wrongly.
dma_get_cache_alignment() should return the minimum DMA alignment. So
fully-coherent architectures should return 1. This patch also fixes this
issue.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now each architecture has the own dma_get_cache_alignment implementation.
dma_get_cache_alignment returns the minimum DMA alignment. Architectures
define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that malloc'ed
buffer is DMA-safe; the buffer doesn't share a cache with the others). So
we can unify dma_get_cache_alignment implementations.
This patch:
dma_get_cache_alignment() needs to know if an architecture defines
ARCH_KMALLOC_MINALIGN or not (needs to know if architecture has DMA
alignment restriction). However, slab.h define ARCH_KMALLOC_MINALIGN if
architectures doesn't define it.
Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
(except for crypto).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_cgroup_soft_limit_reclaim() has zone, nid and zid argument. but nid
and zid can be calculated from zone. So remove it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly. thus
it doesn't need to initialize sc.nodemask because shrink_zone() doesn't
use it at all.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the OOM killer scans task, it check a task is under memcg or
not when it's called via memcg's context.
But, as Oleg pointed out, a thread group leader may have NULL ->mm
and task_in_mem_cgroup() may do wrong decision. We have to use
find_lock_task_mm() in memcg as generic OOM-Killer does.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gpios on the max730x chips have support for internal pullups while in
input mode.
This patch adds support for configuring these pullups via platform data.
A new member ("input_pullup_active") to the platform data struct is
introduced. A set bit in this variable activates the pullups while the
respective port is in input mode. This is a compatible enhancement since
unset bits lead to disables pullups which was the default in the original
driver.
_Note_: the 4 lowest bits in "input_pullup_active" are unused because the
first 4 ports of the controller are not used, too.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are some chips (like TI WL12xx series) that can be interfaced over
SDIO but don't support the SDIO specification, meaning that they are
missing CIA (Common I/O Area) with all it's registers. Current Linux SDIO
implementation relies on those registers to identify and configure the
card, so non-standard cards can not function and cause lots of warnings
from the core when it reads invalid data from non-existent registers.
After this patch, init_card() host callback can now set new quirk
MMC_QUIRK_NONSTD_SDIO, which means that SDIO core should not try to access
any standard SDIO registers and rely on init_card() to fill all SDIO
structures instead. As those cards are usually embedded chips, all the
required information can be obtained from machine board files by the host
driver when it's called through init_card() callback.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Adrian Hunter <adrian.hunter@nokia.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Kalle Valo <kvalo@adurom.com>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Kishore Kadiyala <kishore.kadiyala@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If you don't use CONFIG_MMC_UNSAFE_RESUME, as soon as you attempt to
suspend, the card will be removed, therefore this patch doesn't change the
behavior of this option.
However the removal will be done by pm notifier, which runs while
userspace is still not frozen and thus can freely use del_gendisk, without
the risk of deadlock which would happen otherwise.
Card detect workqueue is now disabled while userspace is frozen, Therefore
if you do use CONFIG_MMC_UNSAFE_RESUME, and remove the card during
suspend, the removal will be detected as soon as userspace is unfrozen,
again at the moment it is safe to call del_gendisk.
Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
[akpm@linux-foundation.org: clean up function prototype]
[akpm@linux-foundation.org: fix CONFIG_PM-n linkage, small cleanups]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The eMMC spec 4.4 and 4.3 + additional feature chips has CSD structure
version 3 and version 3 have to check the CSD_STRUCTURE byte in the
EXT_CSD register.
Also fix EXT_CSD revision message.
[akpm@linux-foundation.org: fix comment, per Chris Ball]
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Adrian Hunter <adrian.hunter@nokia.com>
Cc: Chris Ball <cjb@laptop.org>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add <linux/types.h> to <linux/virtio_9p.h> so that types are explicitly
defined:
linux/virtio_9p.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One straggler which was missed due to merge ordering issues.
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc allows this when arg is a function, but sparse complains:
drivers/char/ipmi/ipmi_watchdog.c:303:1: error: cannot dereference this type
drivers/char/ipmi/ipmi_watchdog.c:307:1: error: cannot dereference this type
drivers/char/ipmi/ipmi_watchdog.c:311:1: error: cannot dereference this type
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also reorders the macros with the most common ones at the top.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
There may be cases (most obviously, sysfs-writable charp parameters) where
a module needs to prevent sysfs access to parameters.
Rather than express this in terms of a big lock, the functions are
expressed in terms of what they protect against. This is clearer, esp.
if the implementation changes to a module-level or even param-level lock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Since this section can be read-only (they're in .rodata), they should
always have been const. Minor flow-through various functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This allows us to generalize the KPARAM_KMALLOCED flag, by calling a function
on every parameter when a module is unloaded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.
The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).
Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).
To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch adds voltage regulator driver for Maxim 8998 chip. This chip
is used on Samsung Aquila and GONI boards and provides following
functionalities:
- 4 BUCK voltage converters, 17 LDO power regulators and 5 other power
controllers
- battery charger
This patch adds basic driver for voltage regulators and MAX 8998 MFD core.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
If error hugepage is not in-use, we can fully recovery from error
by dequeuing it from freelist, so return RECOVERY.
Otherwise whether or not we can recovery depends on user processes,
so return DELAYED.
Dependency:
"HWPOISON, hugetlb: enable error handling path for hugepage"
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
This patch adds reverse mapping feature for hugepage by introducing
mapcount for shared/private-mapped hugepage and anon_vma for
private-mapped hugepage.
While hugepage is not currently swappable, reverse mapping can be useful
for memory error handler.
Without this patch, memory error handler cannot identify processes
using the bad hugepage nor unmap it from them. That is:
- for shared hugepage:
we can collect processes using a hugepage through pagecache,
but can not unmap the hugepage because of the lack of mapcount.
- for privately mapped hugepage:
we can neither collect processes nor unmap the hugepage.
This patch solves these problems.
This patch include the bug fix given by commit 23be7468e8, so reverts it.
Dependency:
"hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"
ChangeLog since May 24.
- create hugetlb_inline.h and move is_vm_hugetlb_index() in it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.
ChangeLog since May 13.
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 24be7468 completely
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
is_vm_hugetlb_page() is a widely used inline function to insert hooks
into hugetlb code.
But we can't use it in pagemap.h because of circular dependency of
the header files. This patch removes this limitation.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Commit d0adde574b added MNT_STRICTATIME
but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
MNT_NOATIME rather than setting any mount flag).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Prepend "(unreachable)" to path strings if the path is not reachable
from the current root.
Two places updated are
- the return string from getcwd()
- and symlinks under /proc/$PID.
Other uses of d_path() are left unchanged (we know that some old
software crashes if /proc/mounts is changed).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.
get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Newly mkfs-ed filesystems from Seventh Edition have last modification
time set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
different bytesex than we're using.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
"netpoll: Use 'bool' for netpoll_rx() return type." missed the case when
CONFIG_NETPOLL is disabled.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warnings in linux/i2c.h:
Warning(include/linux/i2c.h:176): No description found for parameter 'alert'
Warning(include/linux/i2c.h:259): No description found for parameter 'of_node'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (94 commits)
V4L/DVB: tvp7002: fix write to H-PLL Feedback Divider LSB register
V4L/DVB: dvb: siano: free spinlock before schedule()
V4L/DVB: media: video: pvrusb2: remove custom hex_to_bin()
V4L/DVB: drivers: usbvideo: remove custom implementation of hex_to_bin()
V4L/DVB: Report supported QAM modes on bt8xx
V4L/DVB: media: ir-keytable: null dereference in debug code
V4L/DVB: ivtv: convert to the new control framework
V4L/DVB: ivtv: convert gpio subdev to new control framework
V4L/DVB: wm8739: convert to the new control framework
V4L/DVB: cs53l32a: convert to new control framework
V4L/DVB: wm8775: convert to the new control framework
V4L/DVB: cx2341x: convert to the control framework
V4L/DVB: cx25840: convert to the new control framework
V4L/DVB: cx25840/ivtv: replace ugly priv control with s_config
V4L/DVB: saa717x: convert to the new control framework
V4L/DVB: msp3400: convert to the new control framework
V4L/DVB: saa7115: convert to the new control framework
V4L/DVB: v4l2: hook up the new control framework into the core framework
V4L/DVB: Documentation: add v4l2-controls.txt documenting the new controls API
V4L/DVB: v4l2-ctrls: Whitespace cleanups
...
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
watchdog: hpwdt: formatting of pointers in printk()
watchdog: Adding support for ARM Primecell SP805 Watchdog
watchdog: f71808e_wdt: new watchdog driver for Fintek F71808E and F71882FG
watchdog: sch311x_wdt.c: set parent before registeriing the misc device in probe() function
watchdog: wdt_pci.c: move ids to pci_ids.h
watchdog: s3c2410_wdt - Fix removing of platform device
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add USB-ID for PL-3601 Xbox 360 pad
Input: cy8ctmg100_ts - signedness bug
Input: elantech - report position also with 3 fingers
Input: elantech - discard the first 2 positions on some firmwares
Input: adxl34x - do not mark device as disabled on startup
Input: gpio_keys - add hooks to enable/disable device
Input: evdev - rearrange ioctl handling
Input: dynamically allocate ABS information
Input: switch to input_abs_*() access functions
Input: add static inline accessors for ABS properties
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (68 commits)
U6715 16550A serial driver support
Char: nozomi, set tty->driver_data appropriately
Char: nozomi, fix tty->count counting
serial: max3107: Fix gpiolib support
hsu: call PCI pm hooks in suspend/resume function
hsu: some code cleanup
hsu: add a periodic timer to check dma rx channel
hsu: driver for Medfield High Speed UART device
mxser: remove unnesesary NULL check
serial: add support for OX16PCI958 card
serial: 68328serial.c: remove dead (ALMA_ANS | DRAGONIXVZ | M68EZ328ADS)
timbuart: use __devinit and __devexit macros for probe and remove
serial: MMIO32 support for 8250_early.c
serial: mcf: don't take spinlocks in already protected functions
serial: general fixes in the serial_rs485 structure
serial: fix missing bit coverage of ASYNC_FLAGS
serial: "altera_uart: simplify altera_uart_console_putc()" checkpatch fixes
serial: crisv10: formatting of pointers in printk()
vt: Fix warning: statement with no effect due to vt_kern.h
tty_io: remove casts from void*
...
Fix kernel-doc warnings in linux/usb.h:
Warning(include/linux/usb.h:185): No description found for parameter 'resetting_device'
Warning(include/linux/usb.h:1212): No description found for parameter 'stream_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Logitech Harmony 700 series needs an extra delay during
initialization. This patch adds a USB quirk which enables such a delay
and adds the device to the quirks list.
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1) Introduce ulpi specific flags for control of the ulpi phy
2) Extend the generic ulpi driver with support for Function and
Interface control of upli phy
3) Update the platforms using the generic ulpi driver with new ulpi
flags
4) Remove the otg control flags not in use
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixes below compilation warning from ulpi.h
include/linux/usb/ulpi.h:145:
warning: 'struct otg_io_access_ops' declared inside parameter list
include/linux/usb/ulpi.h:145:
warning: its scope is only this definition or declaration,
which is probably not what you want
Signed-off-by: Ajay Kumar Gupta <ajay.gupta@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1395) adds code to hcd_pci_suspend() for handling wakeup
races. This is another general race pattern, similar to the "open
vs. unregister" race we're all familiar with. Here, the race is
between suspending a device and receiving a wakeup request from one of
the device's suspended children.
In particular, if a root-hub wakeup is requested at about the same
time as the corresponding USB controller is suspended, and if the
controller is enabled for wakeup, then the controller should either
fail to suspend or else wake right back up again.
During system sleep this won't happen very much, especially since host
controllers generally aren't enabled for wakeup during sleep. However
it is definitely an issue for runtime PM. Something like this will be
needed to prevent the controller from autosuspending while waiting for
a root-hub resume to take place. (That is, in fact, the common case,
for which there is an extra test.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1385) adds a "do_wakeup" parameter to the pci_suspend
method used by PCI-based host controller drivers. ehci-hcd in
particular needs to know whether or not to enable wakeup when
suspending a controller. Although that information is currently
available through device_may_wakeup(), when support is added for
runtime suspend this will no longer be true.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1393) converts several of the single-bit fields in
struct usb_hcd to atomic flags. This is for safety's sake; not all
CPUs can update bitfield values atomically, and these flags are used
in multiple contexts.
The flag fields that are set only during registration or removal can
remain as they are, since non-atomic accesses at those times will not
cause any problems.
(Strictly speaking, the authorized_default flag should become atomic
as well. I didn't bother with it because it gets changed only via
sysfs. It can be done later, if anyone wants.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Added a disconnect() callback to composite devices which
is called by composite glue when its disconnect callback
is called by gadget.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
usb_string_ids_tab() and usb_string_ids_n() functions added to
the composite framework. The first accepts an array of
usb_string object and for each registeres a string id and the
second registeres a given number of ids and returns the first.
This may simplify string ids registration since gadgets and
composite functions won't have to call usb_string_id() several
times and each time check for errer status -- all this will be
done with a single call.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
FunctionFS had a bit unique name for function used to add it
to USB configuration. Renamed as to match naming convention
of other functions.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
And audit all the users. None needed the BKL. That was easy
because there was only very few around.
Tested with allmodconfig build on x86-64
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
From: Andi Kleen <ak@linux.intel.com>
With this patch, the LPM capable EHCI host controller can put device
into L1 sleep state which is a mode that can enter/exit quickly, and
reduce power consumption.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
EHCI 1.1 addendum introduced several energy efficiency extensions for
EHCI USB host controllers:
1. LPM (link power management)
2. Per-port change
3. Shorter periodic frame list
4. Hardware prefetching
This patch is intended to define the HW bits and debug interface for
EHCI 1.1 addendum. The LPM and Per-port change patches will be sent out
after this patch.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
otg_io_write() function does not follow the declaration of
struct otg_io_access_ops.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1390) fixes a problem that crops up when a UHCI host
controller is unbound from uhci-hcd while there are still some active
URBs. The URBs have to be unlinked when the root hub is unregistered,
and uhci-hcd relies upon root-hub status polls as part of its
unlinking procedure. But usb_hcd_poll_rh_status() won't make those
status calls if hcd->rh_registered is clear, and the flag is cleared
_before_ the unregistration takes place.
Since hcd->rh_registered is used for other things and needs to be
cleared early, the solution is to add a new flag (rh_pollable) and use
it instead. It gets cleared _after_ the root hub is unregistered.
Now that the status polls don't end too soon, we have to make sure
they also don't occur too late -- after the root hub's usb_device
structure or the HCD's private structures are deallocated. Therefore
the patch adds usb_get_device() and usb_put_device() calls to protect
the root hub structure, and it adds an extra del_timer_sync() to
prevent the root-hub timer from causing an unexpected status poll.
This additional complexity would not be needed if the HCD framework
had provided separate stop() and release() callbacks instead of just
stop(). This lack could be fixed at some future time (although it
would require changes to every host controller driver); when that
happens this patch won't be needed any more.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
UART Features extract from STEricsson U6715 data-sheet (arm926 SoC for mobile phone):
* Fully compatible with industry standard 16C550 and 16C450 from various
manufacturers
* RX and TX 64 byte FIFO reduces CPU interrupts
* Full double buffering
* Modem control signals include CTS, RTS, (and DSR, DTR on UART1 only)
* Automatic baud rate selection
* Manual or automatic RTS/CTS smart hardware flow control
* Programmable serial characteristics:
– Baud rate generation (50 to 3.25M baud)
– 5, 6, 7 or 8-bit characters
– Even, odd or no-parity bit generation and detection
– 1, 1.5 or 2 stop bit generation
* Independent control of transmit, receive, line status, data set interrupts and FIFOs
* Full status-reporting capabilities
* Separate DMA signaling for RX and TX
* Timed interrupt to spread receive interrupt on known duration
* DMA time-out interrupt to allow detection of end of reception
* Carkit pulse coding and decoding compliant with USB carkit control interface [40]
In 16550A auto-configuration, if the fifo size is 64 then it's an U6 16550A port
Add set_termios hook & export serial8250_do_set_termios to change uart
clock following baudrate
Signed-off-by: Philippe Langlais <philippe.langlais@stericsson.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a PCI & UART driver, which suppors both PIO and DMA mode
UART operation. It has 3 identical UART ports and one internal
DMA controller.
Current FW will export 4 pci devices for hsu: 3 uart ports and 1
dma controller, each has one IRQ line. And we need to discuss the
device model, one PCI device covering whole HSU should be a better
model, but there is a problem of how to export the 4 IRQs info
Current driver set the highest baud rate to 2746800bps, which is
easy to scale down to 115200/230400.... To suport higher baud rate,
we need add special process, change DLAB/DLH/PS/DIV/MUL registers
all together.
921600 is the highest baud rate that has been tested with Bluetooth
modem connected to HSU port 0. Will test more when there is right
BT firmware.
Current version contains several work around for A0's Silicon bugs
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix several issues related to the RS485 interface:
- It adds the flag SER_RS485_RTS_BEFORE_SEND that was missing from the
serial_rs485 structure (even if "delay_rts_before_send" was existing)
- It adds a further "delay_rts_after_send" field for those drivers that
can have a delay after send (e.g., atmel_serial)
- It fixes the usage of the structure in the atmel_serial driver (where
"delay_rts_before_send" should be used instead of "delay_rts_after_send").
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Bernhard Roth <br@pwrnet.de>
Cc: Philippe De Muyter <phdm@macqel.be>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems that currently ASYNC_FLAGS is one bit short of covering all the
bits of the ASYNC user flags. In particular it does not cover the
ASYNC_AUTOPROBE bit.
ASYNCB_LAST_USER and ASYNCB_AUTOPROBE are both equal to 15.
Therefore:
ASYNC_AUTOPROBE = 1000 0000 0000 0000
ASYNC_FLAGS = 0111 1111 1111 1111
So ASYNC_FLAGS is not covering the ASYNC_AUTOPROBE bit.
This patch fixes the issue and with the patch the values will be:
ASYNC_AUTOPROBE = 1000 0000 0000 0000
ASYNC_FLAGS = 1111 1111 1111 1111
As a side note, doing a "git grep" I didn't find any use of
ASYNC_AUTOPROBE or ASYNCB_AUTOPROBE in the kernel, besides this include
file.
Signed-off-by: John Villalovos <john.l.villalovos@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Using:
gcc (GCC) 4.5.0 20100610 (prerelease)
with CONFIG_CONSOLE_TRANSLATIONS=n, the following warnings are seen:
drivers/char/vt_ioctl.c: In function ‘vt_ioctl’:
drivers/char/vt_ioctl.c:1309:4: warning: statement with no effect
drivers/char/vt.c: In function ‘vc_allocate’:
drivers/char/vt.c:774:3: warning: statement with no effect
drivers/video/console/vgacon.c: In function ‘vgacon_init’:
drivers/video/console/vgacon.c:587:3: warning: statement with no effect
drivers/video/console/vgacon.c: In function ‘vgacon_deinit’:
drivers/video/console/vgacon.c:606:2: warning: statement with no effect
drivers/video/console/fbcon.c: In function ‘fbcon_init’:
drivers/video/console/fbcon.c:1087:3: warning: statement with no effect
drivers/video/console/fbcon.c:1089:3: warning: statement with no effect
drivers/video/console/fbcon.c: In function ‘fbcon_set_disp’:
drivers/video/console/fbcon.c:1369:3: warning: statement with no effect
drivers/video/console/fbcon.c:1371:3: warning: statement with no effect
This is because several functions in include/linux/vt_kern.h are
defined to (0). Convert them to static inline functions to
silence the warnings and gain a bit of type safety.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At the moment there is only one platform type supported and there is is
hard wired, but with these changes the infrastructure is now there for
anyone else to provide methods for their hardware.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The tty locking now follows the rules for mutexes, so
we can replace the BKL usage with a new subsystem
wide mutex.
Using a regular mutex here will change the behaviour
when blocked on the BTM from spinning to sleeping,
but that should not be visible to the user.
Using the mutex also means that all the BTM is now
covered by lockdep.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This changes all remaining users of tty_lock_nested
to be non-recursive, which lets us kill this function.
As a consequence, we won't need to keep the lock count
any more, which allows more simplifications later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Calling wait_event_interruptible implicitly
releases the BKL when it sleeps, but we need
to do this explcitly when we have converted
it to a mutex.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As a preparation for replacing the big kernel lock
in the TTY layer, wrap all the callers in new
macros tty_lock, tty_lock_nested and tty_unlock.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This takes all the tty references through the expected interface points so
we can refcount them.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The vt layer isn't safely handling reference counts to tty object on the input
side. Add a tty port structure to the vt layer in order to implement this using
the standard helpers.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pass down the ldisc number so that the drivers don't have to peek into the
tty object themselves. This lets us get rid of another case of back referencing
port to tty which we don't want (because of races versus hangup/close).
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This lets us avoid problems with races on the flag changes
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jesse's initial patch commit said:
"At panic time (i.e. when oops_in_progress is set) we should try a bit
harder to update the screen and make sure output gets to the VT, since
some drivers are capable of flipping back to it.
So make sure we try to unblank and update the display if called from a
panic context."
I've enhanced this to add a flag to the vc that console layer can set to
indicate they want this behaviour to occur. This also adds support to
fbcon for that flag and adds an fb flag for drivers to indicate they want
to use the support. It enables this for KMS drivers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is against the 2.6.34 source.
Paraphrased from the 1989 BSD patch by David Borman @ cray.com:
These are the changes needed for the kernel to support
LINEMODE in the server.
There is a new bit in the termios local flag word, EXTPROC.
When this bit is set, several aspects of the terminal driver
are disabled. Input line editing, character echo, and mapping
of signals are all disabled. This allows the telnetd to turn
off these functions when in linemode, but still keep track of
what state the user wants the terminal to be in.
New ioctl:
TIOCSIG Generate a signal to processes in the
current process group of the pty.
There is a new mode for packet driver, the TIOCPKT_IOCTL bit.
When packet mode is turned on in the pty, and the EXTPROC bit
is set, then whenever the state of the pty is changed, the
next read on the master side of the pty will have the TIOCPKT_IOCTL
bit set. This allows the process on the server side of the pty
to know when the state of the terminal has changed; it can then
issue the appropriate ioctl to retrieve the new state.
Since the original BSD patches accompanied the source code for telnet
I've left that reference here, but obviously the feature is useful for
any remote terminal protocol, including ssh.
The corresponding feature has existed in the BSD tty driver since 1989.
For historical reference, a good copy of the relevant files can be found
here:
http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741
Signed-off-by: Howard Chu <hyc@symas.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit
Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.
* git://git.infradead.org/mtd-2.6: (79 commits)
mtd: Remove obsolete <mtd/compatmac.h> include
mtd: Update copyright notices
jffs2: Update copyright notices
mtd-physmap: add support users can assign the probe type in board files
mtd: remove redwood map driver
mxc_nand: Add v3 (i.MX51) Support
mxc_nand: support 8bit ecc
mxc_nand: fix correct_data function
mxc_nand: add V1_V2 namespace to registers
mxc_nand: factor out a check_int function
mxc_nand: make some internally used functions overwriteable
mxc_nand: rework get_dev_status
mxc_nand: remove 0xe00 offset from registers
mtd: denali: Add multi connected NAND support
mtd: denali: Remove set_ecc_config function
mtd: denali: Remove unuseful code in get_xx_nand_para functions
mtd: denali: Remove device_info_tag structure
mtd: m25p80: add support for the Winbond W25Q32 SPI flash chip
mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips
mtd: m25p80: add support for the EON EN25P{32, 64} SPI flash chips
...
Fix up trivial conflicts in drivers/mtd/maps/{Kconfig,redwood.c} due to
redwood driver removal.
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
fanotify: use both marks when possible
fsnotify: pass both the vfsmount mark and inode mark
fsnotify: walk the inode and vfsmount lists simultaneously
fsnotify: rework ignored mark flushing
fsnotify: remove global fsnotify groups lists
fsnotify: remove group->mask
fsnotify: remove the global masks
fsnotify: cleanup should_send_event
fanotify: use the mark in handler functions
audit: use the mark in handler functions
dnotify: use the mark in handler functions
inotify: use the mark in handler functions
fsnotify: send fsnotify_mark to groups in event handling functions
fsnotify: Exchange list heads instead of moving elements
fsnotify: srcu to protect read side of inode and vfsmount locks
fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
fsnotify: use _rcu functions for mark list traversal
fsnotify: place marks on object in order of group memory address
vfs/fsnotify: fsnotify_close can delay the final work in fput
fsnotify: store struct file not struct path
...
Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
These form the basis of the basic WRITE etc primitives, so we
need them to be always visible. Otherwise we see errors like:
mm/filemap.c:2164: error: 'REQ_WRITE' undeclared
fs/read_write.c:362: error: 'REQ_WRITE' undeclared
fs/splice.c:1108: error: 'REQ_WRITE' undeclared
fs/aio.c:1496: error: 'REQ_WRITE' undeclared
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fix etherdevice.h parameter name typo in kernel-doc:
Warning(include/linux/etherdevice.h:138): No description found for parameter 'hwaddr'
Warning(include/linux/etherdevice.h:138): Excess function parameter 'addr' description in 'dev_hw_addr_random'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (59 commits)
igbvf.txt: Add igbvf Documentation
igb.txt: Add igb documentation
e100/e1000*/igb*/ixgb*: Add missing read memory barrier
ixgbe: fix build error with FCOE_CONFIG without DCB_CONFIG
netxen: protect tx timeout recovery by rtnl lock
isdn: gigaset: use after free
isdn: gigaset: add missing unlock
solos-pci: Fix race condition in tasklet RX handling
pkt_sched: Fix sch_sfq vs tcf_bind_filter oops
net: disable preemption before call smp_processor_id()
tcp: no md5sig option size check bug
iwlwifi: fix locking assertions
iwlwifi: fix TX tracer
isdn: fix information leak
net: Fix napi_gro_frags vs netpoll path
usbnet: remove noisy and hardly useful printk
rtl8180: avoid potential NULL deref in rtl8180_beacon_work
ath9k: Remove myself from the MAINTAINERS list
libertas: scan before assocation if no BSSID was given
libertas: fix association with some APs by using extended rates
...
Getting and putting arrays of pointers with flex arrays is a PITA. You
have to remember to pass &ptr to the _put and you have to do weird and
wacky casting to get the ptr back from the _get. Add two functions
flex_array_get_ptr() and flex_array_put_ptr() to handle all of the magic.
[akpm@linux-foundation.org: simplification suggested by Joe]
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: James Morris <jmorris@namei.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A profile of a network benchmark showed iommu_num_pages rather high up:
0.52% iommu_num_pages
Looking at the profile, an integer divide is taking almost all of the time:
%
: c000000000376ea4 <.iommu_num_pages>:
1.93 : c000000000376ea4: fb e1 ff f8 std r31,-8(r1)
0.00 : c000000000376ea8: f8 21 ff c1 stdu r1,-64(r1)
0.00 : c000000000376eac: 7c 3f 0b 78 mr r31,r1
3.86 : c000000000376eb0: 38 84 ff ff addi r4,r4,-1
0.00 : c000000000376eb4: 38 05 ff ff addi r0,r5,-1
0.00 : c000000000376eb8: 7c 84 2a 14 add r4,r4,r5
46.95 : c000000000376ebc: 7c 00 18 38 and r0,r0,r3
45.66 : c000000000376ec0: 7c 84 02 14 add r4,r4,r0
0.00 : c000000000376ec4: 7c 64 2b 92 divdu r3,r4,r5
0.00 : c000000000376ec8: 38 3f 00 40 addi r1,r31,64
0.00 : c000000000376ecc: eb e1 ff f8 ld r31,-8(r1)
1.61 : c000000000376ed0: 4e 80 00 20 blr
Since every caller of iommu_num_pages passes in a constant power of two
we can inline this such that the divide is replaced by a shift. The
entire function is only a few instructions once optimised, so it is
a good candidate for inlining overall.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no more uses of NIPQUAD or NIPQUAD_FMT. Remove the definitions.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should use the __same_type() helper in __must_be_array().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some SoC chips, HW resources may be in use during any particular idle
period. As a consequence, the cpuidle states that the SoC is safe to
enter can change from idle period to idle period. In addition, the
latency and threshold of each cpuidle state can vary, depending on the
operating condition when the CPU becomes idle, e.g. the current cpu
frequency, the current state of the HW blocks, etc.
cpuidle core and the menu governor, in the current form, are geared
towards cpuidle states that are static, i.e. the availabiltiy of the
states, their latencies, their thresholds are non-changing during run
time. cpuidle does not provide any hook that cpuidle drivers can use to
adjust those values on the fly for the current idle period before the menu
governor selects the target cpuidle state.
This patch extends cpuidle core and the menu governor to handle states
that are dynamic. There are three additions in the patch and the patch
maintains backwards-compatibility with existing cpuidle drivers.
1) add prepare() to struct cpuidle_device. A cpuidle driver can hook
into the callback and cpuidle will call prepare() before calling the
governor's select function. The callback gives the cpuidle driver a
chance to update the dynamic information of the cpuidle states for the
current idle period, e.g. state availability, latencies, thresholds,
power values, etc.
2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare()
function, a cpuidle driver can set/clear the flag to indicate to the
menu governor whether a cpuidle state should be ignored, i.e. not
available, during the current idle period.
3) add power_specified bit to struct cpuidle_device. The menu governor
currently assumes that the cpuidle states are arranged in the order of
increasing latency, threshold, and power savings. This is true or can
be made true for static states. Once the state parameters are dynamic,
the latencies, thresholds, and power savings for the cpuidle states can
increase or decrease by different amounts from idle period to idle
period. So the assumption of increasing latency, threshold, and power
savings from Cn to C(n+1) can no longer be guaranteed.
It can be straightforward to calculate the power consumption of each
available state and to specify it in power_usage for the idle period.
Using the power_usage fields, the menu governor then selects the state
that has the lowest power consumption and that still satisfies all other
critieria. The power_specified bit defaults to 0. For existing cpuidle
drivers, cpuidle detects that power_specified is 0 and fills in a dummy
set of power_usage values.
Signed-off-by: Ai Li <aili@codeaurora.org>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When taking a memory snapshot in hibernate_snapshot(), all (directly
called) memory allocations use GFP_ATOMIC. Hence swap misusage during
hibernation never occurs.
But from a pessimistic point of view, there is no guarantee that no page
allcation has __GFP_WAIT. It is better to have a global indication "we
enter hibernation, don't use swap!".
This patch tries to freeze new-swap-allocation during hibernation. (All
user processes are frozenm so swapin is not a concern).
This way, no updates will happen to swap_map[] between
hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
is called. We can be assured that swap corruption will not occur.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On swapin it is fairly common for a page to be owned exclusively by one
process. In that case we want to add the page to the anon_vma of that
process's VMA, instead of to the root anon_vma.
This will reduce the amount of rmap searching that the swapout code needs
to do.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/oom_adj is now deprecated so that that it may eventually be
removed. The target date for removal is August 2012.
A warning will be printed to the kernel log if a task attempts to use this
interface. Future warning will be suppressed until the kernel is rebooted
to prevent spamming the kernel log.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions. The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.
Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead. This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits. This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.
The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory. "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit. The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.
The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.
Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs. In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.
Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it. It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability. Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered. The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.
/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa. Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning. Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity. This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 2.6.28 zone->prev_priority is unused. Then it can be removed
safely. It reduce stack usage slightly.
Now I have to say that I'm sorry. 2 years ago, I thought prev_priority
can be integrate again, it's useful. but four (or more) times trying
haven't got good performance number. Thus I give up such approach.
The rest of this changelog is notes on prev_priority and why it existed in
the first place and why it might be not necessary any more. This information
is based heavily on discussions between Andrew Morton, Rik van Riel and
Kosaki Motohiro who is heavily quotes from.
Historically prev_priority was important because it determined when the VM
would start unmapping PTE pages. i.e. there are no balances of note within
the VM, Anon vs File and Mapped vs Unmapped. Without prev_priority, there
is a potential risk of unnecessarily increasing minor faults as a large
amount of read activity of use-once pages could push mapped pages to the
end of the LRU and get unmapped.
There is no proof this is still a problem but currently it is not considered
to be. Active files are not deactivated if the active file list is smaller
than the inactive list reducing the liklihood that file-mapped pages are
being pushed off the LRU and referenced executable pages are kept on the
active list to avoid them getting pushed out by read activity.
Even if it is a problem, prev_priority prev_priority wouldn't works
nowadays. First of all, current vmscan still a lot of UP centric code. it
expose some weakness on some dozens CPUs machine. I think we need more and
more improvement.
The problem is, current vmscan mix up per-system-pressure, per-zone-pressure
and per-task-pressure a bit. example, prev_priority try to boost priority to
other concurrent priority. but if the another task have mempolicy restriction,
it is unnecessary, but also makes wrong big latency and exceeding reclaim.
per-task based priority + prev_priority adjustment make the emulation of
per-system pressure. but it have two issue 1) too rough and brutal emulation
2) we need per-zone pressure, not per-system.
Another example, currently DEF_PRIORITY is 12. it mean the lru rotate about
2 cycle (1/4096 + 1/2048 + 1/1024 + .. + 1) before invoking OOM-Killer.
but if 10,0000 thrreads enter DEF_PRIORITY reclaim at the same time, the
system have higher memory pressure than priority==0 (1/4096*10,000 > 2).
prev_priority can't solve such multithreads workload issue. In other word,
prev_priority concept assume the sysmtem don't have lots threads."
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We try to avoid livelocks of writeback when some steadily creates dirty
pages in a mapping we are writing out. For memory-cleaning writeback,
using nr_to_write works reasonably well but we cannot really use it for
data integrity writeback. This patch tries to solve the problem.
The idea is simple: Tag all pages that should be written back with a
special tag (TOWRITE) in the radix tree. This can be done rather quickly
and thus livelocks should not happen in practice. Then we start doing the
hard work of locking pages and sending them to disk only for those pages
that have TOWRITE tag set.
Note: Adding new radix tree tag grows radix tree node from 288 to 296
bytes for 32-bit archs and from 552 to 560 bytes for 64-bit archs.
However, the number of slab/slub items per page remains the same (13 and 7
respectively).
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement function for setting one tag if another tag is set for each item
in given range.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new anon-vma code, was suboptimal and it lead to erratic invocation of
ksm_does_need_to_copy. That leads to host hangs or guest vnc lockup, or
weird behavior. It's unclear why ksm_does_need_to_copy is unstable but
the point is that when KSM is not in use, ksm_does_need_to_copy must never
run or we bounce pages for no good reason. I suspect the same hangs will
happen with KVM swaps. But this at least fixes the regression in the
new-anon-vma code and it only let KSM bugs triggers when KSM is in use.
The code in do_swap_page likely doesn't cope well with a not-swapcache,
especially the memcg code.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Izik Eidus <ieidus@yahoo.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementation of tmpfs is not scalable. We found that
stat_lock is contended by multiple threads when we need to get a new page,
leading to useless spinning inside this spin lock.
This patch makes use of the percpu_counter library to maintain local count
of used blocks to speed up getting and returning of pages. So the
acquisition of stat_lock is unnecessary for getting and returning blocks,
improving the performance of tmpfs on system with large number of cpus.
On a 4 socket 32 core NHM-EX system, we saw improvement of 270%.
The implementation below has a slight chance of race between threads
causing a slight overshoot of the maximum configured blocks. However, any
overshoot is small, and is bounded by the number of cpus. This happens
when the number of used blocks is slightly below the maximum configured
blocks when a thread checks the used block count, and another thread
allocates the last block before the current thread does. This should not
be a problem for tmpfs, as the overshoot is most likely to be a few blocks
and bounded. If a strict limit is really desired, then configured the max
blocks to be the limit less the number of cpus in system.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add percpu_counter_compare that allows for a quick but accurate comparison
of percpu_counter with a given value.
A rough count is provided by the count field in percpu_counter structure,
without accounting for the other values stored in individual cpu counters.
The actual count is a sum of count and the cpu counters. However, count
field is never different from the actual value by a factor of
batch*num_online_cpu. We do not need to get actual count for comparison
if count is different from the given value by this factor and allows for
quick comparison without summing up all the per cpu counters.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No real bugs, just some dead code and some fixups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid quite a lot of warnings in header files in a gcc 4.6 -Wall builds
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define stubs for the numa_*_id() generic percpu related functions for
non-NUMA configurations in <asm-generic/topology.h> where the other
non-numa stubs live.
Fixes ia64 !NUMA build breakage -- e.g., tiger_defconfig
Back out now unneeded '#ifndef CONFIG_NUMA' guards from ia64 smpboot.c
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have been used naming try_set_zone_oom and clear_zonelist_oom.
The role of functions is to lock of zonelist for preventing parallel
OOM. So clear_zonelist_oom makes sense but try_set_zone_oome is rather
awkward and unmatched with clear_zonelist_oom.
Let's change it with try_set_zonelist_oom.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The three oom killer sysctl variables (sysctl_oom_dump_tasks,
sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better
declared in include/linux/oom.h rather than kernel/sysctl.c.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are various points in the oom killer where the kernel must determine
whether to panic or not. It's better to extract this to a helper function
to remove all the confusion as to its semantics.
Also fix a call to dump_header() where tasklist_lock is not read- locked,
as required.
There's no functional change with this patch.
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oom killer presently kills current whenever there is no more memory
free or reclaimable on its mempolicy's nodes. There is no guarantee that
current is a memory-hogging task or that killing it will free any
substantial amount of memory, however.
In such situations, it is better to scan the tasklist for nodes that are
allowed to allocate on current's set of nodes and kill the task with the
highest badness() score. This ensures that the most memory-hogging task,
or the one configured by the user with /proc/pid/oom_adj, is always
selected in such scenarios.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment suggests that when b_count equals zero it is calling
__wait_no_buffer to trigger some debug, but as there is no debug in
__wait_on_buffer the whole thing is redundant.
AFAICT from the git log this has been the case for at least 5 years, so
it seems safe just to remove this.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM reference counts can cause an anon_vma to exist after the processe it
belongs to have already exited. Because the anon_vma lock now lives in
the root anon_vma, we need to ensure that the root anon_vma stays around
until after all the "child" anon_vmas have been freed.
The obvious way to do this is to have a "child" anon_vma take a reference
to the root in anon_vma_fork. When the anon_vma is freed at munmap or
process exit, we drop the refcount in anon_vma_unlink and possibly free
the root anon_vma.
The KSM anon_vma reference count function also needs to be modified to
deal with the possibility of freeing 2 levels of anon_vma. The easiest
way to do this is to break out the KSM magic and make it generic.
When compiling without CONFIG_KSM, this code is compiled out.
Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Always (and only) lock the root (oldest) anon_vma whenever we do something
in an anon_vma. The recently introduced anon_vma scalability is due to
the rmap code scanning only the VMAs that need to be scanned. Many common
operations still took the anon_vma lock on the root anon_vma, so always
taking that lock is not expected to introduce any scalability issues.
However, always taking the same lock does mean we only need to take one
lock, which means rmap_walk on pages from any anon_vma in the vma is
excluded from occurring during an munmap, expand_stack or other operation
that needs to exclude rmap_walk and similar functions.
Also add the proper locking to vma_adjust.
Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Track the root (oldest) anon_vma in each anon_vma tree. Because we only
take the lock on the root anon_vma, we cannot use the lock on higher-up
anon_vmas to lock anything. This makes it impossible to do an indirect
lookup of the root anon_vma, since the data structures could go away from
under us.
However, a direct pointer is safe because the root anon_vma is always the
last one that gets freed on munmap or exit, by virtue of the same_vma list
order and unlink_anon_vmas walking the list forward.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline
function doing exactly the same.
This makes it easier to do the substitution to the root anon_vma lock in a
following patch.
We will deal with the handful of special locks (nested, dec_and_lock, etc)
separately.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style
used in page_lock_anon_vma and will come in really handy further down in
this patch series.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].
kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself. This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).
Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong"). This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.
The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).
The previous version of this patch was compile tested on x86-64.
[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
share a common ancestor with both; there seems to be quite some
degree of copy-and-paste coding here. The include/asm/highmem.h file
for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
the first parameter of kunmap_atomic() instead of void *?
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount. That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock. deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.
What we need is a way to tell if superblock has been successfully
set up. Unfortunately, neither ->s_root nor the check for
MS_ACTIVE quite fit. Cheap and easy way, suitable for backport:
new flag set by the (only) caller of ->get_sb(). If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).
Longer term we want to set that flag in ->get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking ->s_root as we do now).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
The mbcache code was written to support a variable number of indexes,
but all the existing users use exactly one index. Simplify to code to
support only that case.
There are also no users of the cache entry free operation, and none of
the users keep extra data in cache entries. Remove those features as
well.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a flags field to help glibc implementing statvfs(3) efficiently.
We copy the flag values from glibc, and add a new ST_VALID flag to
denote that f_flags is implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:
- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a caller to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
doesn't won't be able to fill out the flags field later on.
In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
builds path relative to fs root, called under dcache_lock,
doesn't append any nonsense to unlinked ones.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Essentially, the minimal variant of ->evict_inode(). It's
a trimmed-down clear_inode(), sans any fs callbacks. Once
it returns we know that no async writeback will be happening;
every ->evict_inode() instance should do that once and do that
before doing anything ->write_inode() could interfere with
(e.g. freeing the on-disk inode).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hybrid of ->clear_inode() and ->delete_inode(); if present, does
all fs work to be done when in-core inode is about to be gone,
for whatever reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok. Also clean up and document inode_change_ok
to make this obvious.
As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error. This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere. Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.
Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.
Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Despite its name it's now a generic implementation of ->setattr, but
rather a helper to copy attributes from a struct iattr to the inode.
Rename it to setattr_copy to reflect this fact.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.
While we're at it also remove several unused arguments to block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated. Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to cont_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the only
remaining caller and rename the non-truncating version to nobh_write_begin.
Get rid of the superflous file argument to it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) count file openers correctly; i_count use was completely wrong
b) use new mutex for exclusion between final close/open/truncate,
to protect tailpacking logics. i_mutex use was wrong and resulted
in deadlocks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kmem_cache->cpu_slab is a percpu pointer but was missing __percpu
markup. Add it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This driver exports a video device node per each camera interface/
video postprocessor (FIMC) device contained in Samsung S5P SoC series.
The driver is based on v4l2-mem2mem framework.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Pawel Osciak <p.osciak@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Handling of autofs ioctl numbers does not need to be generic
and can easily be done directly in autofs itself.
This also pushes the BKL into autofs and autofs4 ioctl
methods.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>
Cc: Autofs <autofs@linux.kernel.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the VENDOR/DEVICE ids to pci_ids.h.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: check kmalloc() result
arch/tile: catch up on various minor cleanups.
arch/tile: avoid erroneous error return for PTRACE_POKEUSR.
tile: set ARCH_KMALLOC_MINALIGN
tile: remove homegrown L1_CACHE_ALIGN macro
arch/tile: Miscellaneous cleanup changes.
arch/tile: Split the icache flush code off to a generic <arch> header.
arch/tile: Fix bug in support for atomic64_xx() ops.
arch/tile: Shrink the tile-opcode files considerably.
arch/tile: Add driver to enable access to the user dynamic network.
arch/tile: Enable more sophisticated IRQ model for 32-bit chips.
Move list types from <linux/list.h> to <linux/types.h>.
Add wait4() back to the set of <asm-generic/unistd.h> syscalls.
Revert adding some arch-specific signal syscalls to <linux/syscalls.h>.
arch/tile: Do not use GFP_KERNEL for dma_alloc_coherent(). Feedback from fujita.tomonori@lab.ntt.co.jp.
arch/tile: core support for Tilera 32-bit chips.
Fix up the "generic" unistd.h ABI to be more useful.
There are three reasons to add this support:
1. users probably know the interface type of their flashs, then probe
can be faster if they give the right type in platform data since wrong
types will not be detected.
2. sometimes, detecting can cause destory to system. For example, for
kernel XIP, detecting can cause NOR enter a mode instructions can not
be fetched right, which will make kernel crash.
3. For a new probe which is not listed in the rom_probe_types, if users
assign it in board files, physmap can still probe it.
Signed-off-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (82 commits)
firewire: core: add forgotten dummy driver methods, remove unused ones
firewire: add isochronous multichannel reception
firewire: core: small clarifications in core-cdev
firewire: core: remove unused code
firewire: ohci: release channel in error path
firewire: ohci: use memory barriers to order descriptor updates
tools/firewire: nosy-dump: increment program version
tools/firewire: nosy-dump: remove unused code
tools/firewire: nosy-dump: use linux/firewire-constants.h
tools/firewire: nosy-dump: break up a deeply nested function
tools/firewire: nosy-dump: make some symbols static or const
tools/firewire: nosy-dump: change to kernel coding style
tools/firewire: nosy-dump: work around segfault in decode_fcp
tools/firewire: nosy-dump: fix it on x86-64
tools/firewire: add userspace front-end of nosy
firewire: nosy: note ioctls in ioctl-number.txt
firewire: nosy: use generic printk macros
firewire: nosy: endianess fixes and annotations
firewire: nosy: annotate __user pointers and __iomem pointers
firewire: nosy: fix device shutdown with active client
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (214 commits)
ALSA: hda - Add pin-fix for HP dc5750
ALSA: als4000: Fix potentially invalid DMA mode setup
ALSA: als4000: enable burst mode
ALSA: hda - Fix initial capsrc selection in patch_alc269()
ASoC: TWL4030: Capture route runtime DAPM ordering fix
ALSA: hda - Add PC-beep whitelist for an Intel board
ALSA: hda - More relax for pending period handling
ALSA: hda - Define AC_FMT_* constants
ALSA: hda - Fix beep frequency on IDT 92HD73xx and 92HD71Bxx codecs
ALSA: hda - Add support for HDMI HBR passthrough
ALSA: hda - Set Stream Type in Stream Format according to AES0
ALSA: hda - Fix Thinkpad X300 so SPDIF is not exposed
ALSA: hda - FIX to not expose SPDIF on Thinkpad X301, since it does not have the ability to use SPDIF
ASoC: wm9081: fix resource reclaim in wm9081_register error path
ASoC: wm8978: fix a memory leak if a wm8978_register fail
ASoC: wm8974: fix a memory leak if another WM8974 is registered
ASoC: wm8961: fix resource reclaim in wm8961_register error path
ASoC: wm8955: fix resource reclaim in wm8955_register error path
ASoC: wm8940: fix a memory leak if wm8940_register return error
ASoC: wm8904: fix resource reclaim in wm8904_register error path
...
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd4: fix file open accounting for RDWR opens
nfsd: don't allow setting maxblksize after svc created
nfsd: initialize nfsd versions before creating svc
net: sunrpc: removed duplicated #include
nfsd41: Fix a crash when a callback is retried
nfsd: fix startup/shutdown order bug
nfsd: minor nfsd read api cleanup
gcc-4.6: nfsd: fix initialized but not read warnings
nfsd4: share file descriptors between stateid's
nfsd4: fix openmode checking on IO using lock stateid
nfsd4: miscellaneous process_open2 cleanup
nfsd4: don't pretend to support write delegations
nfsd: bypass readahead cache when have struct file
nfsd: minor nfsd_svc() cleanup
nfsd: move more into nfsd_startup()
nfsd: just keep single lockd reference for nfsd
nfsd: clean up nfsd_create_serv error handling
nfsd: fix error handling in __write_ports_addxprt
nfsd: fix error handling when starting nfsd with rpcbind down
nfsd4: fix v4 state shutdown error paths
...
* 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits)
NFS: NFSv4.1 is no longer a "developer only" feature
NFS: NFS_V4 is no longer an EXPERIMENTAL feature
NFS: Fix /proc/mount for legacy binary interface
NFS: Fix the locking in nfs4_callback_getattr
SUNRPC: Defer deleting the security context until gss_do_free_ctx()
SUNRPC: prevent task_cleanup running on freed xprt
SUNRPC: Reduce asynchronous RPC task stack usage
SUNRPC: Move the bound cred to struct rpc_rqst
SUNRPC: Clean up of rpc_bindcred()
SUNRPC: Move remaining RPC client related task initialisation into clnt.c
SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task
SUNRPC: Make the credential cache hashtable size configurable
SUNRPC: Store the hashtable size in struct rpc_cred_cache
NFS: Ensure the AUTH_UNIX credcache is allocated dynamically
NFS: Fix the NFS users of rpc_restart_call()
SUNRPC: The function rpc_restart_call() should return success/failure
NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
NFSv4: Clean up the process of renewing the NFSv4 lease
NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
NFS: nfs_rename() should not have to flush out writebacks
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits)
nilfs2: reject filesystem with unsupported block size
nilfs2: avoid rec_len overflow with 64KB block size
nilfs2: simplify nilfs_get_page function
nilfs2: reject incompatible filesystem
nilfs2: add feature set fields to super block
nilfs2: clarify byte offset in super block format
nilfs2: apply read-ahead for nilfs_btree_lookup_contig
nilfs2: introduce check flag to btree node buffer
nilfs2: add btree get block function with readahead option
nilfs2: add read ahead mode to nilfs_btnode_submit_block
nilfs2: fix buffer head leak in nilfs_btnode_submit_block
nilfs2: eliminate inline keywords in btree implementation
nilfs2: get maximum number of child nodes from bmap object
nilfs2: reduce repetitive calculation of max number of child nodes
nilfs2: optimize calculation of min/max number of btree node children
nilfs2: remove redundant pointer checks in bmap lookup functions
nilfs2: get rid of nilfs_bmap_union
nilfs2: unify bmap set_target_v operations
nilfs2: get rid of nilfs_btree uses
nilfs2: get rid of nilfs_direct uses
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
ext4: Adding error check after calling ext4_mb_regular_allocator()
ext4: Fix dirtying of journalled buffers in data=journal mode
ext4: re-inline ext4_rec_len_(to|from)_disk functions
jbd2: Remove t_handle_lock from start_this_handle()
jbd2: Change j_state_lock to be a rwlock_t
jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
ext4: Add mount options in superblock
ext4: force block allocation on quota_off
ext4: fix freeze deadlock under IO
ext4: drop inode from orphan list if ext4_delete_inode() fails
ext4: check to make make sure bd_dev is set before dereferencing it
jbd2: Make barrier messages less scary
ext4: don't print scary messages for allocation failures post-abort
ext4: fix EFBIG edge case when writing to large non-extent file
ext4: fix ext4_get_blocks references
ext4: Always journal quota file modifications
ext4: Fix potential memory leak in ext4_fill_super
ext4: Don't error out the fs if the user tries to make a file too big
ext4: allocate stripe-multiple IOs on stripe boundaries
ext4: move aio completion after unwritten extent conversion
...
Fix up conflicts in fs/ext4/inode.c as per Ted.
Fix up xfs conflicts as per earlier xfs merge.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Fix dirtying of journalled buffers in data=journal mode
ext3: default to ordered mode
quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
quota: Change quota error message to print out disk and function name
MAINTAINERS: Update entries of ext2 and ext3
MAINTAINERS: Update address of Andreas Dilger
ext3: Avoid filesystem corruption after a crash under heavy delete load
ext3: remove vestiges of nobh support
ext3: Fix set but unused variables
quota: clean up quota active checks
quota: Clean up the namespace in dqblk_xfs.h
quota: check quota reservation on remove_dquot_ref
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[DNS RESOLVER] Minor typo correction
DNS: Fixes for the DNS query module
cifs: Include linux/err.h for IS_ERR and PTR_ERR
DNS: Make AFS go to the DNS for AFSDB records for unknown cells
DNS: Separate out CIFS DNS Resolver code
cifs: account for new creduid=0x%x parameter in spnego upcall string
cifs: reduce false positives with inode aliasing serverino autodisable
CIFS: Make cifs_convert_address() take a const src pointer and a length
cifs: show features compiled in as part of DebugData
cifs: update README
Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
Stephen reports:
After merging the block tree, today's linux-next build (x86_64
allmodconfig) failed like this:
usr/include/linux/fs.h:11: included file 'linux/blk_types.h' is not exported
Caused by commit 9d3dbbcd9a84518ff5e32ffe671d06a48cf84fd9 ("bio, fs:
separate out bio_types.h and define READ/WRITE constants in terms of
BIO_RW_* flags").
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It was a now abandoned attempt to throttle resync bandwidth
based on the delay it causes on the bulk data socket.
It has no userbase yet, and has been disabled by
9173465ccb51c09cc3102a10af93e9f469a0af6f already.
This removes the now unused code.
The basic feature, namely using up "idle" bandwith
of network and disk IO subsystem, with minimal impact
to application IO, is being reimplemented differently.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.
This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.
This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.
We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.
Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
this change as well.
This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch simplifies bdi code a little by removing the 'pending_list' which is
redundant. Indeed, currently the forker thread ('bdi_forker_thread()') is
working like this:
1. In a loop, fetch all bdi's which have works but have no writeback thread and
move them to the 'pending_list'.
2. If the list is empty, sleep for 5 sec.
3. Otherwise, take one bdi from the list, fork the writeback thread for this
bdi, and repeat the loop.
IOW, it first moves everything to the 'pending_list', then process only one
element, and so on. This patch simplifies the algorithm, which is now as
follows.
1. Find the first bdi which has a work and remove it from the global list of
bdi's (bdi_list).
2. If there was not such bdi, sleep 5 sec.
3. Fork the writeback thread for this bdi and repeat the loop.
IOW, now we find the first bdi to process, process it, and so on. This is
simpler and involves less lists.
The bonus now is that we can get rid of a couple of functions, as well as
remove complications which involve 'rcu_call()' and 'bdi->rcu_head'.
This patch also makes sure we use 'list_add_tail_rcu()', instead of plain
'list_add_tail()', but this piece of code is going to be removed in the next
patch anyway.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The write-back code mixes words "thread" and "task" for the same things. This
is not a big deal, but still an inconsistency.
hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else. This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer
This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()' -> 'bdi_forker_thread()'
because bdi threads are 'bdi_writeback_thread()', so these names are more
consistent.
This patch also amends commentaries and makes them refer the forker and bdi
threads as "thread", not "task".
Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_*
flags. This is fragile and caused breakage during BIO_RW_* flag
rearrangement. The hardcoding is to avoid include dependency hell.
Create linux/bio_types.h which contatins definitions for bio data
structures and flags and include it from bio.h and fs.h, and make fs.h
define all READ/WRITE related constants in terms of BIO_RW_* flags.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits. READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.
This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again. A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.
Neil also spotted missing RWA_MASK conversion.
Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-bisected-by: Vladislav Bolkhovitin <vst@vlnb.net>
Root-caused-by: Neil Brown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Filesystems can call sb_issue_discard on a memory reclaim path
(e.g. ext4 calls sb_issue_discard during journal commit).
Use GFP_NOFS in sb_issue_discard to avoid recursing back into the FS.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
block/compat_ioctl.c: In function 'compat_blkdev_ioctl':
block/compat_ioctl.c:754: error: 'BLKTRACESETUP32' undeclared (first use in this function)
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.
It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
SCSI-ml needs a way to mark a request as flush request in
q->prepare_flush_fn because it needs to identify them later (e.g. in
q->request_fn or prep_rq_fn).
queue_flush sets REQ_HARDBARRIER in rq->cmd_flags however the block
layer also sends normal REQ_TYPE_FS requests with REQ_HARDBARRIER. So
SCSI-ml can't use REQ_HARDBARRIER to identify flush requests.
We could change the block layer to clear REQ_HARDBARRIER bit before
sending non flush requests to the lower layers. However, intorudcing
the new flag looks cleaner (surely easier).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
No real bugs I believe, just some dead code, and some
shut up code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>