Commit Graph

21264 Commits

Author SHA1 Message Date
Avi Kivity e36d96f7cf KVM: Keep slot ID in memory slot structure
May be used for distinguishing between internal and user slots, or for sorting
slots in size order.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:47:07 +03:00
Avi Kivity 0719837c08 KVM: Reduce atomic operations on vcpu->requests
Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
each request generates a large number of unneeded atomics if a bit is set.

Replace with a separate test/clear sequence.  This is safe since there is
no clear_bit() outside the vcpu thread.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:47:06 +03:00
Avi Kivity a8eeb04a44 KVM: Add mini-API for vcpu->requests
Makes it a little more readable and hackable.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:47:05 +03:00
Avi Kivity a1f4d39500 KVM: Remove memory alias support
As advertised in feature-removal-schedule.txt.  Equivalent support is provided
by overlapping memory regions.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:47:00 +03:00
Sheng Yang 2d5b5a6655 KVM: x86: XSAVE/XRSTOR live migration support
This patch enable save/restore of xsave state.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01 10:46:37 +03:00
Dexuan Cui 2acf923e38 KVM: VMX: Enable XSAVE/XRSTOR for guest
This patch enable guest to use XSAVE/XRSTOR instructions.

We assume that host_xcr0 would use all possible bits that OS supported.

And we loaded xcr0 in the same way we handled fpu - do it as late as we can.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:46:31 +03:00
Grant Likely 22ae782f86 of/address: Clean up function declarations
This patch moves the declaration of of_get_address(), of_get_pci_address(),
and of_pci_address_to_resource() out of arch code and into the common
linux/of_address header file.

This patch also fixes some of the asm/prom.h ordering issues.  It still
includes some header files that it ideally shouldn't be, but at least the
ordering is consistent now so that of_* overrides work.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-08-01 01:42:42 -06:00
Avi Kivity d94e1dc9af KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:37 +03:00
Huang Ying bf998156d2 KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages
In common cases, guest SRAO MCE will cause corresponding poisoned page
be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay
the MCE to guest OS.

But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.

The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM
and do not try kernel and user space MMIO processing for poisoned
page.

[xiao: fix warning introduced by avi]

Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:26 +03:00
Russell King 7b70c4275f Merge branch 'devel-stable' into devel
Conflicts:
	arch/arm/kernel/entry-armv.S
	arch/arm/kernel/setup.c
	arch/arm/mm/init.c
2010-07-31 14:20:16 +01:00
Russell King ceb0885d3b Merge branch 'misc' into devel
Conflicts:
	arch/arm/mm/init.c
2010-07-31 14:20:02 +01:00
Russell King b31fc7af78 Merge branches 'at91', 'ep93xx', 'kexec', 'iop', 'lmb', 'nomadik', 'nuc', 'pl', 'spear' and 'versatile' into devel 2010-07-31 14:19:35 +01:00
Russell King 7cfe249475 ARM: AMBA: Add pclk support to AMBA bus infrastructure
Some platforms gate the pclk (APB - the bus - clock) to the peripherals
for power saving, along with the functional clock.  When devices are
accessed without pclk enabled, the kernel will oops.

This gives them two options:

1. Leave all clocks on all the time.
2. Attempt to gate pclk along with the functional clock.

(With some hardware, pclk and the functional clock are gated by a single
bit in a register.)

(1) has the disadvantage that it causes increased power usage, which is
bad news for battery operated devices.  (2) can lead to kernel oops if
registers are accessed without the functional clock being enabled.

So, introduce the apb_pclk signal in such a way existing drivers don't
need to be updated.  Essentially, this means we guarantee that:

1. pclk will be enabled whenever the driver is bound to a device -
   from probe() to remove() time.
2. pclk will also be enabled when reading the primecell IDs from the device.

In order to allow drivers to be incrementally updated to achieve greater
power savings, we provide two additional calls to allow drivers to
manage the pclk - amba_pclk_enable()/amba_pclk_disable().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-31 13:07:27 +01:00
Steffen Klassert 0500e9b3f1 padata: Remove padata_get_cpumask
A function that copies the padata cpumasks to a user buffer
is a bit error prone. The cpumask can change any time so we
can't be sure to have the right cpumask when using this function.
A user who is interested in the padata cpumasks should register
to the padata cpumask notifier chain instead. Users of
padata_get_cpumask are already updated, so we can remove it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:06 +08:00
Steffen Klassert c635696c7c padata: Pass the padata cpumasks to the cpumask_change_notifier chain
We pass a pointer to the new padata cpumasks to the cpumask_change_notifier
chain. So users can access the cpumasks without the need of an extra
padata_get_cpumask function.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:05 +08:00
Steffen Klassert 65ff577e6b padata: Rearrange set_cpumask functions
padata_set_cpumask needs to be protected by a lock. We make
__padata_set_cpumasks unlocked and static. So this function
can be used by the exported and locked padata_set_cpumask and
padata_set_cpumasks functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:04 +08:00
Steffen Klassert e6cc117076 padata: Rename padata_alloc functions
We rename padata_alloc to padata_alloc_possible because this
function allocates a padata_instance and uses the cpu_possible
mask for parallel and serial workers. Also we rename __padata_alloc
to padata_alloc to avoid to export underlined functions. Underlined
functions are considered to be private to padata. Users are updated
accordingly.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31 19:53:04 +08:00
Samuli Konttila 60347c194a Input: cy8ctmg110 - capacitive touchscreen support
Add support for the cy8ctmg110 capacitive touchscreen used on some
embedded devices.

(Some clean up by Alan Cox)

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-31 02:37:17 -07:00
Trond Myklebust b608b283a9 NFS: kswapd must not block in nfs_release_page
See https://bugzilla.kernel.org/show_bug.cgi?id=16056

If other processes are blocked waiting for kswapd to free up some memory so
that they can make progress, then we cannot allow kswapd to block on those
processes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-07-30 15:38:42 -04:00
Trond Myklebust d3c7b7ccc1 NFSv4: Add support for the RELEASE_LOCKOWNER operation
This is needed by NFSv4.0 servers in order to keep the number of locking
stateids at a manageable level.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:46:10 -04:00
Trond Myklebust f11ac8db5d NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.
This patch fixes bugzilla entry 14501:
  https://bugzilla.kernel.org/show_bug.cgi?id=14501

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-07-30 14:41:56 -04:00
Ben Hutchings 30da552428 PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.

However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
  last MSI message written
- Use the new functions where appropriate

Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:41:39 -07:00
Narendra K 911e1c9b05 PCI: export SMBIOS provided firmware instance and label to sysfs
This patch exports SMBIOS provided firmware instance and label of
onboard PCI devices to sysfs.  New files are:
  /sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
  /sys/bus/pci/devices/.../index which contains the firmware device type
instance for the given device.

Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:36:01 -07:00
Jacob Pan 253d2e5498 PCI: disable mmio during bar sizing
It is a known issue that mmio decoding shall be disabled while doing PCI
bar sizing. Host bridge and other devices (PCI PIC) shall be excluded for
certain platforms. This patch mainly comes from Mathew Willcox's
patch in http://kerneltrap.org/mailarchive/linux-kernel/2007/9/13/258969.

A new flag bit "mmio_alway_on" is added to pci_dev with the intention that
devices with their mmio decoding cannot be disabled during BAR sizing shall
have this bit set, preferrablly in their quirks.

Without this patch, Intel Moorestown platform graphics unit will be
corrupted during bar sizing activities.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:29:35 -07:00
Anatolij Gustschin 12b15e8328 of/spi: call of_register_spi_devices() from spi core code
Move of_register_spi_devices() call from drivers to
spi_register_master(). Also change the function to use
the struct device_node pointer from master spi device
instead of passing it as function argument.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-30 00:03:59 -06:00
Grant Likely 559e2b7ee7 of: Provide default of_node_to_nid() implementation.
of_node_to_nid() is only relevant in a few architectures.  Don't force
everyone to implement it anyway.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-30 00:03:58 -06:00
Grant Likely c660122538 of/device: Make of_device_make_bus_id() usable by other code.
The AMBA bus should also use of_device_make_bus_id() when populating device
out of device tree data.  This patch makes the function non-static, and
adds a suitable prototype in of_device.h

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-30 00:03:58 -06:00
David Howells 8f92054e7c CRED: Fix __task_cred()'s lockdep check and banner comment
Fix __task_cred()'s lockdep check by removing the following validation
condition:

	lockdep_tasklist_lock_is_held()

as commit_creds() does not take the tasklist_lock, and nor do most of the
functions that call it, so this check is pointless and it can prevent
detection of the RCU lock not being held if the tasklist_lock is held.

Instead, add the following validation condition:

	task->exit_state >= 0

to permit the access if the target task is dead and therefore unable to change
its own credentials.

Fix __task_cred()'s comment to:

 (1) discard the bit that says that the caller must prevent the target task
     from being deleted.  That shouldn't need saying.

 (2) Add a comment indicating the result of __task_cred() should not be passed
     directly to get_cred(), but rather than get_task_cred() should be used
     instead.

Also put a note into the documentation to enforce this point there too.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-29 15:16:18 -07:00
David Howells de09a9771a CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.

What happens is that get_task_cred() can race with commit_creds():

	TASK_1			TASK_2			RCU_CLEANER
	-->get_task_cred(TASK_2)
	rcu_read_lock()
	__cred = __task_cred(TASK_2)
				-->commit_creds()
				old_cred = TASK_2->real_cred
				TASK_2->real_cred = ...
				put_cred(old_cred)
				  call_rcu(old_cred)
		[__cred->usage == 0]
	get_cred(__cred)
		[__cred->usage == 1]
	rcu_read_unlock()
							-->put_cred_rcu()
							[__cred->usage == 1]
							panic()

However, since a tasks credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.

If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.

We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.

Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:

kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex
745
RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
 ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
 [<ffffffff810698cd>] put_cred+0x13/0x15
 [<ffffffff81069b45>] commit_creds+0x16b/0x175
 [<ffffffff8106aace>] set_current_groups+0x47/0x4e
 [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
 RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-29 15:16:17 -07:00
Stefan Richter 872e330e38 firewire: add isochronous multichannel reception
This adds the DMA context programming and userspace ABI for multichannel
reception, i.e. for listening on multiple channel numbers by means of a
single DMA context.

The use case is reception of more streams than there are IR DMA units
offered by the link layer.  This is already implemented by the older
ohci1394 + ieee1394 + raw1394 stack.  And as discussed recently on
linux1394-devel, this feature is occasionally used in practice.

The big drawbacks of this mode are that buffer layout and interrupt
generation necessarily differ from single-channel reception:  Headers
and trailers are not stripped from packets, packets are not aligned with
buffer chunks, interrupts are per buffer chunk, not per packet.

These drawbacks also cause a rather hefty code footprint to support this
rarely used OHCI-1394 feature.  (367 lines added, among them 94 lines of
added userspace ABI documentation.)

This implementation enforces that a multichannel reception context may
only listen to channels to which no single-channel context on the same
link layer is presently listening to.  OHCI-1394 would allow to overlay
single-channel contexts by the multi-channel context, but this would be
a departure from the present first-come-first-served policy of IR
context creation.

The implementation is heavily based on an earlier one by Jay Fenlason.
Thanks Jay.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-29 23:09:18 +02:00
Russell King 129961ecaf Merge branch 'wells/lpc32xx-arch_v2' of git://git.lpclinux.com/linux-2.6-lpc into devel-stable 2010-07-29 15:48:02 +01:00
Rabin Vincent bb8f563c84 ARM: 6243/1: mmci: pass power_mode to the translate_vdd callback
Platforms may have some external power control which need to be
controlled from board specific code.  Rename the translate_vdd()
callback to vdd_handler() and pass it the power mode.

Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-29 15:39:05 +01:00
Ian Campbell 685fd0b4ea irq: Add new IRQ flag IRQF_NO_SUSPEND
A small number of users of IRQF_TIMER are using it for the implied no
suspend behaviour on interrupts which are not timer interrupts.

Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to
__IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: xen-devel@lists.xensource.com
Cc: linux-input@vger.kernel.org
Cc: linuxppc-dev@ozlabs.org
Cc: devicetree-discuss@lists.ozlabs.org
LKML-Reference: <1280398595-29708-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-29 13:24:57 +02:00
Linus Torvalds 8785eb1e7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  davinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high
  regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
  wm8350-regulator: fix wm8350_register_regulator error handling
  ab3100: fix off-by-one value range checking for voltage selector
2010-07-28 19:59:55 -07:00
Eric Paris 1968f5eed5 fanotify: use both marks when possible
fanotify currently, when given a vfsmount_mark will look up (if it exists)
the corresponding inode mark.  This patch drops that lookup and uses the
mark provided.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:55 -04:00
Eric Paris ce8f76fb73 fsnotify: pass both the vfsmount mark and inode mark
should_send_event() and handle_event() will both need to look up the inode
event if they get a vfsmount event.  Lets just pass both at the same time
since we have them both after walking the lists in lockstep.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:54 -04:00
Eric Paris 02436668d9 fsnotify: remove global fsnotify groups lists
The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:54 -04:00
Eric Paris 43709a288e fsnotify: remove group->mask
group->mask is now useless.  It was originally a shortcut for fsnotify to
save on performance.  These checks are now redundant, so we remove them.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:54 -04:00
Eric Paris 03930979af fsnotify: remove the global masks
Because we walk the object->fsnotify_marks list instead of the global
fsnotify groups list we don't need the fsnotify_inode_mask and
fsnotify_vfsmount_mask as these were simply shortcuts in fsnotify() for
performance.  They are now extra checks, rip them out.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:54 -04:00
Eric Paris 3a9b16b407 fsnotify: send fsnotify_mark to groups in event handling functions
With the change of fsnotify to use srcu walking the marks list instead of
walking the global groups list we now know the mark in question.  The code can
send the mark to the group's handling functions and the groups won't have to
find those marks themselves.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:52 -04:00
Eric Paris 75c1be487a fsnotify: srcu to protect read side of inode and vfsmount locks
Currently reading the inode->i_fsnotify_marks or
vfsmount->mnt_fsnotify_marks lists are protected by a spinlock on both the
read and the write side.  This patch protects the read side of those lists
with a new single srcu.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:52 -04:00
Eric Paris 700307a29a fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
Currently fsnotify check is mark->group is NULL to decide if
fsnotify_destroy_mark() has already been called or not.  With the upcoming
rcu work it is a heck of a lot easier to use an explicit flag than worry
about group being set to NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:52 -04:00
Eric Paris 3bcf3860a4 fsnotify: store struct file not struct path
Al explains that calling dentry_open() with a mnt/dentry pair is only
garunteed to be safe if they are already used in an open struct file.  To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:51 -04:00
Eric Paris f70ab54cc6 fsnotify: fsnotify_add_notify_event should return an event
Rather than the horrific void ** argument and such just to pass the
fanotify_merge event back to the caller of fsnotify_add_notify_event() have
those things return an event if it was different than the event suggusted to
be added.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:50 -04:00
Eric Paris 80af258867 fanotify: groups can specify their f_flags for new fd
Currently fanotify fds opened for thier listeners are done with f_flags
equal to O_RDONLY | O_LARGEFILE.  This patch instead takes f_flags from the
fanotify_init syscall and uses those when opening files in the context of
the listener.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:50 -04:00
Eric Paris 20dee624ca fsnotify: check to make sure all fsnotify bits are unique
This patch adds a check to make sure that all fsnotify bits are unique and we
cannot accidentally use the same bit for 2 different fsnotify event types.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:50 -04:00
Eric Paris f874e1ac21 inotify: force inotify and fsnotify use same bits
inotify uses bits called IN_* and fsnotify uses bits called FS_*.  These
need to line up.  This patch adds build time checks to make sure noone can
change these bits so they are not the same.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:49 -04:00
Eric Paris 8c1934c8d7 inotify: allow users to request not to recieve events on unlinked children
An inotify watch on a directory will send events for children even if those
children have been unlinked.  This patch add a new inotify flag IN_EXCL_UNLINK
which allows a watch to specificy they don't care about unlinked children.
This should fix performance problems seen by tasks which add a watch to
/tmp and then are overrun with events when other processes are reading and
writing to unlinked files they created in /tmp.

https://bugzilla.kernel.org/show_bug.cgi?id=16296

Requested-by: Matthias Clasen <mclasen@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 10:18:49 -04:00
Anuj Aggarwal 7d14831e21 regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

In TPS6507x, depending on the status of DEFDCDC{2,3} pin either
DEFDCDC{2,3}_LOW or DEFDCDC{2,3}_HIGH register needs to be read or
programmed to change the output voltage.

The current driver assumes DEFDCDC{2,3} pins are always tied low
and thus operates only on DEFDCDC{2,3}_LOW register. This need
not always be the case (as is found on OMAP-L138 EVM).

Unfortunately, software cannot read the status of DEFDCDC{2,3} pins.
So, this information is passed through platform data depending on
how the board is wired.

Signed-off-by: Anuj Aggarwal <anuj.aggarwal@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2010-07-28 15:09:26 +01:00
Eric Paris 08ae89380a fanotify: drop the useless priority argument
The priority argument in fanotify is useless.  Kill it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:03 -04:00
Eric Paris fb1cfb88c8 fsnotify: initialize mask in fsnotify_perm
akpm got a warning the fsnotify_mask could be used uninitialized in
fsnotify_perm().  It's not actually possible but his compiler complained
about it.  This patch just initializes it to 0 to shut up the compiler.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:02 -04:00
Eric Paris b2d879096a fanotify: userspace interface for permission responses
fanotify groups need to respond to events which include permissions types.
To do so groups will send a response using write() on the fanotify_fd they
have open.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:02 -04:00
Eric Paris 9e66e4233d fanotify: permissions and blocking
This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events.  This is done using the
new fsnotify secondary queue.  No userspace interface is provided actually
respond to or request these events.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:02 -04:00
Eric Paris c4ec54b40d fsnotify: new fsnotify hooks and events types for access decisions
introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code.  This hook is used to allow fsnotify groups to make access
control decisions about events on the system.  We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:01 -04:00
Dave Young d14f172948 sysctl extern cleanup: inotify
Extern declarations in sysctl.c should be move to their own head file, and
then include them in relavant .c files.

Move inotify_table extern declaration to linux/inotify.h

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:01 -04:00
Alexey Dobriyan 6e006701cc dnotify: move dir_notify_enable declaration
Move dir_notify_enable declaration to where it belongs -- dnotify.h .

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:01 -04:00
Eric Paris 59b0df211b fsnotify: use unsigned char * for dentry->d_name.name
fsnotify was using char * when it passed around the d_name.name string
internally but it is actually an unsigned char *.  This patch switches
fsnotify to use unsigned and should silence some pointer signess warnings
which have popped out of xfs.  I do not add -Wpointer-sign to the fsnotify
code as there are still issues with kstrdup and strlen which would pop
out needless warnings.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:01 -04:00
Eric Paris 6e5f77b32e fsnotify: intoduce a notification merge argument
Each group can define their own notification (and secondary_q) merge
function.  Inotify does tail drop, fanotify does matching and drop which
can actually allocate a completely new event.  But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue.  This patch just implements a
void ** argument which is passed to the merge function.  fanotify can use
this field to pass the new event back to higher layers.

Signed-off-by: Eric Paris <eparis@redhat.com>
for fanotify to properly deal with permissions events
2010-07-28 09:59:01 -04:00
Eric Paris cb2d429faf fsnotify: add group priorities
This introduces an ordering to fsnotify groups.  With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important.  But if people want to use fsnotify for the
basis of sycronous notification or blocking notification ordering becomes
important.

eg. A Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan.)  Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:01 -04:00
Eric Paris 4d92604cc9 fanotify: clear all fanotify marks
fanotify listeners may want to clear all marks.  They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:00 -04:00
Eric Paris c9778a98e7 fanotify: allow ignored_masks to survive modify
Some users may want to truely ignore an inode even if it has been modified.
Say you are wanting a mount which contains a log file and you really don't
want any notification about that file.  This patch allows the listener to
do that.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:00 -04:00
Eric Paris c908370fc1 fsnotify: allow ignored_mask to survive modification
Some inodes a group may want to never hear about a set of events even if
the inode is modified.  We add a new mark flag which indicates that these
marks should not have their ignored_mask cleared on modification.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:00 -04:00
Eric Paris b9e4e3bd04 fanotify: allow users to set an ignored_mask
Change the sys_fanotify_mark() system call so users can set ignored_masks
on inodes.  Remember, if a user new sets a real mask, and only sets ignored
masks, the ignore will never be pinned in memory.  Thus ignored_masks can
be lost under memory pressure and the user may again get events they
previously thought were ignored.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:00 -04:00
Eric Paris 33af5e32e0 fsnotify: ignored_mask - excluding notification
The ignored_mask is a new mask which is part of fsnotify marks.  A group's
should_send_event() function can use the ignored mask to determine that
certain events are not of interest.  In particular if a group registers a
mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
ignored_mask for individual inodes and not send open events for those
inodes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:59:00 -04:00
Eric Paris 90b1e7a578 fsnotify: allow marks to not pin inodes in core
inotify marks must pin inodes in core.  dnotify doesn't technically need to
since they are closed when the directory is closed.  fanotify also need to
pin inodes in core as it works today.  But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest.  I claim that these should be liberally sent to the
kernel and should not pin the inode in core.  If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.

This patch lays the ground work for non-pinning inode marks by using lazy
inode pinning.  We do not pin a mark until it has a non-zero mask entry.  If a
listener new sets a mask we never pin the inode.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:59 -04:00
Andreas Gruenbacher 88380fe66e fanotify: remove fanotify.h declarations
fanotify_mark_validate functions are all needlessly declared in headers as
static inlines.  Instead just do the checks where they are needed for code
readability.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:59 -04:00
Andreas Gruenbacher eac8e9e80c fanotify: rename FAN_MARK_ON_VFSMOUNT to FAN_MARK_MOUNT
the term 'vfsmount' isn't sensicle to userspace.  instead call is 'mount.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:59 -04:00
Eric Paris 0ff21db9fc fanotify: hooks the fanotify_mark syscall to the vfsmount code
Create a new fanotify_mark flag which indicates we should attach the mark
to the vfsmount holding the object referenced by dfd and pathname rather
than the inode itself.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:59 -04:00
Eric Paris 1c529063a3 fanotify: should_send_event needs to handle vfsmounts
currently should_send_event in fanotify only cares about marks on inodes.
This patch extends that interface to indicate that it cares about events
that happened on vfsmounts.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:57 -04:00
Andreas Gruenbacher ca9c726eea fsnotify: Infrastructure for per-mount watches
Per-mount watches allow groups to listen to fsnotify events on an entire
mount.  This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:57 -04:00
Eric Paris 0d48b7f01f fsnotify: vfsmount marks generic functions
Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:57 -04:00
Andreas Gruenbacher 2504c5d63b fsnotify/vfsmount: add fsnotify fields to struct vfsmount
This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode.  They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:57 -04:00
Eric Paris 5444e2981c fsnotify: split generic and inode specific mark code
currently all marking is done by functions in inode-mark.c.  Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:57 -04:00
Andreas Gruenbacher 32c3263221 fanotify: Add pids to events
Pass the process identifiers of the triggering processes to fanotify
listeners: this information is useful for event filtering and logging.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:56 -04:00
Eric Paris a1014f1023 fanotify: send events using read
Send events to userspace by reading the file descriptor from fanotify_init().
One will get blocks of data which look like:

struct fanotify_event_metadata {
	__u32 event_len;
	__u32 vers;
	__s32 fd;
	__u64 mask;
	__s64 pid;
	__u64 cookie;
} __attribute__ ((packed));

Simple code to retrieve and deal with events is below

	while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
		struct fanotify_event_metadata *metadata;

		metadata = (void *)buf;
		while(FAN_EVENT_OK(metadata, len)) {
			[PROCESS HERE!!]
			if (metadata->fd >= 0 && close(metadata->fd) != 0)
				goto fail;
			metadata = FAN_EVENT_NEXT(metadata, len);
		}
	}

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:56 -04:00
Eric Paris 2a3edf8604 fanotify: fanotify_mark syscall implementation
NAME
	fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object

SYNOPSIS
	int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
			  int dfd, const char *pathname)

DESCRIPTION
	fanotify_mark() is used to add remove or modify a mark on a filesystem
	object.  Marks are used to indicate that the fanotify group is
	interested in events which occur on that object.  At this point in
	time marks may only be added to files and directories.

	fanotify_fd must be a file descriptor returned by fanotify_init()

	The flags field must contain exactly one of the following:

	FAN_MARK_ADD - or the bits in mask and ignored mask into the mark
	FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mark
		from the mark

	The following values can be OR'd into the flags field:

	FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
	FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)

	dfd may be any of the following:
	AT_FDCWD: the object will be lookup up based on pathname similar
		to open(2)

	file descriptor of a directory: if pathname is not NULL the
		object to modify will be lookup up similar to openat(2)

	file descriptor of the final object: if pathname is NULL the
		object to modify will be the object referenced by dfd

	The mask is the bitwise OR of the set of events of interest such as:
	FAN_ACCESS		- object was accessed (read)
	FAN_MODIFY		- object was modified (write)
	FAN_CLOSE_WRITE		- object was writable and was closed
	FAN_CLOSE_NOWRITE	- object was read only and was closed
	FAN_OPEN		- object was opened
	FAN_EVENT_ON_CHILD	- interested in objected that happen to
				  children.  Only relavent when the object
				  is a directory
	FAN_Q_OVERFLOW		- event queue overflowed (not implemented)

RETURN VALUE
	On success, this system call returns 0. On error, -1 is
	returned, and errno is set to indicate the error.

ERRORS
	EINVAL An invalid value was specified in flags.

	EINVAL An invalid value was specified in mask.

	EINVAL An invalid value was specified in ignored_mask.

	EINVAL fanotify_fd is not a file descriptor as returned by
	fanotify_init()

	EBADF fanotify_fd is not a valid file descriptor

	EBADF dfd is not a valid file descriptor and path is NULL.

	ENOTDIR dfd is not a directory and path is not NULL

	EACCESS no search permissions on some part of the path

	ENENT file not found

	ENOMEM Insufficient kernel memory is available.

CONFORMING TO
	These system calls are Linux-specific.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:56 -04:00
Eric Paris bbaa4168b2 fanotify: sys_fanotify_mark declartion
This patch simply declares the new sys_fanotify_mark syscall

int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask,
		  int dfd const char *pathname)

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:55 -04:00
Eric Paris 52c923dd07 fanotify: fanotify_init syscall implementation
NAME
	fanotify_init - initialize an fanotify group

SYNOPSIS
	int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);

DESCRIPTION
	fanotify_init() initializes a new fanotify instance and returns a file
	descriptor associated with the new fanotify event queue.

	The following values can be OR'd into the flags field:

	FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
		Using this flag saves extra calls to fcntl(2) to achieve the same
		result.

	FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
		See the description of the O_CLOEXEC flag in open(2) for reasons why
		this may be useful.

	The event_f_flags argument is unused and must be set to 0

	The priority argument is unused and must be set to 0

RETURN VALUE
	On success, this system call return a new file descriptor. On error, -1 is
	returned, and errno is set to indicate the error.

ERRORS
	EINVAL An invalid value was specified in flags.

	EINVAL A non-zero valid was passed in event_f_flags or in priority

	ENFILE The system limit on the total number of file descriptors has been reached.

	ENOMEM Insufficient kernel memory is available.

CONFORMING TO
	These system calls are Linux-specific.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:55 -04:00
Eric Paris 11637e4b7d fanotify: fanotify_init syscall declaration
This patch defines a new syscall fanotify_init() of the form:

int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
		      unsigned int priority)

This syscall is used to create and fanotify group.  This is very similar to
the inotify_init() syscall.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:55 -04:00
Eric Paris ff0b16a985 fanotify: fscking all notification system
fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question.  This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification.  These are useful for on access scanners or hierachical storage
management schemes.

This patch just implements the basics of the fsnotify functions.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:54 -04:00
Signed-off-by: Wu Fengguang 12ed2e36c9 fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict
sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
something unique.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:54 -04:00
Eric Paris ecf081d1a7 vfs: introduce FMODE_NONOTIFY
This is a new f_mode which can only be set by the kernel.  It indicates
that the fd was opened by fanotify and should not cause future fanotify
events.  This is needed to prevent fanotify livelock.  An example of
obvious livelock is from fanotify close events.

Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?

The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify.  Thus when that file is used it will not generate future
events.

This patch simply defines the bit.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:54 -04:00
Eric Paris 841bdc10f5 fsnotify: rename mark_entry to just mark
previously I used mark_entry when talking about marks on inodes.  The
_entry is pretty useless.  Just use "mark" instead.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Eric Paris d07754412f fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark
the _entry portion of fsnotify functions is useless.  Drop it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Eric Paris e61ce86737 fsnotify: rename fsnotify_mark_entry to just fsnotify_mark
The name is long and it serves no real purpose.  So rename
fsnotify_mark_entry to just fsnotify_mark.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Andreas Gruenbacher 72acc85442 fsnotify: kill FSNOTIFY_EVENT_FILE
Some fsnotify operations send a struct file.  This is more information than
we technically need.  We instead send a struct path in all cases instead of
sometimes a path and sometimes a file.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:53 -04:00
Eric Paris 098cf2fc77 fsnotify: add flags to fsnotify_mark_entries
To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris 4136510dd6 fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union
vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris 2823e04de4 fsnotify: put inode specific fields in an fsnotify_mark in a union
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy.  This patch just implements the
inode struct and the union.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris 3a9fb89f4c fsnotify: include vfsmount in should_send_event when appropriate
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases.  We pass a vfsmount to this function so groups can make
their own determinations.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris 7131485a93 fsnotify: mount point listeners list and global mask
currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.)  This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given monut point.  This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.

The reason we want a second group list and mask is because the inode based
notification should_send_event support which makes each group look for a mark
on the given inode.  With one nfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation.   By seperating vfsmount from inode listeners only when
there is a inode listener will the inode groups have to look for their
mark and take the inode lock.  vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:52 -04:00
Eric Paris 19c2a0e1a2 fsnotify: rename fsnotify_groups to fsnotify_inode_groups
Simple renaming patch.  fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris 0d2e2a1d00 fsnotify: drop mask argument from fsnotify_alloc_group
Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
that argument.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:51 -04:00
Eric Paris ffab83402f fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
fsnotify_obtain_group was intended to be able to find an already existing
group.  Nothing uses that functionality.  This just renames it to
fsnotify_alloc_group so it is clear what it is doing.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris 74be0cc828 fsnotify: remove group_num altogether
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added.  I no longer think this is a
necessary thing to do and so we remove the group_num.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:50 -04:00
Eric Paris 1201a5361b fsnotify: replace an event on a list
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event.  This patch implements the replace functionality of that
process.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris b4e4e14073 fsnotify: clone existing events
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller.  Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created.  Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris 74766bbfa9 fsnotify: per group notification queue merge types
inotify only wishes to merge a new event with the last event on the
notification fifo.  fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together.  This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code.  This allows each use of fsnotify to provide
their own merge functionality.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:49 -04:00
Eric Paris 28c60e37f8 fsnotify: send struct file when sending events to parents when possible
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:48 -04:00
Eric Paris 2a12a9d781 fsnotify: pass a file instead of an inode to open, read, and write
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process.  Close was the only operation that already was passing
a struct file to the notification hook.  This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:32 -04:00
Eric Paris 8112e2d6a7 fsnotify: include data in should_send calls
fanotify is going to need to look at file->private_data to know if an event
should be sent or not.  This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 7b0a04fbfb fsnotify: provide the data type to should_send_event
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener.  Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 2dfc1cae4c inotify: remove inotify in kernel interface
nothing uses inotify in the kernel, drop it!

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:31 -04:00
Eric Paris 28a3a7eb3b audit: reimplement audit_trees using fsnotify rather than inotify
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris 40554c3dae fsnotify: allow addition of duplicate fsnotify marks
This patch allows a task to add a second fsnotify mark to an inode for the
same group.  This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function.   This
is useful if a user wants to add a new mark before removing the old one.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris 9e1c74321d fsnotify: duplicate fsnotify_mark_entry data between 2 marks
Simple copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:17 -04:00
Eric Paris e9fd702a58 audit: convert audit watches to use fsnotify instead of inotify
Audit currently uses inotify to pin inodes in core and to detect when
watched inodes are deleted or unmounted.  This patch uses fsnotify instead
of inotify.

Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28 09:58:16 -04:00
Sridhar Samudrala d7926ee38f cgroups: Add an API to attach a task to current task's cgroup
Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2010-07-28 15:45:12 +03:00
Vinod Koul b3c567e474 intel_mid: Add Mrst & Mfld DMA Drivers
This patch add DMA drivers for DMA controllers in Langwell chipset
of Intel(R) Moorestown platform and DMA controllers in Penwell of
Intel(R) Medfield platfrom

This patch adds support for Moorestown DMAC1 and DMAC2 controllers.
It also add support for Medfiled GP DMA and DMAC1 controllers.
These controllers supports memory to peripheral and peripheral to
memory transfers. It support only single block transfers.

This driver is based on Kernel DMA engine
Anyone who wishes to use this controller should use DMA engine APIs

This controller exposes DMA_SLAVE capabilities and notifies the client drivers
of DMA transaction completion

Config option required to be enabled CONFIG_INTEL_MID_DMAC=y

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-07-27 23:32:57 -07:00
David S. Miller bb7e95c8fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c

Merge bnx2x bug fixes in by hand... :-/

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-27 21:01:35 -07:00
Richard Röjfors 94fe8c683c ks8842: Support DMA when accessed via timberdale
This patch adds support for RX and TX DMA via the DMA API,
this is only supported when the KS8842 is accessed via timberdale.

There is no support for DMA on the generic bus interface it self,
a state machine inside the FPGA is handling RX and TX transfers to/from
buffers in the FPGA. The host CPU can do DMA to and from these buffers.

The FPGA has to handle the RX interrupts, so these must be enabled in
the ks8842 but not in the FPGA. The driver must not disable the RX interrupt
that would mean that the data transfers into the FPGA buffers would stop.

The host shall not enable TX interrupts since TX is handled by the FPGA,
the host is notified by DMA callbacks when transfers are finished.

Which DMA channels to use are added as parameters in the platform data struct.

Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-27 20:48:19 -07:00
Linus Torvalds a376bca610 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  s2io: fixing DBG_PRINT() macro
  ath9k: fix dma direction for map/unmap in ath_rx_tasklet
  net: dev_forward_skb should call nf_reset
  net sched: fix race in mirred device removal
  tun: avoid BUG, dump packet on GSO errors
  bonding: set device in RLB ARP packet handler
  wimax/i2400m: Add PID & VID for Intel WiMAX 6250
  ipv6: Don't add routes to ipv6 disabled interfaces.
  net: Fix skb_copy_expand() handling of ->csum_start
  net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c
  macvtap: Limit packet queue length
  ixgbe/igb: catch invalid VF settings
  bnx2x: Advance a module version
  bnx2x: Protect statistics ramrod and sequence number
  bnx2x: Protect a SM state change
  wireless: use netif_rx_ni in ieee80211_send_layer2_update
2010-07-27 09:21:00 -07:00
Joerg Roedel 7a42c4ff02 Merge branches 'iommu-api/2.6.36' and 'amd-iommu/2.6.36' into iommu/2.6.36 2010-07-27 18:19:32 +02:00
Christoph Hellwig 552ef8024f direct-io: move aio_complete into ->end_io
Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz> 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27 11:56:06 -04:00
Theodore Ts'o 47def82672 jbd2: Remove __GFP_NOFAIL from jbd2 layer
__GFP_NOFAIL is going away, so add our own retry loop.  Also add
jbd2__journal_start() and jbd2__journal_restart() which take a gfp
mask, so that file systems can optionally (re)start transaction
handles using GFP_KERNEL.  If they do this, then they need to be
prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready
to reflect that error up to userspace.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27 11:56:05 -04:00
John Stultz 852db46d55 clocksource: Add __clocksource_updatefreq_hz/khz methods
To properly handle clocksources that change frequencies
at the clocksource->enable() point, this patch adds
a method that will update the clocksource's mult/shift and
max_idle_ns values.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-12-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz 0fb86b0629 timekeeping: Make xtime and wall_to_monotonic static
This patch makes xtime and wall_to_monotonic static, as planned in
Documentation/feature-removal-schedule.txt. This will allow for
further cleanups to the timekeeping core.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-10-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz 8ab4351a4c hrtimer: Cleanup direct access to wall_to_monotonic
Provides an accessor function to replace hrtimer.c's
direct access of wall_to_monotonic.

This will allow wall_to_monotonic to be made static as
planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
John Stultz 7615856ebf timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
update_vsyscall() did not provide the wall_to_monotoinc offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:54 +02:00
John Stultz ce3bf7ab22 time: Implement timespec_add
After accidentally misusing timespec_add_safe, I wanted to make sure
we don't accidently trip over that issue again, so I created a simple
timespec_add() function which we can use to replace the instances
of timespec_add_safe() that don't want the overflow detection.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:53 +02:00
Linus Walleij ac3e3fb424 ARM: 6158/2: PL011 baudrate extension for ST-Ericssons derivative
Implementation of the ST-Ericsson baudrate extension in the PL011
block. In this modified variant it is possible to change the
sampling factor from 16 to 8, and thanks to this we can get higher
baudrates while still using the same peripheral clock.

Also replace the simple division to determine the baud divisor
with DIV_ROUND_CLOSEST() rather than a simple integer division.

Cc: Alessandro Rubini <rubini@unipv.it>
Cc: Jerzy Kasenberg <jerzy.kasenberg@tieto.com>
Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-27 10:43:47 +01:00
Linus Walleij ec489aa8f9 ARM: 6157/2: PL011 TX/RX split of LCR for ST-Ericssons derivative
In the ST-Ericsson version of the PL011 the TX and RX have different
control registers.

Cc: Alessandro Rubini <rubini@unipv.it>
Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-27 10:43:47 +01:00
Russell King 98864ff58d ARM: OMAP: Convert OMAPFB and VRAM SDRAM reservation to LMB
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-27 08:48:23 +01:00
Christoph Hellwig 40e2e97316 direct-io: move aio_complete into ->end_io
Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been commited.  That means
the aio_complete calls needs to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-07-26 16:09:02 -05:00
Xiaolong Chen ba9f507a1b Input: adp5588-keys - export unused GPIO pins
This patch allows exporting GPIO pins not used by the keypad itself
to be accessible from elsewhere.

Signed-off-by: Xiaolong Chen <xiao-long.chen@motorola.com>
Signed-off-by: Yuanbo Ye <yuan-bo.ye@motorola.com>
Signed-off-by: Tao Hu <taohu@motorola.com>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-26 01:17:41 -07:00
Ryusuke Konishi 89c0fd014d nilfs2: reject filesystem with unsupported block size
This inserts sanity check that refuses to mount a filesystem with
unsupported block size.

Previously, kernel code of nilfs was looking only limitation of
devices though mkfs.nilfs2 limits the range of block sizes; there was
no check that prevents rec_len overflow with larger block sizes.

With this change, block sizes larger than 64KB or smaller than 1KB
will get rejected explicitly by kernel.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-25 23:29:21 +09:00
Ryusuke Konishi 6cda9fa257 nilfs2: avoid rec_len overflow with 64KB block size
With 64KB blocksize, a directory entry can have size 64KB which does
not fit into 16 bits we have for entry length.  So this patch stores
0xffff instead and converts value when read from / written to disk.

Nilfs derives its directory implementation from ext2 filesystem, and
this draws upon the corresponding change on ext2.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-25 20:46:43 +09:00
Eric Dumazet fed66381d6 net: pskb_expand_head() optimization
Move frags[] at the end of struct skb_shared_info, and make
pskb_expand_head() copy only the used part of it instead of whole array.

This should avoid kmemcheck warnings and speedup pskb_expand_head() as
well, avoiding a lot of cache misses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 21:05:57 -07:00
Stefan Assmann c1f79426e2 sysfs: add attribute to indicate hw address assignment type
Add addr_assign_type to struct net_device and expose it via sysfs.
This new attribute has the purpose of giving user-space the ability to
distinguish between different assignment types of MAC addresses.

For example user-space can treat NICs with randomly generated MAC
addresses differently than NICs that have permanent (locally assigned)
MAC addresses.
For the former udev could write a persistent net rule by matching the
device path instead of the MAC address.
There's also the case of devices that 'steal' MAC addresses from slave
devices. In which it is also be beneficial for user-space to be aware
of the fact.

This patch also introduces a helper function to assist adoption of
drivers that generate MAC addresses randomly.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 20:49:29 -07:00
Rafael J. Wysocki 72ad5d77fb ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-24 23:26:09 -04:00
Jonas Bonn c0dd394ca5 of: remove of_default_bus_ids
This list used was by only two platforms with all other platforms defining an
own list of valid bus id's to pass to of_platform_bus_probe.  This patch:

i)   copies the default list to the two platforms that depended on it (powerpc)
ii)  remove the usage of of_default_bus_ids in of_platform_bus_probe
iii) removes the definition of the list from all architectures that defined it

Passing a NULL 'matches' parameter to of_platform_bus_probe is still valid; the
function returns no error in that case as the NULL value is equivalent to an
empty list.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
[grant.likely@secretlab.ca: added __initdata annotations, warn on and return error on missing match table, and fix whitespace errors]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-24 09:58:22 -06:00
Grant Likely 94a0cb1fc6 of/device: Replace of_device with platform_device in includes and core code
of_device is currently just an #define alias to platform_device until it
gets removed entirely.  This patch removes references to it from the
include directories and the core drivers/of code.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24 09:58:21 -06:00
Grant Likely 2959604296 of: remove asm/of_device.h
It is mostly unused now.  Sparc has a few defines left in it, but they
can be moved to other headers.  Removing this header means that new
architectures adding CONFIG_OF support don't need to also add this
header file.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24 09:57:52 -06:00
Grant Likely 129ac799ad of: remove asm/of_platform.h
Only thing left in it is of_instantiate_rtc() which can be moved to
asm/prom.h on PowerPC and is unused in microblaze.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24 09:57:52 -06:00
Grant Likely 1ab1d63a85 of/platform: remove all of_bus_type and of_platform_bus_type references
Both of_bus_type and of_platform_bus_type are just #define aliases
for the platform bus.  This patch removes all references to them and
switches to the of_register_platform_driver()/of_unregister_platform_driver()
API for registering.

Subsequent patches will convert each user of of_register_platform_driver()
into plain platform_drivers without the of_platform_driver shim.  At which
point the of_register_platform_driver()/of_unregister_platform_driver()
functions can be removed.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24 09:57:52 -06:00
Grant Likely eca3930163 of: Merge of_platform_bus_type with platform_bus_type
of_platform_bus was being used in the same manner as the platform_bus.
The only difference being that of_platform_bus devices are generated
from data in the device tree, and platform_bus devices are usually
statically allocated in platform code.  Having them separate causes
the problem of device drivers having to be registered twice if it
was possible for the same device to appear on either bus.

This patch removes of_platform_bus_type and registers all of_platform
bus devices and drivers on the platform bus instead.  A previous patch
made the of_device structure an alias for the platform_device structure,
and a shim is used to adapt of_platform_drivers to the platform bus.

After all of of_platform_bus drivers are converted to be normal platform
drivers, the shim code can be removed.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24 09:57:51 -06:00
Grant Likely 4e4f62bf73 Merge commit 'v2.6.35-rc6' into devicetree/next
Conflicts:
	arch/sparc/kernel/prom_64.c
2010-07-24 09:49:13 -06:00
Patrick Pannuto 22b8f15c2f timer: Added usleep[_range] timer
usleep[_range] are finer precision implementations of msleep
and are designed to be drop-in replacements for udelay where
a precise sleep / busy-wait is unnecessary. They also allow
an easy interface to specify slack when a precise (ish)
wakeup is unnecessary to help minimize wakeups

Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: akinobu.mita@gmail.com
Cc: sboyd@codeaurora.org
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <4C44CDD2.1070708@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 15:08:12 +02:00
Changli Gao 49daf6a226 xt_quota: report initial quota value instead of current value to userspace
We should copy the initial value to userspace for iptables-save and
to allow removal of specific quota rules.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23 14:07:47 +02:00
Stefan Richter 8e2b2b46ea firewire: cdev: improve FW_CDEV_IOC_ALLOCATE
In both the ieee1394 stack and the firewire stack, the core treats
kernelspace drivers better than userspace drivers when it comes to
CSR address range allocation:  The former may request a register to be
placed automatically at a free spot anywhere inside a specified address
range.  The latter may only request a register at a fixed offset.

Hence, userspace drivers which do not require a fixed offset potentially
need to implement a retry loop with incremented offset in each retry
until the kernel does not fail allocation with EBUSY.  This awkward
procedure is not fundamentally necessary as the core already provides a
superior allocation API to kernelspace drivers.

Therefore change the ioctl() ABI by addition of a region_end member in
the existing struct fw_cdev_allocate.  Userspace and kernelspace APIs
work the same way now.

There is a small cost to pay by clients though:  If client source code
is required to compile with older kernel headers too, then any use of
the new member fw_cdev_allocate.region_end needs to be enclosed by
#ifdef/#endif directives.  However, any client program that seriously
wants to use address range allocations will require a kernel of cdev ABI
version >= 4 at runtime and a linux/firewire-cdev.h header of >= 4
anyway.  This is because v4 brings FW_CDEV_EVENT_REQUEST2.  The only
client program in which build-time compatibility with struct
fw_cdev_allocate as found in older kernel headers makes sense is
libraw1394.

(libraw1394 uses the older broken FW_CDEV_EVENT_REQUEST to implement a
makeshift, incorrect transaction responder that does at least work
somewhat in many simple scenarios, relying on guesswork by libraw1394
and by libraw1394 based applications.  Plus, address range allocation
and transaction responder is only one of many features that libraw1394
needs to provide, and these other features need to work with kernel and
kernel-headers as old as possible.  Any new linux/firewire-cdev.h based
client that implements a transaction responder should never attempt to
do it like libraw1394;  instead it should make a header and kernel of v4
or later a hard requirement.)

While we are at it, update the struct fw_cdev_allocate documentation to
better reflect the recent fw_cdev_event_request2 ABI addition.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:28 +02:00
Stefan Richter cc550216ae firewire: cdev: add PHY pinging
This extends the FW_CDEV_IOC_SEND_PHY_PACKET ioctl() for /dev/fw* to be
useful for ping time measurements.  One application for it would be gap
count optimization in userspace that is based on ping times rather than
hop count.  (The latter is implemented in firewire-core itself but is
not applicable to beta PHYs that act as repeater.)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:28 +02:00
Stefan Richter bf54e1462b firewire: cdev: add PHY packet reception
Add an FW_CDEV_IOC_RECEIVE_PHY_PACKETS ioctl() and
FW_CDEV_EVENT_PHY_PACKET_RECEIVED poll()/read() event for /dev/fw*.
This can be used to get information from remote PHYs by remote access
PHY packets.

This is also the 2nd half of the functionality (the receive part) to
support a userspace implementation of a VersaPHY transaction layer.

Safety considerations:

  - PHY packets are generally broadcasts, hence some kind of elevated
    privileges should be required of a process to be able to listen in
    on PHY packets.  This implementation assumes that a process that is
    allowed to open the /dev/fw* of a local node does have this
    privilege.

    There was an inconclusive discussion about introducing POSIX
    capabilities as a means to check for user privileges for these
    kinds of operations.

Other limitations:

  - PHY packet reception may be switched on by ioctl() but cannot be
    switched off again.  It would be trivial to provide an off switch,
    but this is not worth the code.  The client should simply close()
    the fd then, or just ignore further events.

  - For sake of simplicity of API and kernel-side implementation, no
    filter per packet content is provided.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:28 +02:00
Stefan Richter 850bb6f23b firewire: cdev: add PHY packet transmission
Add an FW_CDEV_IOC_SEND_PHY_PACKET ioctl() for /dev/fw* which can be
used to implement bus management related functionality in userspace.

This is also half of the functionality (the transmit part) that is
needed to support a userspace implementation of a VersaPHY transaction
layer.

Safety considerations:

  - PHY packets are generally broadcasts and may have interesting
    effects on PHYs and the bus, e.g. make asynchronous arbitration
    impossible due to too low gap count.  Hence some kind of elevated
    privileges should be required of a process to be able to send
    PHY packets.  This implementation assumes that a process that is
    allowed to open the /dev/fw* of a local node does have this
    privilege.

    There was an inconclusive discussion about introducing POSIX
    capabilities as a means to check for user privileges for these
    kinds of operations.

  - The kernel does not check integrity of the supplied packet data.
    That would be far too much code, considering the many kinds of
    PHY packets.  A process which got the privilege to send these
    packets is trusted to do it correctly.

Just like with the other "send packet" ioctls, a non-blocking API is
chosen; i.e. the ioctl may return even before AT DMA started.  After
transmission, an event for poll()/read() is enqueued.  Most users are
going to need a blocking API, but a blocking userspace wrapper is easy
to implement, and the second of the two existing libraw1394 calls
raw1394_phy_packet_write() and raw1394_start_phy_packet_write() can be
better supported that way.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:28 +02:00
Stefan Richter d505e6e871 firewire: cdev: some clarifications to the API documentation
Response events:
  - are generated on more occasions than their documentation claimed.

CSR allocation:
  - An already occupied CSR can be determined from errno==EBUSY.

Bus resets:
  - Note that FW_CDEV_IOC_INITIATE_BUS_RESET is nonblocking and that the
    client is not required to observe a grace period since kernels
    2.6.36+ will enforce it now (commit 02d37bed).

  - The possible values of fw_cdev_initiate_bus_reset.type are listed in
    the kerneldoc comment already.

  - Clarify that an application that uses FW_CDEV_IOC_ADD_DESCRIPTOR and
    FW_CDEV_IOC_REMOVE_DESCRIPTOR does not have to issue a bus reset.

Isochronous I/O contexts:
  - At most one can be created per open file descriptor.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:27 +02:00
Stefan Richter 18d0cdfd1a firewire: normalize status values in packet callbacks
core-transaction.c transmit_complete_callback() and close_transaction()
expect packet callback status to be an ACK or RCODE, and ACKs get
translated to RCODEs for transaction callbacks.

An old comment on the packet callback API (been there from the initial
submission of the stack) and the dummy_driver implementation of
send_request/send_response deviated from this as they also included
-ERRNO in the range of status values.

Let's narrow status values down to ACK and RCODE to prevent surprises.
RCODE_CANCELLED is chosen as the dummy_driver's RCODE as its meaning of
"transaction timed out" comes closest to what happens when a transaction
coincides with card removal.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-23 13:36:27 +02:00
Tejun Heo 181a51f6e0 slow-work: kill it
slow-work doesn't have any user left.  Kill it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-23 13:14:49 +02:00
Eric Dumazet e8648a1fdb netfilter: add xt_cpu match
In some situations a CPU match permits a better spreading of
connections, or select targets only for a given cpu.

With Remote Packet Steering or multiqueue NIC and appropriate IRQ
affinities, we can distribute trafic on available cpus, per session.
(all RX packets for a given flow is handled by a given cpu)

Some legacy applications being not SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four ways machine might run four copies of
server application, using a separate listening port for each instance,
but still presenting an unique external port :

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
        -j REDIRECT --to-port 8080

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
        -j REDIRECT --to-port 8081

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
        -j REDIRECT --to-port 8082

iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
        -j REDIRECT --to-port 8083

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23 12:59:36 +02:00
Jan Kara 43d2932d88 quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
Quota code never touches file data. It just modifies i_blocks + i_bytes
of inodes and inode flags of quota files. So use mark_inode_dirty_sync
instead of mark_inode_dirty.

Signed-off-by: Jan Kara <jack@suse.cz>
2010-07-23 12:50:46 +02:00
Hannes Eder 9c3e1c3967 netfilter: xt_ipvs (netfilter matcher for IPVS)
This implements the kernel-space side of the netfilter matcher xt_ipvs.

[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[ Patrick: added xt_ipvs.h to Kbuild ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23 12:42:58 +02:00
Ingo Molnar 3a01736e70 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-07-23 09:10:29 +02:00
Ryusuke Konishi 1a80a1763f nilfs2: add feature set fields to super block
This adds three new fields to nilfs_super_block structure, compatible
feature set, readonly-compatible feature set, and incompatible feature
set in order to prepare for future disk format modifications.

The role of these fields conforms to those of ext3 or other
filesystems.  Most important flags are the incompatible feature set;
it is used to refuse to mount the filesystem which sets an
incompatible feature the kernel doesn't know about.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:16 +09:00
Ryusuke Konishi 2f1b7cd29f nilfs2: clarify byte offset in super block format
This inserts comments indicating hexadecimal offset in declaration of
nilfs_super_block structure so that people can know offset of its
fields without counting from the head.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:16 +09:00
Stefano Stabellini 183d03cc4f xen: Xen PCI platform device driver.
Add the xen pci platform device driver that is responsible
for initializing the grant table and xenbus in PV on HVM mode.
Few changes to xenbus and grant table are necessary to allow the delayed
initialization in HVM mode.
Grant table needs few additional modifications to work in HVM mode.

The Xen PCI platform device raises an irq every time an event has been
delivered to us. However these interrupts are only delivered to vcpu 0.
The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
that is a little wrapper around __xen_evtchn_do_upcall, the traditional
Xen upcall handler, the very same used with traditional PV guests.

When running on HVM the event channel upcall is never called while in
progress because it is a normal Linux irq handler (and we cannot switch
the irq chip wholesale to the Xen PV ones as we are running QEMU and
might have passed in PCI devices), therefore we cannot be sure that
evtchn_upcall_pending is 0 when returning.
For this reason if evtchn_upcall_pending is set by Xen we need to loop
again on the event channels set pending otherwise we might loose some
event channel deliveries.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:46:09 -07:00
Tejun Heo d098adfb7d fscache: drop references to slow-work
fscache no longer uses slow-work.  Drop references to it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-22 22:58:58 +02:00
Tejun Heo 8af7c12436 fscache: convert operation to use workqueue instead of slow-work
Make fscache operation to use only workqueue instead of combination of
workqueue and slow-work.  FSCACHE_OP_SLOW is dropped and
FSCACHE_OP_FAST is renamed to FSCACHE_OP_ASYNC and uses newly added
fscache_op_wq workqueue to execute op->processor().
fscache_operation_init_slow() is dropped and fscache_operation_init()
now takes @processor argument directly.

* Unbound workqueue is used.

* fscache_retrieval_work() is no longer necessary as OP_ASYNC now does
  the equivalent thing.

* sysctl fscache.operation_max_active added to control concurrency.
  The default value is nr_cpus clamped between 2 and
  WQ_UNBOUND_MAX_ACTIVE.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-22 22:58:47 +02:00
Tejun Heo 8b8edefa2f fscache: convert object to use workqueue instead of slow-work
Make fscache object state transition callbacks use workqueue instead
of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
is created.  get/put callbacks are renamed and modified to take
@object and called directly from the enqueue wrapper and the work
function.  While at it, make all open coded instances of get/put to
use fscache_get/put_object().

* Unbound workqueue is used.

* work_busy() output is printed instead of slow-work flags in object
  debugging outputs.  They mean basically the same thing bit-for-bit.

* sysctl fscache.object_max_active added to control concurrency.  The
  default value is nr_cpus clamped between 4 and
  WQ_UNBOUND_MAX_ACTIVE.

* slow_work_sleep_till_thread_needed() is replaced with fscache
  private implementation fscache_object_sleep_till_congested() which
  waits on fscache_object_wq congestion.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-22 22:58:34 +02:00
Eric Dumazet 963bfeeeec net: RTA_MARK addition
Add a new rt attribute, RTA_MARK, and use it in
rt_fill_info()/inet_rtm_getroute() to support following commands :

ip route get 192.168.20.110 mark NUMBER
ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER
ip route list cache [192.168.20.110] mark NUMBER

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:46:21 -07:00
Tejun Heo e120153ddf workqueue: fix how cpu number is stored in work->data
Once a work starts execution, its data contains the cpu number it was
on instead of pointing to cwq.  This is added by commit 7a22ad75
(workqueue: carry cpu number in work data once execution starts) to
reliably determine the work was last on even if the workqueue itself
was destroyed inbetween.

Whether data points to a cwq or contains a cpu number was
distinguished by comparing the value against PAGE_OFFSET.  The
assumption was that a cpu number should be below PAGE_OFFSET while a
pointer to cwq should be above it.  However, on architectures which
use separate address spaces for user and kernel spaces, this doesn't
hold as PAGE_OFFSET is zero.

Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the
data field contains.  If the flag is set, it's pointing to a cwq;
otherwise, it contains a cpu number.

Reported on s390 and microblaze during linux-next testing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Reported-by: Michal Simek <michal.simek@petalogix.com>
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Michal Simek <monstr@monstr.eu>
2010-07-22 22:39:22 +02:00
Herbert Xu 8a35747a5d macvtap: Limit packet queue length
Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length.  This patch includes his work to change
the default macvtap TX queue length to 500.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:08:56 -07:00
Linus Torvalds e916beab22 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
  debug_core,kdb: fix kgdb_connected bit set in the wrong place
  Fix merge regression from external kdb to upstream kdb
  repair gdbstub to match the gdbserial protocol specification
  kdb: break out of kdb_ll() when command is terminated
2010-07-22 11:44:26 -07:00
Marc Kleine-Budde e955cead03 CAN: Add Flexcan CAN controller driver
This core is found on some Freescale SoCs and also some Coldfire
SoCs. Support for Coldfire is missing though at the moment as
they have an older revision of the core which does not have RX FIFO
support.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2010-07-22 18:06:25 +02:00
Jason Wessel edd63cb6b9 sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-07-21 19:27:07 -05:00
Ingo Molnar dca45ad8af Merge branch 'linus' into sched/core
Merge reason: Move from the -rc3 to the almost-rc6 base.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:45:08 +02:00
Ingo Molnar 23c2875725 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-07-21 21:44:18 +02:00
Ingo Molnar 9dcdbf7a33 Merge branch 'linus' into perf/core
Merge reason: Pick up the latest perf fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21 21:43:06 +02:00
Mike Frysinger 9849ed4d72 tracing/documentation: Document dynamic ftracer internals
Add more details to the dynamic function tracing design implementation.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
LKML-Reference: <1279610015-10250-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-21 11:00:25 -04:00
Jiaying Zhang fb5ffb0e16 quota: Change quota error message to print out disk and function name
The current quota error message doesn't always print the disk name, so
it is hard to identify the "bad" disk when quota error happens.

This patch changes the standardized quota error message to print out disk name
and function name. It also uses a combination of cpp macro and inline function
to provide better type checking and to lower the text size of the message.

[Jan Kara: Export __quota_error]

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-07-21 16:05:58 +02:00
Christoph Hellwig 4c4d390122 ext3: remove vestiges of nobh support
The nobh option was only supported for writeback mode, but given that all
write paths (except mmapped writed) actually create buffer heads, it
effectively was a no-op already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-07-21 16:01:47 +02:00
Christoph Hellwig 189eef59e7 quota: clean up quota active checks
The various quota operations check for any quota beeing active on
a superblock, and the inode not having the noquota flag.

Merge these two checks into a dquot_active check and move that
into dquot.c as that's the only place where it's needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-07-21 16:01:47 +02:00
Christoph Hellwig ade7ce31c2 quota: Clean up the namespace in dqblk_xfs.h
Almost all identifiers use the FS_* namespace, so rename the missing few
XFS_* ones to FS_* as well.  Without this some people might get upset
about having too many XFS names in generic code.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-07-21 16:01:46 +02:00
Lai Jiangshan bc289ae98b tracing: Reduce latency and remove percpu trace_seq
__print_flags() and __print_symbolic() use percpu trace_seq:

1) Its memory is allocated at compile time, it wastes memory if we don't use tracing.
2) It is percpu data and it wastes more memory for multi-cpus system.
3) It disables preemption when it executes its core routine
   "trace_seq_printf(s, "%s: ", #call);" and introduces latency.

So we move this trace_seq to struct trace_iterator.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 22:05:34 -04:00
Li Zefan e870e9a124 tracing: Allow to disable cmdline recording
We found that even enabling a single trace event that will rarely be
triggered can add big overhead to context switch.

(lmbench context switch test)
 -------------------------------------------------
 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
 ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
------ ------ ------ ------ ------ ------- -------
  2.19   2.3   2.21   2.56   2.13     2.54    2.07
  2.39   2.51  2.35   2.75   2.27     2.81    2.24

The overhead is 6% ~ 11%.

It's because when a trace event is enabled 3 tracepoints (sched_switch,
sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.

We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
to allow to disable cmdline recording.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 21:52:33 -04:00
Linus Torvalds f4b23cc2d5 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/r600: fix possible NULL pointer derefernce
  drm/radeon/kms: add quirk for ASUS HD 3600 board
  include/linux/vgaarb.h: add missing part of include guard
  drm/nouveau: Fix crashes during fbcon init on single head cards.
  drm/nouveau: fix pcirom vbios shadow breakage from acpi rom patch
  drm/radeon/kms: fix shared ddc harder
  drm/i915: enable low power render writes on GEN3 hardware.
  drm/i915: Define MI_ARB_STATE bits
  vmwgfx: return -EFAULT if copy_to_user fails
  fb: handle allocation failure in alloc_apertures()
  drm: radeon: check kzalloc() result
  drm/ttm: Fix build on architectures without AGP
  drm/radeon/kms: fix gtt MC base alignment on rs4xx/rs690/rs740 asics
  drm/radeon/kms: fix possible mis-detection of sideport on rs690/rs740
  drm/radeon/kms: fix legacy tv-out pal mode
2010-07-20 18:29:25 -07:00
Doug Goldstein a6a1a095ec include/linux/vgaarb.h: add missing part of include guard
vgaarb.h was missing the #define of the #ifndef at the top for the guard
to prevent multiple #include's from causing re-define errors

Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-07-21 09:51:15 +10:00
Paul E. McKenney 844b9a8707 vfs: fix RCU-lockdep false positive due to /proc
If a single-threaded process does a file-descriptor operation, and some
other process accesses that same file descriptor via /proc, the current
rcu_dereference_check_fdtable() can give a false-positive RCU-lockdep
splat due to the reference count being increased by the /proc access after
the reference-count check in fget_light() but before the check in
rcu_dereference_check_fdtable().

This commit prevents this false positive by checking for a single-threaded
process.  To avoid #include hell, this commit uses the wrapper for
thread_group_empty(current) defined by rcu_my_thread_group_empty()
provided in a separate commit.

Located-by: Miles Lane <miles.lane@gmail.com>
Located-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:41 -07:00
Frederic Weisbecker eb7beb5c09 tracing: Remove special traces
Special traces type was only used by sysprof. Lets remove it now
that sysprof ftrace plugin has been dropped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2010-07-20 14:31:07 +02:00
Dan Carpenter 772a2f9b48 fb: handle allocation failure in alloc_apertures()
If the kzalloc() fails we should return NULL.  All the places that call
alloc_apertures() check for this already.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: James Simmons <jsimmons@infradead.org>
Acked-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-07-20 15:24:09 +10:00
David S. Miller e7c38157c6 ipv6: Make IP6CB(skb)->nhoff 16-bit.
Even with jumbograms I cannot see any way in which we would need
to records a larger than 65535 valued next-header offset.

The maximum extension header length is (256 << 3) == 2048.
There are only a handful of extension headers specified which
we'd even accept (say 5 or 6), therefore the largest next-header
offset we'd ever have to contend with is something less than
say 16k.

Therefore make it a u16 instead of a u32.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-19 22:01:26 -07:00
Eric Dumazet bd27290a59 net: 64bit stats for netdev_queue
Since struct netdev_queue tx_bytes/tx_packets/tx_dropped are already
protected by _xmit_lock, its easy to convert these fields to u64 instead
of unsigned long.
This completes 64bit stats for devices using them (vlan, macvlan, ...)

Strictly, we could avoid the locking in dev_txq_stats_fold() on 64bit
arches, but its slow path and we prefer keep it simple.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-19 09:35:40 -07:00
Tom Lyon 323f99cbc3 iommu-api: Extension to check for interrupt remapping
This patch allows IOMMU users to determine whether the
hardware and software support safe, isolated interrupt
remapping.  Not all Intel IOMMUs have the hardware, and the
software for AMD is not there yet.

Signed-off-by: Tom Lyon <pugs@cisco.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-07-19 15:44:25 +02:00
Daniel Mack 3a343ee450 HID: add HID_QUIRK_HIDINPUT_FORCE
For devices with exotic HID report descriptors, it might be necessary to
make the HID core force the registration of an input device. Make that
possible by introducing a new quirk type.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-19 11:54:16 +02:00
Davidlohr Bueso 7a2e3659b6 reiserfs: typo comment fix
Fix trivial typo in code comment (change adn for and), also change comment
style for proper coding style.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-19 11:02:51 +02:00
Dan Kruchinin 5e017dc3f8 padata: Added sysfs primitives to padata subsystem
Added sysfs primitives to padata subsystem. Now API user may
embedded kobject each padata instance contains into any sysfs
hierarchy. For now padata sysfs interface provides only
two objects:
    serial_cpumask   [RW] - cpumask for serial workers
    parallel_cpumask [RW] - cpumask for parallel workers

Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-19 13:50:19 +08:00
Dan Kruchinin e15bacbebb padata: Make two separate cpumasks
The aim of this patch is to make two separate cpumasks
for padata parallel and serial workers respectively.
It allows user to make more thin and sophisticated configurations
of padata framework. For example user may bind parallel and serial workers to non-intersecting
CPU groups to gain better performance. Also each padata instance has notifiers chain for its
cpumasks now. If either parallel or serial or both masks were changed all
interested subsystems will get notification about that. It's especially useful
if padata user uses algorithm for callback CPU selection according to serial cpumask.

Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-19 13:50:19 +08:00
Dave Chinner 7f8275d0d6 mm: add context argument to shrinker callback
The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-19 14:56:17 +10:00
Richard Cochran c1f19b51d1 net: support time stamping in phy devices.
This patch adds a new networking option to allow hardware time stamps
from PHY devices. When enabled, likely candidates among incoming and
outgoing network packets are offered to the PHY driver for possible
time stamping. When accepted by the PHY driver, incoming packets are
deferred for later delivery by the driver.

The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
and callbacks for transmit and receive time stamping. Drivers may
optionally implement these functions.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-18 19:15:26 -07:00
Richard Cochran 15f0127d1d net: added a BPF to help drivers detect PTP packets.
Certain kinds of hardware time stamping units in both MACs and PHYs have
the limitation that they can only time stamp PTP packets. Drivers for such
hardware are left with the task of correctly matching skbs to time stamps.
This patch adds a BPF that drivers can use to classify PTP packets when
needed.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-18 19:15:26 -07:00
Richard Cochran 28b041139e net: preserve ifreq parameter when calling generic phy_mii_ioctl().
The phy_mii_ioctl() function unnecessarily throws away the original ifreq.
We need access to the ifreq in order to support PHYs that can perform
hardware time stamping.

Two maverick drivers filter the ioctl commands passed to phy_mii_ioctl().
This is unnecessary since phylib will check the command in any case.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-18 19:15:25 -07:00
Richard Cochran 4507a71507 net: add driver hook for tx time stamping.
This patch adds a hook for transmit time stamps. The transmit hook
allows a software fallback for transmit time stamps, for MACs
lacking time stamping hardware. Using the hook will still require
adding an inline function call to each MAC driver.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-18 19:15:25 -07:00
Arjan van de Ven 8d4b9d1bfe PM / Runtime: Add runtime PM statistics (v3)
In order for PowerTOP to be able to report how well the new runtime PM is
working for the various drivers, the kernel needs to export some basic
statistics in sysfs.

This patch adds two sysfs files in the runtime PM domain that expose the
total time a device has been active, and the time a device has been
suspended.

With this PowerTOP can compute the activity percentage

Active %age = 100 * (delta active) / (delta active + delta suspended)

and present the information to the user.

I've written the PowerTOP code (slated for version 1.12) already, and the
output looks like this:

Runtime Device Power Management statistics
Active  Device name
 10.0%	06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller

[version 2: fix stat update bugs noticed by Alan Stern]
[version 3: rebase to -next and move the sysfs declaration]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19 02:01:06 +02:00
Rafael J. Wysocki ce4410116c PM / Suspend: Fix ordering of calls in suspend error paths
The ACPI suspend code calls suspend_nvs_free() at a wrong place,
which may lead to a memory leak if there's an error executing
acpi_pm_prepare(), because acpi_pm_finish() will not be called in
that case.  However, the root cause of this problem is the
apparently confusing ordering of calls in suspend error paths that
needs to be fixed.

In addition to that, fix a typo in a label name in suspend.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Len Brown <len.brown@intel.com>
2010-07-19 02:00:35 +02:00
James Bottomley 82f682514a pm_qos: Get rid of the allocation in pm_qos_add_request()
All current users of pm_qos_add_request() have the ability to supply
the memory required by the pm_qos routines, so make them do this and
eliminate the kmalloc() with pm_qos_add_request().  This has the
double benefit of making the call never fail and allowing it to be
called from atomic context.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19 02:00:34 +02:00
James Bottomley 12e4d0cc2e plist: Add plist_last
plist is currently used by the scheduler, which only needs to know the
highest item in the list.  This adds plist_last which allows you to
find the lowest.  This is necessary for using plists to implement a
fast search of dynamic ranges in pm_qos which can have both highest
and lowest criteria.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19 01:58:48 +02:00
Rafael J. Wysocki c125e96f04 PM: Make it possible to avoid races between wakeup and system sleep
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.

Generally, there are two problems in that area.  First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended.  Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.

To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.

The /sys/power/wakeup_count file may be read from or written to by
user space.  Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter.  If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.

[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count.  Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state.  Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well.  Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]

Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.

To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
2010-07-19 01:58:48 +02:00
Alan Stern b14e033e17 PNPACPI: Add support for remote wakeup
This patch (as1354) adds remote-wakeup support to the pnpacpi driver.
The new can_wakeup method also allows other PNP protocol drivers
(pnpbios or iaspnp) to add wakeup support, but I don't know enough
about how they work to actually do it.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19 01:58:48 +02:00
Alan Stern 2430d12c94 PM: describe kernel policy regarding wakeup defaults (v. 2)
This patch (as1381b) updates a comment describing the kernel's policy
toward enabling wakeup by default.

It also makes device_set_wakeup_capable() actually do something when
CONFIG_PM isn't enabled.  It's not clear this is necessary; however if
it isn't then device_init_wakeup() and device_can_wakeup() should also
be do-nothing routines.  Furthermore, I don't expect this change to
have any noticeable effect -- but if it does then clearly the old
behavior was wrong.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19 01:58:48 +02:00
Linus Torvalds 2044f2282d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fall back to original BIOS BAR addresses
2010-07-18 15:05:22 -07:00
Linus Torvalds bea9a6d239 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Silence gcc warning in ocfs2_write_zero_page().
  jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
  ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
  ocfs2: Don't duplicate pages past i_size during CoW.
  ocfs2: tighten up strlen() checking
  ocfs2: Make xattr reflink work with new local alloc reservation.
  ocfs2: make xattr extension work with new local alloc reservation.
  ocfs2: Remove the redundant cpu_to_le64.
  ocfs2/dlm: don't access beyond bitmap size
  ocfs2: No need to zero pages past i_size.
  ocfs2: Zero the tail cluster when extending past i_size.
  ocfs2: When zero extending, do it by page.
  ocfs2: Limit default local alloc size within bitmap range.
  ocfs2: Move orphan scan work to ocfs2_wq.
  fs/ocfs2/dlm: Add missing spin_unlock
2010-07-18 10:09:25 -07:00
Peter Zijlstra 396e894d28 sched: Revert nohz_ratelimit() for now
Norbert reported that nohz_ratelimit() causes his laptop to burn about
4W (40%) extra. For now back out the change and see if we can adjust
the power management code to make better decisions.

Reported-by: Norbert Preining <preining@logic.at>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-17 12:05:44 +02:00
Benjamin Herrenschmidt 2f495c398e net/phy/marvell: Expose IDs and flags in a .h and add dns323 LEDs setup flag
This moves the various known Marvell PHY IDs to include/linux/marvell_phy.h
along with dev_flags definitions for use by the driver.

I then added a flag that changes the PHY init code to setup the LEDs
config to the values needed to operate a dns323 rev C1 NAS.

I moved the existing "resistance" flag to the .h as well, though I've
been unable to find whoever sets this to convert it to use that constant.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
2010-07-16 22:01:58 -04:00
Bjorn Helgaas 58c84eda07 PCI: fall back to original BIOS BAR addresses
If we fail to assign resources to a PCI BAR, this patch makes us try the
original address from BIOS rather than leaving it disabled.

Linux tries to make sure all PCI device BARs are inside the upstream
PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
Windows does similar reassignment.

Before this patch, if we could not move a BAR into an aperture, we left
the resource unassigned, i.e., at address zero.  Windows leaves such BARs
at the original BIOS addresses, and this patch makes Linux do the same.

This is a bit ugly because we disable the resource long before we try to
reassign it, so we have to keep track of the BIOS BAR address somewhere.
For lack of a better place, I put it in the struct pci_dev.

I think it would be cleaner to attempt the assignment immediately when the
claim fails, so we could easily remember the original address.  But we
currently claim motherboard resources in the middle, after attempting to
claim PCI resources and before assigning new PCI resources, and changing
that is a fairly big job.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263

Reported-by: Andrew <nitr0@seti.kr.ua>
Tested-by: Andrew <nitr0@seti.kr.ua>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-16 11:39:48 -07:00
Linus Torvalds f469461df6 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Add alignment to syscall metadata declarations
  perf: Sync callchains with period based hits
  perf: Resurrect flat callchains
  perf: Version String fix, for fallback if not from git
  perf: Version String fix, using kernel version
2010-07-16 11:26:33 -07:00
Linus Torvalds cc10b6ffd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: w90p910_ts - fix call to setup_timer()
  Input: synaptics - fix wrong dimensions check
  Input: i8042 - mark stubs in i8042.h "static inline"
2010-07-16 08:22:40 -07:00
Michael S. Tsirkin 22cb516696 netfilter: correct CHECKSUM header and export it
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-16 14:08:20 +02:00
Christoph Lameter af537b0a6c slub: Use kmem_cache flags to detect if slab is in debugging mode.
The cacheline with the flags is reachable from the hot paths after the
percpu allocator changes went in. So there is no need anymore to put a
flag into each slab page. Get rid of the SlubDebug flag and use
the flags in kmem_cache instead.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:08 +03:00
Jiri Slaby c022a0acad rlimits: implement prlimit64 syscall
This patch adds the code to support the sys_prlimit64 syscall which
modifies-and-returns the rlim values of a selected process atomically.
The first parameter, pid, being 0 means current process.

Unlike the current implementation, it is a generic interface,
architecture indepentent so that we needn't handle compat stuff
anymore. In the future, after glibc start to use this we can deprecate
sys_setrlimit and sys_getrlimit in favor to clean up the code finally.

It also adds a possibility of changing limits of other processes. We
check the user's permissions to do that and if it succeeds, the new
limits are propagated online. This is good for large scale
applications such as SAP or databases where administrators need to
change limits time by time (e.g. on crashes increase core size). And
it is unacceptable to restart the service.

For safety, all rlim users now either use accessors or doesn't need
them due to
- locking
- the fact a process was just forked and nobody else knows about it
  yet (and nobody can't thus read/write limits)
hence it is safe to modify limits now.

The limitation is that we currently stay at ulong internal
representation. So the rlim64_is_infinity check is used where value is
compared against ULONG_MAX on 32-bit which is the maximum value there.

And since internally the limits are held in struct rlimit, converters
which are used before and after do_prlimit call in sys_prlimit64 are
introduced.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:48 +02:00
Jiri Slaby 5b41535aac rlimits: redo do_setrlimit to more generic do_prlimit
It now allows also reading of limits. I.e. all read and writes will
later use this function.

It takes two parameters, new and old limits which can be both NULL.
If new is non-NULL, the value in it is set to rlimits.
If old is non-NULL, current rlimits are stored there.
If both are non-NULL, old are stored prior to setting the new ones,
atomically.
(Similar to sigaction.)

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:48 +02:00
Jiri Slaby 6a1d5e2c85 rlimits: add rlimit64 structure
Add a platform independent structure for resource limits to use with
a new prlimit64 syscall. This structure is the same which uses glibc
for 64-bit limits.

Also add corresponding infinity which is a 64-bit full of bit-ones.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:47 +02:00
Jiri Slaby 7855c35da7 rlimits: split sys_setrlimit
Create do_setrlimit from sys_setrlimit and declare do_setrlimit
in the resource header. This is the first phase to have generic
do_prlimit which allows to be called from read, write and compat
rlimits code.

The new do_setrlimit also accepts a task pointer to change the limits
of. Currently, it cannot be other than current, but this will change
with locking later.

Also pass tsk->group_leader to security_task_setrlimit to check
whether current is allowed to change rlimits of the process and not
its arbitrary thread because it makes more sense given that rlimit are
per process and not per-thread.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:46 +02:00
Jiri Slaby 5ab46b345e rlimits: add task_struct to update_rlimit_cpu
Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
2010-07-16 09:48:45 +02:00
Jiri Slaby 8fd00b4d70 rlimits: security, add task_struct to setrlimit
Add task_struct to task_setrlimit of security_operations to be able to set
rlimit of task other than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
2010-07-16 09:48:45 +02:00
Dmitry Torokhov 20da92de8e Input: change input handlers to use bool when possible
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-15 23:52:33 -07:00
Henrik Rydberg 40d007e7df Input: introduce MT event slots
With the rapidly increasing number of intelligent multi-contact and
multi-user devices, the need to send digested, filtered information
from a set of different sources within the same device is imminent.
This patch adds the concept of slots to the MT protocol. The slots
enumerate a set of identified sources, such that all MT events
can be passed independently and selectively per identified source.

The protocol works like this: Instead of sending a SYN_MT_REPORT
event immediately after the contact data, one sends an ABS_MT_SLOT
event immediately before the contact data. The input core will only
emit events for slots with modified MT events. It is assumed that
the same slot is used for the duration of an initiated contact.

Acked-by: Ping Cheng <pingc@wacom.com>
Acked-by: Chase Douglas <chase.douglas@canonical.com>
Acked-by: Rafi Rubin <rafi@seas.upenn.edu>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-15 23:52:03 -07:00
Jan Kara 13ceef099e jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
      jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
      any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback.  This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-15 15:17:47 -07:00
Michael S. Tsirkin edf0e1fb0d netfilter: add CHECKSUM target
This adds a `CHECKSUM' target, which can be used in the iptables mangle
table.

You can use this target to compute and fill in the checksum in
a packet that lacks a checksum.  This is particularly useful,
if you need to work around old applications such as dhcp clients,
that do not work well with checksum offloads, but don't want to
disable checksum offload in your device.

The problem happens in the field with virtualized applications.
For reference, see Red Hat bz 605555, as well as
http://www.spinics.net/lists/kvm/msg37660.html

Typical expected use (helps old dhclient binary running in a VM):
iptables -A POSTROUTING -t mangle -p udp --dport bootpc \
	-j CHECKSUM --checksum-fill

Includes fixes by Jan Engelhardt <jengelh@medozas.de>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-15 17:20:46 +02:00
Pablo Neira Ayuso cca5cf91c7 nfnetlink_log: do not expose NFULNL_COPY_DISABLED to user-space
This patch moves NFULNL_COPY_PACKET definition from
linux/netfilter/nfnetlink_log.h to net/netfilter/nfnetlink_log.h
since this copy mode is only for internal use.

I have also changed the value from 0x03 to 0xff. Thus, we avoid
a gap from user-space that may confuse users if we add new
copy modes in the future.

This change was introduced in:
http://www.spinics.net/lists/netfilter-devel/msg13535.html

Since this change is not included in any stable Linux kernel,
I think it's safe to make this change now. Anyway, this copy
mode does not make any sense from user-space, so this patch
should not break any existing setup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-15 11:27:41 +02:00
Joonyoung Shim 4cf51c383d Input: Add ATMEL QT602240 touchscreen driver
The chip's full name is AT42QT602240 or ATMXT224. This is a capacitive
touchscreen supporting 10-contact multitouch and using I2C interface.

Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Acked-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-07-14 21:58:52 -07:00
Andres Salomon 035ebefc73 of/sparc: move is_root_node() to of.h
Rename is_root_node() to of_node_is_root() and make it available for
all archs to use, as it's not PROM-specific.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-14 17:08:03 -06:00
Steffen Klassert 5f1a8c1bc7 padata: simplify serialization mechanism
We count the number of processed objects on a percpu basis,
so we need to go through all the percpu reorder queues to calculate
the sequence number of the next object that needs serialization.
This patch changes this to count the number of processed objects
global. So we can calculate the sequence number and the percpu
reorder queue of the next object that needs serialization without
searching through the percpu reorder queues. This avoids some
accesses to memory of foreign cpus.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14 20:29:30 +08:00
Steffen Klassert 4c87917029 padata: Check for valid padata instance on start
This patch introduces the PADATA_INVALID flag which is
checked on padata start. This will be used to mark a padata
instance as invalid, if the padata cpumask does not intersect
with the active cpumask. we change padata_start to return an
error if the PADATA_INVALID is set. Also we adapt the only
padata user, pcrypt to this change.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14 20:29:28 +08:00
Yinghai Lu 95f72d1ed4 lmb: rename to memblock
via following scripts

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong change like lmbench and dlmb etc.

also move memblock.c from lib/ to mm/

Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14 17:14:00 +10:00
Stefan Richter 02d37bed18 firewire: core: integrate software-forced bus resets with bus management
Bus resets which are triggered
  - by the kernel drivers after updates of the local nodes' config ROM,
  - by userspace software via ioctl
shall be deferred until after >=2 seconds after the last bus reset.

If multiple modifications of the local nodes' config ROM happen in a row,
only a single bus reset should happen after them.

When the local node's link goes from inactive to active or vice versa,
and at the two occasions of bus resets mentioned above --- and if the
current gap count differs from 63 --- the bus reset should be preceded
by a PHY configuration packet that reaffirms the gap count.  Otherwise a
bus manager would have to reset the bus again right after that.

This is necessary to promote bus stability, e.g. leave grace periods for
allocations and reallocations of isochronous channels and bandwidth,
SBP-2 reconnections etc.; see IEEE 1394 clause 8.2.1.

This change implements all of the above by moving bus reset initiation
into a delayed work (except for bus resets which are triggered by the
bus manager workqueue job and are performed there immediately).  It
comes with a necessary addition to the card driver methods that allows
to get the current gap count from PHY registers.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-13 09:58:27 +02:00
Miklos Szeredi 2d45ba381a fuse: add retrieve request
Userspace filesystem can request data to be retrieved from the inode's
mapping.  This request is synchronous and the retrieved data is queued
as a new request.  If the write to the fuse device returns an error
then the retrieve request was not completed and a reply will not be
sent.

Only present pages are returned in the retrieve reply.  Retrieving
stops when it finds a non-present page and only data prior to that is
returned.

This request doesn't change the dirty state of pages.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-07-12 14:41:40 +02:00
Miklos Szeredi a1d75f2582 fuse: add store request
Userspace filesystem can request data to be stored in the inode's
mapping.  This request is synchronous and has no reply.  If the write
to the fuse device returns an error then the store request was not
fully completed (but may have updated some pages).

If the stored data overflows the current file size, then the size is
extended, similarly to a write(2) on the filesystem.

Pages which have been completely stored are marked uptodate.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-07-12 14:41:40 +02:00
Suresh Jayaraman 49a3df804b fscache: fix missing kerneldoc annotation
.. and make kerneldoc scripts happy.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-11 22:22:23 +02:00
Suresh Jayaraman ab0cfb928a fscache: fix a trivial typo in the comment
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-11 22:21:26 +02:00
Uwe Kleine-König b27d63d8f8 fix comment typos concerning "sequential"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-11 21:41:23 +02:00
Justin P. Mattock 69c8f52b38 fix #warning about using kernel headers in userpsace
Move the preprocessor #warning message:
warning: #warning Attempt to use kernel headers from user space,
see http://kernelnewbies.org/KernelHeaders
from kernel.h to types.h.

And also fixe the #warning message due to the preprocessor not being able to
read the web address due to it thinking it was the start of a comment.  also
remove the extra #ifndef _KERNEL_ since it's already there.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-11 21:38:56 +02:00
Ben Hutchings d77535162e net: Document that dev_get_stats() returns the given pointer
Document that dev_get_stats() returns the same stats pointer it was
given.  Remove const qualification from the returned pointer since the
caller may do what it likes with that structure.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-09 17:41:57 -07:00
Ben Hutchings 3cfde79c6c net: Get rid of rtnl_link_stats64 / net_device_stats union
In commit be1f3c2c02 "net: Enable 64-bit
net device statistics on 32-bit architectures" I redefined struct
net_device_stats so that it could be used in a union with struct
rtnl_link_stats64, avoiding the need for explicit copying or
conversion between the two.  However, this is unsafe because there is
no locking required and no lock consistently held around calls to
dev_get_stats() and use of the statistics structure it returns.

In commit 28172739f0 "net: fix 64 bit
counters on 32 bit arches" Eric Dumazet dealt with that problem by
requiring callers of dev_get_stats() to provide storage for the
result.  This means that the net_device::stats64 field and the padding
in struct net_device_stats are now redundant, so remove them.

Update the comment on net_device_ops::ndo_get_stats64 to reflect its
new usage.

Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
that is what all its callers are really using and it is no longer
going to be compatible with struct net_device_stats.

Eric Dumazet suggested the separate function for the structure
conversion.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-09 17:41:56 -07:00
Steven Rostedt 44a54f787c tracing: Add alignment to syscall metadata declarations
For some reason if we declare a static variable and then assign it
later, and the assignment contains a __attribute__((__aligned__(#))),
some versions of gcc will ignore it.

This caused the syscall meta data to not be compact in its section
and caused a kernel oops when the section was being read.

The fix for these versions of gcc seems to be to add the aligned
attribute to the declaration as well.

This fixes the BZ regression:

  https://bugzilla.kernel.org/show_bug.cgi?id=16353

Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Tested-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTinkKVmB0fpVeqUkMeqe3ZYeXJdI8xDuzJEOjYwh@mail.gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-09 15:53:04 -04:00
Kenji Kaneshige ffa71f33a8 x86, ioremap: Fix incorrect physical address handling in PAE mode
Current x86 ioremap() doesn't handle physical address higher than
32-bit properly in X86_32 PAE mode. When physical address higher than
32-bit is passed to ioremap(), higher 32-bits in physical address is
cleared wrongly. Due to this bug, ioremap() can map wrong address to
linear address space.

In my case, 64-bit MMIO region was assigned to a PCI device (ioat
device) on my system. Because of the ioremap()'s bug, wrong physical
address (instead of MMIO region) was mapped to linear address space.
Because of this, loading ioatdma driver caused unexpected behavior
(kernel panic, kernel hangup, ...).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-09 11:42:03 -07:00
Karl Hiramoto 7313bb8f3d atm: propagate signal changes via notifier
Add notifier chain for changes in atm_dev.

Clients like br2684 will call register_atmdevice_notifier() to be notified of
changes. Drivers will call atm_dev_signal_change() to notify clients like
br2684 of the change.

On DSL and ATM devices it's usefull to have a know if you have a carrier
signal. netdevice LOWER_UP changes can be propagated to userspace via netlink
monitor.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-09 00:09:20 -07:00
Linus Torvalds c77e9e6826 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  writeback: simplify the write back thread queue
  writeback: split writeback_inodes_wb
  writeback: remove writeback_inodes_wbc
  fs-writeback: fix kernel-doc warnings
  splice: check f_mode for seekable file
  splice: direct_splice_actor() should not use pos in sd
2010-07-08 08:06:40 -07:00
Stefan Richter 250b2b6dd4 firewire: cdev: fix fw_cdev_event_bus_reset.bm_node_id
Fix an obscure ABI feature that is a bit of a hassle to implement.
However, somebody put it into the ABI, so let's fill in a sensible
value there.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-07-08 16:52:02 +02:00
Linus Torvalds 2aa72f6121 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
  NET: SB1250: Initialize .owner
  vxge: show startup message with KERN_INFO
  ll_temac: Fix missing iounmaps
  bridge: Clear IPCB before possible entry into IP stack
  bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference
  net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined
  net/ne: fix memory leak in ne_drv_probe()
  xfrm: fix xfrm by MARK logic
  virtio_net: fix oom handling on tx
  virtio_net: do not reschedule rx refill forever
  s2io: resolve statistics issues
  linux/net.h: fix kernel-doc warnings
  net: decreasing real_num_tx_queues needs to flush qdisc
  sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lock
  qlge: fix a eeh handler to not add a pending timer
  qlge: Replacing add_timer() to mod_timer()
  usbnet: Set parent device early for netdev_printk()
  net: Revert "rndis_host: Poll status channel before control channel"
  netfilter: ip6t_REJECT: fix a dst leak in ipv6 REJECT
  drivers: bluetooth: bluecard_cs.c: Fixed include error, changed to linux/io.h
  ...
2010-07-07 19:56:00 -07:00
David S. Miller 597e608a84 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-07-07 15:59:38 -07:00
Eric Dumazet 28172739f0 net: fix 64 bit counters on 32 bit arches
There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.

One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().

Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)

Old drivers (only providing get_stats() method) need no changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-07 14:58:56 -07:00
Artem Bityutskiy 140236b4b1 VFS: introduce s_dirty accessors
This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
'sb_mark_clean()', and 'sb_is_dirty()'. They simply
set 'sb->s_dirt' or test 'sb->s_dirt'. The plan is to make
every FS use these accessors later instead of manipulating
the 'sb->s_dirt' flag directly.

Ultimately, this change is a preparation for the periodic
superblock synchronization optimization which is about
preventing the "sync_supers" kernel thread from waking up
even if there is nothing to synchronize.

This patch does not do any functional change, just adds
accessor functions.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-06 17:32:07 -07:00
Linus Torvalds 47a716cf0c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rbtree: Undo augmented trees performance damage and regression
  x86, Calgary: Limit the max PHB number to 256
2010-07-06 17:16:09 -07:00
Chris Metcalf a2262d8a23 Merge branch 'master' into for-linus 2010-07-06 13:45:24 -04:00
Chris Metcalf de5d9bf654 Move list types from <linux/list.h> to <linux/types.h>.
This allows a list_head (or hlist_head, etc.) to be used from places
that used to be impractical, in particular <asm/processor.h>, which
used to cause include file recursion: <linux/list.h> includes
<linux/prefetch.h>, which always includes <asm/processor.h> for the
prefetch macros, as well as <asm/system.h>, which often includes
<asm/processor.h> directly or indirectly.

This avoids a lot of painful workaround hackery on the tile
architecture, where we use a list_head in the thread_struct to chain
together tasks that are activated on a particular hardwall.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
2010-07-06 13:33:54 -04:00
Artem Bityutskiy 8eab945c56 sunrpc: make the cache cleaner workqueue deferrable
This patch makes the cache_cleaner workqueue deferrable, to prevent
unnecessary system wake-ups, which is very important for embedded
battery-powered devices.

do_cache_clean() is called every 30 seconds at the moment, and often
makes the system wake up from its power-save sleep state. With this
change, when the workqueue uses a deferrable timer, the
do_cache_clean() invocation will be delayed and combined with the
closest "real" wake-up. This improves the power consumption situation.

Note, I tried to create a DECLARE_DELAYED_WORK_DEFERRABLE() helper
macro, similar to DECLARE_DELAYED_WORK(), but failed because of the
way the timer wheel core stores the deferrable flag (it is the
LSBit in the time->base pointer). My attempt to define a static
variable with this bit set ended up with the "initializer element is
not constant" error.

Thus, I have to use run-time initialization, so I created a new
cache_initialize() function which is called once when sunrpc is
being initialized.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-06 12:27:48 -04:00
Christoph Hellwig 83ba7b071f writeback: simplify the write back thread queue
First remove items from work_list as soon as we start working on them.  This
means we don't have to track any pending or visited state and can get
rid of all the RCU magic freeing the work items - we can simply free
them once the operation has finished.  Second use a real completion for
tracking synchronous requests - if the caller sets the completion pointer
we complete it, otherwise use it as a boolean indicator that we can free
the work item directly.  Third unify struct wb_writeback_args and struct
bdi_work into a single data structure, wb_writeback_work.  Previous we
set all parameters into a struct wb_writeback_args, copied it into
struct bdi_work, copied it again on the stack to use it there.  Instead
of just allocate one structure dynamically or on the stack and use it
all the way through the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06 08:59:53 +02:00
Christoph Hellwig edadfb10ba writeback: split writeback_inodes_wb
The case where we have a superblock doesn't require a loop here as we scan
over all inodes in writeback_sb_inodes. Split it out into a separate helper
to make the code simpler.  This also allows to get rid of the sb member in
struct writeback_control, which was rather out of place there.

Also update the comments in writeback_sb_inodes that explain the handling
of inodes from wrong superblocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06 08:54:08 +02:00
Christoph Hellwig 9c3a8ee8a1 writeback: remove writeback_inodes_wbc
This was just an odd wrapper around writeback_inodes_wb.  Removing this
also allows to get rid of the bdi member of struct writeback_control
which was rather out of place there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06 08:54:03 +02:00
Ben Hutchings bcfcc450ba net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined
netif_vdbg() was originally defined as entirely equivalent to
netdev_vdbg(), but I assume that it was intended to take the same
parameters as netif_dbg() etc.  (Currently it is only used by the
sfc driver, in which I worked on that assumption.)

In commit a4ed89c I changed the definition used when VERBOSE_DEBUG is
not defined, but I failed to notice that the definition used when
VERBOSE_DEBUG is defined was also not as I expected.  Change that to
match netif_dbg() as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-05 20:08:05 -07:00
Grant Likely 9fd049927c of/i2c: Generalize OF support
This patch cleans up the i2c OF support code to make it selectable by
all architectures and allow for automatic registration of i2c devices.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-05 16:14:52 -06:00
Grant Likely f9f5a4669f of/device: Move struct of_device define outside of CONFIG_OF_DEVICE test
Some code uses of_device even when CONFIG_OF_DEVICE is not set.  This
patch makes of_device valid all the time by moving it outside of the
ifdef CONFIG_OF_DEVICE test.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
2010-07-05 16:14:51 -06:00
Grant Likely 8cec0e7b4c of/device: Add OF style matching helper function
Add of_driver_match_device() helper function.  This function can be used
by bus types to determine if a driver works with a device when using OF
style matching.  If CONFIG_OF is unselected, then it is a nop.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Michal Simek <monstr@monstr.eu>
CC: Grant Likely <grant.likely@secretlab.ca>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: linux-kernel@vger.kernel.org
CC: microblaze-uclinux@itee.uq.edu.au
CC: linuxppc-dev@ozlabs.org
CC: devicetree-discuss@lists.ozlabs.org
2010-07-05 16:14:51 -06:00