Make rpc_exit() non-inline, and ensure that it always wakes up a task that
has been queued.
Kill off the now unused rpc_wake_up_task().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch allows the user to configure the credential cache hashtable size
using a new module parameter: auth_hashtable_size
When set, this parameter will be rounded up to the nearest power of two,
with a maximum allowed value of 1024 elements.
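The rounding rule can be sketched in plain C. This is a minimal userspace illustration, not the patch's code: the helper name normalize_hashtable_size() and the MAX_HASHTABLE_SIZE macro are hypothetical, and treating a request of 0 as size 1 is an assumption.

```c
#include <assert.h>

/* Hypothetical cap, taken from the "1024 elements" limit in the text. */
#define MAX_HASHTABLE_SIZE 1024u

/* Round a requested size up to the next power of two, clamped to the cap. */
static unsigned int normalize_hashtable_size(unsigned int requested)
{
	unsigned int size = 1;

	if (requested > MAX_HASHTABLE_SIZE)
		requested = MAX_HASHTABLE_SIZE;
	while (size < requested)
		size <<= 1;
	assert((size & (size - 1)) == 0);	/* always a power of two */
	return size;
}
```

With this sketch, a request of 5 yields 8, and anything above 1024 yields 1024.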
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Samsung SoCs use their own OneNAND controller and detect the OneNAND chip at power-on.
To use this feature, introduce the chip_probe function.
Also remove workaround for Samsung SoCs.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
usleep_range is a finer-precision implementation of msleep
and is designed to be a drop-in replacement for udelay where
a precise sleep / busy-wait is unnecessary.
Since an easy interface to hrtimers could lead to an undesired
proliferation of interrupts, we provide only a "range" API,
forcing the caller to think about an acceptable tolerance on
both ends and hopefully avoiding introducing another interrupt.
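To illustrate why a range can avoid an extra interrupt, here is a toy userspace model. It is not the kernel's hrtimer code: pick_wakeup(), struct wakeup, and the pending array of already-scheduled expiry times are all hypothetical. The sketch assumes a sleeper may share any expiry that already falls inside its [min, max] window, and otherwise needs its own wakeup at max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wakeup {
	unsigned long at_us;	/* chosen wakeup time */
	bool new_interrupt;	/* true if no pending expiry could be shared */
};

/* Choose a wakeup in [min_us, max_us], preferring an already-pending expiry. */
static struct wakeup pick_wakeup(const unsigned long *pending, size_t n,
				 unsigned long min_us, unsigned long max_us)
{
	struct wakeup w = { max_us, true };
	size_t i;

	assert(min_us <= max_us);
	for (i = 0; i < n; i++) {
		if (pending[i] < min_us || pending[i] > max_us)
			continue;	/* outside the caller's tolerance */
		if (w.new_interrupt || pending[i] < w.at_us) {
			w.at_us = pending[i];
			w.new_interrupt = false;
		}
	}
	return w;
}
```

A wider window makes it more likely that an existing expiry can be reused, which is the trade-off the caller is asked to think about. In a driver, the call would take the form usleep_range(min, max), with both bounds in microseconds.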
INTRO
As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not
precise enough for many drivers (yes, sleep precision is an unfair notion,
but consistently sleeping for ~an order of magnitude greater than requested
is worth fixing). This patch adds a usleep API so that udelay does not have
to be used. Obviously not every udelay can be replaced (those in atomic
contexts or being used for simple bitbanging come to mind), but there are
many, many examples of
	mydriver_write(...)
	/* Wait for hardware to latch */
	udelay(100)
in various drivers where a busy-wait loop is neither beneficial nor
necessary, but msleep simply does not provide enough precision and people
are using a busy-wait loop instead.
CONCERNS FROM THE RFC
Why is udelay a problem / necessary? Most callers of udelay are in device/
driver initialization code, which is serial...
As I see it, there is only benefit to sleeping over a delay; the
notion of "refactoring" areas that use udelay was presented, but
I see usleep as the refactoring. Consider i2c, if the bus is busy,
you need to wait a bit (say 100us) before trying again, your
current options are:
* udelay(100)
* msleep(1) <-- As noted above, actually as high as ~20ms
on some platforms, so not really an option
* Manually set up an hrtimer to try again in 100us (which
is what usleep does anyway...)
People choose the udelay route because it is EASY; we need to
provide a better easy route.
Device / driver / boot code is *currently* serial, but every few
months someone makes noise about parallelizing boot, and IMHO, a
little forward-thinking now is one less thing to worry about
if/when that ever happens
udelay's could be preempted
Sure, but if udelay plans on looping 1000 times, and it gets
preempted on loop 200, whenever it's scheduled again, it is
going to do the next 800 loops.
Is the interruptible case needed?
Probably not, but I see usleep as a very logical parallel to msleep,
so it made sense to include the "full" API. Processors are getting
faster (albeit not as quickly as they are becoming more parallel),
so if someone wanted to be interruptible for a few usecs, why not
let them? If this is a contentious point, I'm happy to remove it.
OTHER THOUGHTS
I believe there is also value in exposing the usleep_range option; it gives
the scheduler a lot more flexibility and allows the programmer to express
his intent much more clearly; it's something I would hope future driver
writers will take advantage of.
To get the results in the NUMBERS section below, I literally s/udelay/usleep
the kernel tree; I had to go in and undo the changes to the USB drivers, but
everything else booted successfully; I find that extremely telling in and
of itself -- many people are using a delay API where a sleep will suit them
just fine.
SOME ATTEMPTS AT NUMBERS
It turns out that calculating quantifiable benefit on this is challenging,
so instead I will simply present the current state of things, and I hope
this to be sufficient:
How many udelay calls are there in 2.6.35-rc5?
udelay(ARG) >= | COUNT
1000           | 319
500            | 414
100            | 1146
20             | 1832
I am working on Android, so that is my focus for this. The following table
is a modified usleep that simply printk's the amount of time requested to
sleep; these tests were run on a kernel with udelay >= 20 --> usleep
"boot" is power-on to lock screen
"power collapse" is when the power button is pushed and the device suspends
"resume" is when the power button is pushed and the lock screen is displayed
(no touchscreen events or anything, just turning on the display)
"use device" is from the unlock swipe to clicking around a bit; there is no
sd card in this phone, so loading music, video, and camera fails
ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us)
boot | 22 | 1250
power-collapse | 9 | 1200
resume | 5 | 500
use device | 59 | 7700
The most interesting category to me is the "use device" field; 7700us of
busy-wait time that could be put towards better responsiveness, or at the
least less power usage.
Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: apw@canonical.com
Cc: corbet@lwn.net
Cc: arjan@linux.intel.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both rpc_restart_call_prepare() and rpc_restart_call() test for the
RPC_TASK_KILLED flag, and fail to restart the RPC call if that flag is set.
This patch allows callers to know whether or not the restart was
successful, so that they can perform cleanups etc in case of failure.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This should remove the last exclusive lock from start_this_handle(),
so that we should now be able to start multiple transactions at the
same time on large SMP systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores. So
change it to be a read/write spinlock.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (22 commits)
9p: fix sparse warnings in new xattr code
fs/9p: remove sparse warning in vfs_inode
fs/9p: destroy fid on failed remove
fs/9p: Prevent parallel rename when doing fid_lookup
fs/9p: Add support user. xattr
net/9p: Implement TXATTRCREATE 9p call
net/9p: Implement attrwalk 9p call
9p: Implement LOPEN
fs/9p: This patch implements TLCREATE for 9p2000.L protocol.
9p: Implement TMKDIR
9p: Implement TMKNOD
9p: Define and implement TSYMLINK for 9P2000.L
9p: Define and implement TLINK for 9P2000.L
9p: Define and implement TLINK for 9P2000.L
9p: Implement client side of setattr for 9P2000.L protocol.
9p: getattr client implementation for 9P2000.L protocol.
fs/9p: Pass the correct user credentials during attach
net/9p: Handle the server returned error properly
9p: readdir implementation for 9p2000.L
9p: Make use of iounit for read/write
...
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (291 commits)
ARM: AMBA: Add pclk support to AMBA bus infrastructure
ARM: 6278/2: fix regression in RealView after the introduction of pclk
ARM: 6277/1: mach-shmobile: Allow users to select HZ, default to 128
ARM: 6276/1: mach-shmobile: remove duplicate NR_IRQS_LEGACY
ARM: 6246/1: mmci: support larger MMCIDATALENGTH register
ARM: 6245/1: mmci: enable hardware flow control on Ux500 variants
ARM: 6244/1: mmci: add variant data and default MCICLOCK support
ARM: 6243/1: mmci: pass power_mode to the translate_vdd callback
ARM: 6274/1: add global control registers definition header file for nuc900
mx2_camera: fix type of dma buffer virtual address pointer
mx2_camera: Add soc_camera support for i.MX25/i.MX27
arm/imx/gpio: add spinlock protection
ARM: Add support for the LPC32XX arch
ARM: LPC32XX: Arch config menu supoport and makefiles
ARM: LPC32XX: Phytec 3250 platform support
ARM: LPC32XX: Misc support functions
ARM: LPC32XX: Serial support code
ARM: LPC32XX: System suspend support
ARM: LPC32XX: GPIO, timer, and IRQ drivers
ARM: LPC32XX: Clock driver
...
The lock_policy_rwsem_* and unlock_policy_rwsem_* functions were scheduled
to be unexported in 2.6.33. Now that they have no callers outside
cpufreq.c, unexport them and make them static.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Moorestown has a PMIC chip which contains GPIO blocks. The PMIC chip is
connected to Langwell by an SPI interface, so this GPIO driver is treated
as an SPI GPIO expander even though the actual GPIO access is through IPC and SRAM.
The SPI master controller will probe this device driver by parsing the SPIB table.
Cleaned up for new IPC, GPE removed and some printk and other tidying by
Alan Cox. Fixes for points noted by Matthew Garrett
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
In some cases (for instance with kernel threads) it may be desirable to
use on-stack deferrable timers to get their power saving benefits. Add
interfaces to support this for the IPS driver.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Separate the memory region from the framebuffer device a little bit.
It's now possible to select the memory region used by the framebuffer
device using the new mem_idx parameter of omapfb_plane_info. If the
mem_idx is specified it will be interpreted as an index into the
memory regions array, if it's not specified the framebuffer's index is
used instead. So by default each framebuffer keeps using its own
memory region which preserves backwards compatibility.
This allows cloning the same memory region to several overlays and yet
each overlay can be controlled independently since they can be
associated with separate framebuffer devices.
Signed-off-by: Ville Syrjälä <ville.syrjala@nokia.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
The header file l2tp.h should be exported to the installed include/linux/
tree for userspace programs.
This patch fixes compilation errors in L2TP userspace apps which want to
use the new L2TP support introduced in 2.6.35.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit fc6055a5ba (net: Introduce skb_orphan_try()) allows an early orphan
of the skb and takes care of tx timestamping, which needs the sk reference
in the skb at driver level. So does the can-raw socket, which has not been
taken into account here.
The patch below adds a 'prevent_sk_orphan' bit in the skb tx shared info,
which fixes the problem discovered by Matthias Fuchs here:
http://marc.info/?t=128030411900003&r=1&w=2
Even if it's not a primary tx timestamp topic it fits well into some skb
shared tx context. Or should we find a different place for the information to
protect the sk reference until it reaches the driver level?
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
SBE 2T3E3 cards use DECchips 21143 but they need a different driver.
Don't even try to use a normal tulip driver with them.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
These devices were never released to the public.
Reviewed-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UVC host and gadget drivers both define constants and structures in
private header files. Move all those definitions to linux/usb/video.h
where they can be shared by the two drivers (and be available for
userspace applications).
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Use the macros instead of hardcoding numerical constants for the
controls information bitfield.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Some (North American) providers use a non-standard mode called
"8psk turbo fec". Since there is no flag in the driver that
would allow an application to determine whether a particular
device can handle "turbo fec", the attached patch introduces
FE_CAN_TURBO_FEC.
Since there is no flag in the SI data that would indicate
that a transponder uses "turbo fec", VDR will assume that
all 8psk transponders on DVB-S use "turbo fec".
Tested-by: Derek Kelly <user.vdr@gmail.com>
Signed-off-by: Klaus Schmidinger <Klaus.Schmidinger@tvdr.de>
Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Found with make headers_check:
include/linux/virtio_9p.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Fang Wenqi <antonf@turbolinux.com.cn>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
By using an atomic_t for t_updates and t_outstanding credits, this
should allow us to not need to take transaction t_handle_lock in
jbd2_journal_stop().
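A minimal userspace sketch of the idea, using C11 atomics in place of the kernel's atomic_t. The struct txn type and the txn_start_handle()/txn_stop_handle() helpers are hypothetical stand-ins, not the jbd2 code; they only show how two counters can be updated without holding a lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a transaction; only the two counters matter. */
struct txn {
	atomic_int updates;		/* handles currently attached */
	atomic_int outstanding_credits;	/* credits reserved by those handles */
};

/* Attach a handle that reserves `credits` buffers; no lock is taken. */
static void txn_start_handle(struct txn *t, int credits)
{
	atomic_fetch_add(&t->outstanding_credits, credits);
	atomic_fetch_add(&t->updates, 1);
}

/*
 * Detach a handle and give back its unused credits.
 * Returns nonzero when this was the last attached handle.
 */
static int txn_stop_handle(struct txn *t, int unused_credits)
{
	assert(unused_credits >= 0);
	atomic_fetch_sub(&t->outstanding_credits, unused_credits);
	return atomic_fetch_sub(&t->updates, 1) == 1;
}
```

The return value of the final decrement tells the caller whether anyone waiting for the transaction to go idle needs to be woken.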
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This is a revision to PATCH 2/2 that I sent. Link:
http://lists.infradead.org/pipermail/linux-mtd/2010-July/030911.html
Added new flag for scanning of both bytes 1 and 6 of the OOB for
a BB marker (instead of simply one or the other).
The "check_pattern" and "check_short_pattern" functions were updated
to include support for scanning the two different locations in the OOB.
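The two-location check can be sketched as follows. This is an illustration only, not the updated check_pattern() code. It assumes that "bytes 1 and 6" are 1-based (offsets 0 and 5), and that a marker byte other than 0xff means the block is bad; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "bytes 1 and 6" are 1-based, i.e. offsets 0 and 5. */
#define BB_MARKER_OFFS_FIRST	0
#define BB_MARKER_OFFS_SECOND	5

/* Report a bad block if either marker location is not the erased value. */
static bool oob_marks_bad_block(const uint8_t *oob, size_t len)
{
	assert(len > BB_MARKER_OFFS_SECOND);
	return oob[BB_MARKER_OFFS_FIRST] != 0xff ||
	       oob[BB_MARKER_OFFS_SECOND] != 0xff;
}
```

A non-0xff byte anywhere else in the OOB area is ignored by this check, which matches the test described below of writing fake markers "in OOB bytes 1, 6, and elsewhere".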
In order to handle increases in variety of necessary scanning patterns,
I implemented dynamic memory allocation of nand_bbt_descr structs
in new function 'nand_create_default_bbt_descr()'. This replaces
some increasingly-unwieldy, statically-declared descriptors. It can
replace several more (e.g. "flashbased" structs). However, I do not
test the flashbased options personally.
How this was tested:
I referenced 30+ data sheets (covering 100+ parts), and I tested a
selection of 10 different chips to varying degrees. Particularly, I
tested the creation of bad-block descriptors and basic BB scanning on
three parts:
ST NAND04GW3B2D, 2K page
ST NAND128W3A, 512B page
Samsung K9F1G08U0A, 2K page
To test these, I wrote some fake bad block markers to the flash (in OOB
bytes 1, 6, and elsewhere) to see if the scanning routine would detect
them properly. However, this method was somewhat limited because the
driver I am using has some bugs in its OOB write functionality.
Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Conflicts:
drivers/firewire/core-card.c
drivers/firewire/core-cdev.c
and forgotten #include <linux/time.h> in drivers/firewire/ohci.c
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
NAND_BB_LAST_PAGE used to be in nand.h, but it pertained to bad block
management and so belongs next to NAND_BBT_SCAN2NDPAGE in bbm.h. Also,
its previous flag value (0x00000400) conflicted with NAND_BBT_SCANALLPAGES
so I changed its value to 0x00008000. All uses of the name were modified to
provide consistency with other "NAND_BBT_*" flags.
Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch adds a way for user space programs to find out whether a
flash sector is locked. An optional driver method in the mtd_info struct
provides the information.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Update lsm_audit for AppArmor specific data, and add the core routines
AppArmor uses for auditing.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
Currently there are a number of applications (nautilus being the main one) which
call access() on files in order to determine how they should be displayed. It
is normal and expected that nautilus will want to see if files are executable
or if they are really read/write-able. access() should return the real
permission. SELinux policy checks are done in access() and can result in lots
of AVC denials as policy denies RWX on files which DAC allows. Currently
SELinux must dontaudit actual attempts to read/write/execute a file in
order to silence these messages (and not flood the logs.) But dontaudit rules
like that can hide real attacks. This patch adds a new common file
permission audit_access. This permission is special in that it is meaningless
and should never show up in an allow rule. Instead the only place this
permission has meaning is in a dontaudit rule like so:
dontaudit nautilus_t sbin_t:file audit_access
With such a rule if nautilus just checks access() we will still get denied and
thus userspace will still get the correct answer but we will not log the denial.
If nautilus attempted to actually perform one of the forbidden actions
(rather than just querying access(2) about it) we would still log a denial.
This type of dontaudit rule should be used sparingly, as it could be a
method for an attacker to probe the system permissions without detection.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Currently MAY_ACCESS means that filesystems must check the permissions
right then and not rely on cached results or the results of future
operations on the object. This can be because of a call to sys_access() or
because of a call to chdir() which needs to check search without relying on
any future operations inside that dir. I plan to use MAY_ACCESS for other
purposes in the security system, so I split the MAY_ACCESS and the
MAY_CHDIR cases.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
When commit be6d3e56a6 "introduce new LSM hooks
where vfsmount is available." was proposed, regarding security_path_truncate(),
only "struct file *" argument (which AppArmor wanted to use) was removed.
But length and time_attrs arguments are not used by TOMOYO nor AppArmor.
Thus, let's remove these arguments.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: James Morris <jmorris@namei.org>
Devices register mask notifiers using the gsi, but the irqchip knows about
irqchip/pin, so the conversion from irqchip/pin to gsi should be done before
looking for the mask notifier to call.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently, if a guest accesses an address that belongs to a memory slot but is
not backed by a page, or the page is read-only, KVM treats it like an MMIO access.
Remove that capability. It was never part of the interface and should
not be relied upon.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
For 32bit machines where the physical address width is
larger than the virtual address width the frame number types
in KVM may overflow. Fix this by changing them to u64.
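A small userspace sketch of the truncation, assuming 4 KiB pages (PAGE_SHIFT of 12) and using uint32_t to stand in for a 32-bit frame number type. The function names are hypothetical; the example only shows what is lost when a frame number does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Frame number kept in a 32-bit type: bits 32 and up are silently dropped. */
static uint32_t phys_to_pfn32(uint64_t phys)
{
	return (uint32_t)(phys >> PAGE_SHIFT);
}

/* Frame number kept in a 64-bit type: nothing is lost. */
static uint64_t phys_to_pfn64(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

/* Convert a frame number back to the address of its first byte. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> PAGE_SHIFT));	/* shift must not overflow */
	return pfn << PAGE_SHIFT;
}
```

For a physical address above 2^44, the round trip through the 32-bit type comes back as a different, lower address, while the 64-bit type preserves it.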
[sfr: fix build on 32-bit ppc]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add support for JMB364 and 369.
Patch-originally-from: Aries Lee <arieslee@jmicron.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some AHCI implementations may use Vendor Specific HBA[A0h, FFh]
and/or Port[70h, 7Fh] registers to 'prepare' for initialization.
For that, the platform needs memory mapped address of AHCI registers.
This patch adds the 'mmio' argument and reorders the call to
platform init function.
Signed-off-by: Jassi Brar <jassi.brar@samsung.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some DIU structures will be used in platform code in
subsequent MPC5121 DIU patch, so we move this header
to be able to include it elsewhere.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
nfs_commit_inode() needs to be defined irrespective of whether or not
we are supporting NFSv3 and NFSv4.
Allow the compiler to optimise away code in the NFSv2-only case by
converting it into an inlined stub function.
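The stub pattern looks roughly like this. It is a generic sketch, not the NFS code: HAVE_COMMIT, struct inode_stub, commit_inode() and flush_inode() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with pending unstable writes. */
struct inode_stub {
	int unstable_writes;
};

#ifdef HAVE_COMMIT
/* Real implementation: commit the pending writes and report how many. */
int commit_inode(struct inode_stub *inode)
{
	int n = inode->unstable_writes;

	inode->unstable_writes = 0;
	return n;
}
#else
/* Nothing to commit: the stub returns 0 and the compiler can drop the call. */
static inline int commit_inode(struct inode_stub *inode)
{
	(void)inode;
	return 0;
}
#endif

/* Callers use the same name in both configurations, with no #ifdef. */
static int flush_inode(struct inode_stub *inode)
{
	assert(inode != NULL);
	return commit_inode(inode);
}
```

Built without -DHAVE_COMMIT, flush_inode() reduces to returning 0, and the caller's source does not change between configurations.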
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark init_workqueues() as early_initcall() and thus it will be initialized
before smp bringup. init_workqueues() registers for the hotcpu notifier
and thus it should cope with the processors that are brought online after
the workqueues are initialized.
x86 smp bringup code uses workqueues and uses a workaround for the
cold boot process (as the workqueues are initialized post smp_init()).
Marking init_workqueues() as early_initcall() will pave the way for
cleaning up this code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for
each request generates a large number of unneeded atomics if a bit is set.
Replace with a separate test/clear sequence. This is safe since there is
no clear_bit() outside the vcpu thread.
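A userspace sketch of the separate test/clear sequence, using C11 atomics rather than the kernel bitops. The check_request() helper is hypothetical; it relies on the condition stated above, that only the calling thread ever clears bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true, and clears the bit, if request `nr` was pending. */
static bool check_request(atomic_ulong *requests, unsigned int nr)
{
	assert(nr < 8 * sizeof(unsigned long));
	unsigned long mask = 1ul << nr;

	/* Plain load first: when the bit is clear, no atomic RMW is issued. */
	if (!(atomic_load_explicit(requests, memory_order_relaxed) & mask))
		return false;
	/* Only this thread clears bits, so the bit is still set here. */
	atomic_fetch_and(requests, ~mask);
	return true;
}
```

Other threads may still set bits concurrently; the clear stays an atomic operation so that those updates are not lost.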
Signed-off-by: Avi Kivity <avi@redhat.com>
As advertised in feature-removal-schedule.txt. Equivalent support is provided
by overlapping memory regions.
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch enables the guest to use the XSAVE/XRSTOR instructions.
We assume that host_xcr0 uses all the bits that the OS supports.
We load xcr0 in the same way we handle the fpu: as late as we can.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch moves the declaration of of_get_address(), of_get_pci_address(),
and of_pci_address_to_resource() out of arch code and into the common
linux/of_address header file.
This patch also fixes some of the asm/prom.h ordering issues. It still
includes some header files that it ideally shouldn't be, but at least the
ordering is consistent now so that of_* overrides work.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation. This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.
Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
Signed-off-by: Avi Kivity <avi@redhat.com>
In the common case, a guest SRAO MCE causes the corresponding poisoned page
to be unmapped and SIGBUS to be sent to QEMU-KVM, which then relays
the MCE to the guest OS.
But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.
The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.
To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
and does not try kernel and user space MMIO processing for a poisoned
page.
[xiao: fix warning introduced by avi]
Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some platforms gate the pclk (APB - the bus - clock) to the peripherals
for power saving, along with the functional clock. When devices are
accessed without pclk enabled, the kernel will oops.
This gives them two options:
1. Leave all clocks on all the time.
2. Attempt to gate pclk along with the functional clock.
(With some hardware, pclk and the functional clock are gated by a single
bit in a register.)
(1) has the disadvantage that it causes increased power usage, which is
bad news for battery operated devices. (2) can lead to kernel oops if
registers are accessed without the functional clock being enabled.
So, introduce the apb_pclk signal in such a way existing drivers don't
need to be updated. Essentially, this means we guarantee that:
1. pclk will be enabled whenever the driver is bound to a device -
from probe() to remove() time.
2. pclk will also be enabled when reading the primecell IDs from the device.
In order to allow drivers to be incrementally updated to achieve greater
power savings, we provide two additional calls to allow drivers to
manage the pclk - amba_pclk_enable()/amba_pclk_disable().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
A function that copies the padata cpumasks to a user buffer
is a bit error prone. The cpumask can change any time so we
can't be sure to have the right cpumask when using this function.
A user who is interested in the padata cpumasks should register
to the padata cpumask notifier chain instead. Users of
padata_get_cpumask are already updated, so we can remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We pass a pointer to the new padata cpumasks to the cpumask_change_notifier
chain. So users can access the cpumasks without the need of an extra
padata_get_cpumask function.
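The shape of such a notifier can be sketched in userspace C. This is not the kernel notifier-chain API: struct cpumasks, struct mask_chain, and the chain_register()/chain_notify() helpers are hypothetical, and the masks are reduced to plain bit words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of masks: parallel workers and serial (callback) workers. */
struct cpumasks {
	unsigned long pcpu;
	unsigned long cbcpu;
};

typedef void (*mask_notifier)(const struct cpumasks *new_masks, void *ctx);

#define MAX_NOTIFIERS 4

struct mask_chain {
	mask_notifier fn[MAX_NOTIFIERS];
	void *ctx[MAX_NOTIFIERS];
	size_t count;
};

/* Add a subscriber; returns 0 on success, -1 if the chain is full. */
static int chain_register(struct mask_chain *c, mask_notifier fn, void *ctx)
{
	if (c->count == MAX_NOTIFIERS)
		return -1;
	c->fn[c->count] = fn;
	c->ctx[c->count] = ctx;
	c->count++;
	return 0;
}

/* Hand every subscriber a pointer to the new masks. */
static void chain_notify(const struct mask_chain *c,
			 const struct cpumasks *new_masks)
{
	for (size_t i = 0; i < c->count; i++)
		c->fn[i](new_masks, c->ctx[i]);
}

/* Example subscriber: remembers the latest masks it was told about. */
static void remember_masks(const struct cpumasks *new_masks, void *ctx)
{
	assert(ctx != NULL);
	*(struct cpumasks *)ctx = *new_masks;
}
```

Because the subscriber receives the masks at the moment they change, it never needs to poll for a copy that might already be stale.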
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
padata_set_cpumask needs to be protected by a lock. We make
__padata_set_cpumasks unlocked and static. So this function
can be used by the exported and locked padata_set_cpumask and
padata_set_cpumasks functions.
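The locking split can be sketched like this. It is a userspace illustration under stated assumptions, not the padata code: struct instance and all function names are hypothetical, a C11 atomic_flag spinlock stands in for the kernel lock, and rejecting empty masks is an invented validation rule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct instance {
	atomic_flag lock;		/* stand-in for the kernel lock */
	unsigned long pcpu_mask;	/* parallel workers */
	unsigned long cbcpu_mask;	/* serial (callback) workers */
};

static void instance_lock(struct instance *in)
{
	while (atomic_flag_test_and_set_explicit(&in->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void instance_unlock(struct instance *in)
{
	atomic_flag_clear_explicit(&in->lock, memory_order_release);
}

/* Unlocked worker, file-local: the caller must hold in->lock. */
static int set_masks_unlocked(struct instance *in,
			      unsigned long pcpu, unsigned long cbcpu)
{
	if (pcpu == 0 || cbcpu == 0)
		return -1;	/* assumption: empty masks are rejected */
	in->pcpu_mask = pcpu;
	in->cbcpu_mask = cbcpu;
	return 0;
}

/* Public entry point: replace both masks under the lock. */
int instance_set_masks(struct instance *in,
		       unsigned long pcpu, unsigned long cbcpu)
{
	int err;

	assert(in != NULL);
	instance_lock(in);
	err = set_masks_unlocked(in, pcpu, cbcpu);
	instance_unlock(in);
	return err;
}

/* Public entry point: replace only the parallel mask under the lock. */
int instance_set_pcpu_mask(struct instance *in, unsigned long pcpu)
{
	int err;

	assert(in != NULL);
	instance_lock(in);
	err = set_masks_unlocked(in, pcpu, in->cbcpu_mask);
	instance_unlock(in);
	return err;
}
```

Keeping the worker static means no outside caller can reach it without going through one of the locked entry points.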
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We rename padata_alloc to padata_alloc_possible because this
function allocates a padata_instance and uses the cpu_possible
mask for parallel and serial workers. We also rename __padata_alloc
to padata_alloc to avoid exporting underscore-prefixed functions, which
are considered to be private to padata. Users are updated
accordingly.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the cy8ctmg110 capacitive touchscreen used on some
embedded devices.
(Some clean up by Alan Cox)
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
See https://bugzilla.kernel.org/show_bug.cgi?id=16056
If other processes are blocked waiting for kswapd to free up some memory so
that they can make progress, then we cannot allow kswapd to block on those
processes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
This is needed by NFSv4.0 servers in order to keep the number of locking
stateids at a manageable level.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.
However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
last MSI message written
- Use the new functions where appropriate
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch exports SMBIOS provided firmware instance and label of
onboard PCI devices to sysfs. New files are:
/sys/bus/pci/devices/.../label which contains the firmware name for
the device in question, and
/sys/bus/pci/devices/.../index which contains the firmware device type
instance for the given device.
Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It is a known issue that mmio decoding shall be disabled while doing PCI
bar sizing. Host bridge and other devices (PCI PIC) shall be excluded for
certain platforms. This patch mainly comes from Matthew Wilcox's
patch in http://kerneltrap.org/mailarchive/linux-kernel/2007/9/13/258969.
A new flag bit "mmio_always_on" is added to pci_dev with the intention that
devices whose mmio decoding cannot be disabled during BAR sizing shall
have this bit set, preferably in their quirks.
Without this patch, Intel Moorestown platform graphics unit will be
corrupted during bar sizing activities.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move of_register_spi_devices() call from drivers to
spi_register_master(). Also change the function to use
the struct device_node pointer from master spi device
instead of passing it as function argument.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
of_node_to_nid() is only relevant in a few architectures. Don't force
everyone to implement it anyway.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
The AMBA bus should also use of_device_make_bus_id() when populating device
out of device tree data. This patch makes the function non-static, and
adds a suitable prototype in of_device.h
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Fix __task_cred()'s lockdep check by removing the following validation
condition:
lockdep_tasklist_lock_is_held()
as commit_creds() does not take the tasklist_lock, and nor do most of the
functions that call it, so this check is pointless and it can prevent
detection of the RCU lock not being held if the tasklist_lock is held.
Instead, add the following validation condition:
task->exit_state >= 0
to permit the access if the target task is dead and therefore unable to change
its own credentials.
Fix __task_cred()'s comment to:
(1) discard the bit that says that the caller must prevent the target task
from being deleted. That shouldn't need saying.
(2) Add a comment indicating the result of __task_cred() should not be passed
directly to get_cred(), but rather that get_task_cred() should be used
instead.
Also put a note into the documentation to enforce this point there too.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.
What happens is that get_task_cred() can race with commit_creds():
    TASK_1                          TASK_2                          RCU_CLEANER
    -->get_task_cred(TASK_2)
    rcu_read_lock()
    __cred = __task_cred(TASK_2)
                                    -->commit_creds()
                                    old_cred = TASK_2->real_cred
                                    TASK_2->real_cred = ...
                                    put_cred(old_cred)
                                      call_rcu(old_cred)
        [__cred->usage == 0]
    get_cred(__cred)
        [__cred->usage == 1]
    rcu_read_unlock()
                                                                    -->put_cred_rcu()
                                                                    [__cred->usage == 1]
                                                                    panic()
However, since a task's credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.
We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.
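The retry loop described above can be sketched in userspace C11. This is only a model under stated assumptions: struct cred_model, struct task_model and both function names are hypothetical, C11 atomics stand in for the kernel's atomic_inc_not_zero(), and the RCU read-side protection that keeps the object's memory valid in the kernel is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cred: only the usage count matters here. */
struct cred_model {
    atomic_int usage;
};

/* Hypothetical stand-in for a task whose creds pointer may be replaced. */
struct task_model {
    _Atomic(struct cred_model *) real_cred;
};

/* Increment *usage unless it is already zero; true if the increment happened. */
static bool usage_inc_not_zero(atomic_int *usage)
{
    int old = atomic_load(usage);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(usage, &old, old + 1))
            return true;
    }
    return false;
}

/* Re-read the creds pointer until a reference is taken on a live object. */
static struct cred_model *get_task_cred_model(struct task_model *task)
{
    struct cred_model *cred;

    assert(task != NULL);
    do {
        cred = atomic_load(&task->real_cred);
    } while (!usage_inc_not_zero(&cred->usage));
    return cred;
}
```

The point of the sketch is that a count which has already reached zero is never raised again, so an object already handed to the cleanup path cannot be resurrected.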
Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:
kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745
RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
[<ffffffff810698cd>] put_cred+0x13/0x15
[<ffffffff81069b45>] commit_creds+0x16b/0x175
[<ffffffff8106aace>] set_current_groups+0x47/0x4e
[<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00
48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b
04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP [<ffffffff81069881>] __put_cred+0xc/0x45
RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the DMA context programming and userspace ABI for multichannel
reception, i.e. for listening on multiple channel numbers by means of a
single DMA context.
The use case is reception of more streams than there are IR DMA units
offered by the link layer. This is already implemented by the older
ohci1394 + ieee1394 + raw1394 stack. And as discussed recently on
linux1394-devel, this feature is occasionally used in practice.
The big drawbacks of this mode are that buffer layout and interrupt
generation necessarily differ from single-channel reception: Headers
and trailers are not stripped from packets, packets are not aligned with
buffer chunks, interrupts are per buffer chunk, not per packet.
These drawbacks also cause a rather hefty code footprint to support this
rarely used OHCI-1394 feature. (367 lines added, among them 94 lines of
added userspace ABI documentation.)
This implementation enforces that a multichannel reception context may
only listen to channels to which no single-channel context on the same
link layer is presently listening. OHCI-1394 would allow single-channel
contexts to be overlaid by the multichannel context, but this would be
a departure from the present first-come-first-served policy of IR
context creation.
The implementation is heavily based on an earlier one by Jay Fenlason.
Thanks Jay.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Platforms may have some external power control which needs to be
controlled from board specific code. Rename the translate_vdd()
callback to vdd_handler() and pass it the power mode.
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
A small number of users of IRQF_TIMER are using it for the implied no
suspend behaviour on interrupts which are not timer interrupts.
Therefore add a new IRQF_NO_SUSPEND flag, rename IRQF_TIMER to
__IRQF_TIMER and redefine IRQF_TIMER in terms of these new flags.
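A minimal sketch of how the three flags could relate. The numeric bit values and the helper irq_stays_on_during_suspend() are assumptions made for illustration; the text above does not give them.

```c
#include <assert.h>

/* Illustrative bit values only; the commit text does not state the numbers. */
#define __IRQF_TIMER     0x00000200u  /* marks the interrupt as a timer interrupt */
#define IRQF_NO_SUSPEND  0x00004000u  /* do not disable this IRQ during suspend */

/* IRQF_TIMER keeps its old behaviour by combining the two new flags. */
#define IRQF_TIMER       (__IRQF_TIMER | IRQF_NO_SUSPEND)

/* The two underlying bits must be distinct for the split to mean anything. */
static_assert((__IRQF_TIMER & IRQF_NO_SUSPEND) == 0, "flag bits overlap");

/* Hypothetical helper: nonzero if an IRQ with these flags stays enabled
 * across suspend. */
static int irq_stays_on_during_suspend(unsigned int irqflags)
{
    return (irqflags & IRQF_NO_SUSPEND) != 0;
}
```

With this split, a non-timer user that only wants the no-suspend behaviour can pass IRQF_NO_SUSPEND alone instead of borrowing IRQF_TIMER.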
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: xen-devel@lists.xensource.com
Cc: linux-input@vger.kernel.org
Cc: linuxppc-dev@ozlabs.org
Cc: devicetree-discuss@lists.ozlabs.org
LKML-Reference: <1280398595-29708-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
davinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high
regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
wm8350-regulator: fix wm8350_register_regulator error handling
ab3100: fix off-by-one value range checking for voltage selector
When given a vfsmount_mark, fanotify currently looks up (if it exists)
the corresponding inode mark. This patch drops that lookup and uses the
mark provided.
Signed-off-by: Eric Paris <eparis@redhat.com>
should_send_event() and handle_event() will both need to look up the inode
event if they get a vfsmount event. Let's just pass both at the same time
since we have them both after walking the lists in lockstep.
Signed-off-by: Eric Paris <eparis@redhat.com>
The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.
Signed-off-by: Eric Paris <eparis@redhat.com>
group->mask is now useless. It was originally a shortcut for fsnotify to
save on performance. These checks are now redundant, so we remove them.
Signed-off-by: Eric Paris <eparis@redhat.com>
Because we walk the object->fsnotify_marks list instead of the global
fsnotify groups list we don't need the fsnotify_inode_mask and
fsnotify_vfsmount_mask as these were simply shortcuts in fsnotify() for
performance. They are now extra checks, rip them out.
Signed-off-by: Eric Paris <eparis@redhat.com>
With the change of fsnotify to use srcu walking the marks list instead of
walking the global groups list we now know the mark in question. The code can
send the mark to the group's handling functions and the groups won't have to
find those marks themselves.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently reading the inode->i_fsnotify_marks or
vfsmount->mnt_fsnotify_marks lists are protected by a spinlock on both the
read and the write side. This patch protects the read side of those lists
with a new single srcu.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently fsnotify checks whether mark->group is NULL to decide if
fsnotify_destroy_mark() has already been called or not. With the upcoming
rcu work it is a heck of a lot easier to use an explicit flag than worry
about group being set to NULL.
Signed-off-by: Eric Paris <eparis@redhat.com>
Al explains that calling dentry_open() with a mnt/dentry pair is only
guaranteed to be safe if they are already used in an open struct file. To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than the horrific void ** argument and such just to pass the
fanotify_merge event back to the caller of fsnotify_add_notify_event() have
those things return an event if it was different than the event suggested to
be added.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently the fds fanotify opens for its listeners are opened with f_flags
equal to O_RDONLY | O_LARGEFILE. This patch instead takes f_flags from the
fanotify_init syscall and uses those when opening files in the context of
the listener.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds a check to make sure that all fsnotify bits are unique and we
cannot accidentally use the same bit for 2 different fsnotify event types.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify uses bits called IN_* and fsnotify uses bits called FS_*. These
need to line up. This patch adds build time checks to make sure no one can
change these bits so they are not the same.
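A build-time check of this kind can be sketched in userspace with C11 static_assert. The FS_* values below are a hypothetical local mirror covering only a few bits, and fs_mask_from_inotify() is an invented helper; the IN_* constants come from <sys/inotify.h>. The last assertion sketches the related idea of checking that no two event bits collide: a sum of masks equals their bitwise OR only when no bit is shared.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Hypothetical mirror of a few FS_* bits, chosen to match the IN_* values. */
#define FS_ACCESS 0x00000001u
#define FS_MODIFY 0x00000002u
#define FS_ATTRIB 0x00000004u
#define FS_OPEN   0x00000020u

/* Compilation fails if anyone changes one side without the other. */
static_assert(IN_ACCESS == FS_ACCESS, "IN_ACCESS and FS_ACCESS differ");
static_assert(IN_MODIFY == FS_MODIFY, "IN_MODIFY and FS_MODIFY differ");
static_assert(IN_ATTRIB == FS_ATTRIB, "IN_ATTRIB and FS_ATTRIB differ");
static_assert(IN_OPEN == FS_OPEN, "IN_OPEN and FS_OPEN differ");

/* Uniqueness: the sum equals the OR only if no two masks share a bit. */
static_assert((FS_ACCESS + FS_MODIFY + FS_ATTRIB + FS_OPEN) ==
              (FS_ACCESS | FS_MODIFY | FS_ATTRIB | FS_OPEN),
              "two FS_* event bits overlap");

/* Because the bits line up, an inotify mask can be passed through as-is;
 * only the bits mirrored above are kept in this sketch. */
static uint32_t fs_mask_from_inotify(uint32_t in_mask)
{
    return in_mask & (FS_ACCESS | FS_MODIFY | FS_ATTRIB | FS_OPEN);
}
```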
Signed-off-by: Eric Paris <eparis@redhat.com>
An inotify watch on a directory will send events for children even if those
children have been unlinked. This patch adds a new inotify flag IN_EXCL_UNLINK
which allows a watch to specify that it doesn't care about unlinked children.
This should fix performance problems seen by tasks which add a watch to
/tmp and then are overrun with events when other processes are reading and
writing to unlinked files they created in /tmp.
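From userspace, a watch using the new flag might look like the sketch below. It assumes a C library whose <sys/inotify.h> exposes IN_EXCL_UNLINK; the helper name watch_dir_excl_unlink() and the particular set of events are illustrative choices, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch a directory, asking not to be told about children that have
 * already been unlinked. Returns the inotify fd, or -1 with errno set. */
static int watch_dir_excl_unlink(const char *dir)
{
    int fd, saved_errno;

    assert(dir != NULL);
    fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;

    if (inotify_add_watch(fd, dir,
                          IN_OPEN | IN_MODIFY | IN_CLOSE | IN_EXCL_UNLINK) < 0) {
        saved_errno = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

A caller would then read() events from the returned descriptor as with any other inotify watch.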
https://bugzilla.kernel.org/show_bug.cgi?id=16296
Requested-by: Matthias Clasen <mclasen@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In TPS6507x, depending on the status of DEFDCDC{2,3} pin either
DEFDCDC{2,3}_LOW or DEFDCDC{2,3}_HIGH register needs to be read or
programmed to change the output voltage.
The current driver assumes DEFDCDC{2,3} pins are always tied low
and thus operates only on DEFDCDC{2,3}_LOW register. This need
not always be the case (as is found on OMAP-L138 EVM).
Unfortunately, software cannot read the status of DEFDCDC{2,3} pins.
So, this information is passed through platform data depending on
how the board is wired.
Signed-off-by: Anuj Aggarwal <anuj.aggarwal@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
akpm got a warning the fsnotify_mask could be used uninitialized in
fsnotify_perm(). It's not actually possible but his compiler complained
about it. This patch just initializes it to 0 to shut up the compiler.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify groups need to respond to events which include permissions types.
To do so groups will send a response using write() on the fanotify_fd they
have open.
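A sketch of what such a response could look like from the listener side. It assumes struct fanotify_response and the FAN_ALLOW / FAN_DENY constants as declared by present-day <sys/fanotify.h>; the text above only says that write() is used, so the exact record layout is an assumption here, and fanotify_respond() is a hypothetical helper.

```c
#include <assert.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Answer a permission event: event_fd is the fd delivered with the event,
 * decision is FAN_ALLOW or FAN_DENY. Returns 0 on success, -1 on error. */
static int fanotify_respond(int fan_fd, int event_fd, unsigned int decision)
{
    struct fanotify_response resp;
    ssize_t n;

    assert(decision == FAN_ALLOW || decision == FAN_DENY);
    resp.fd = event_fd;
    resp.response = decision;

    /* The whole record is written in one call to the fanotify fd. */
    n = write(fan_fd, &resp, sizeof(resp));
    return n == (ssize_t)sizeof(resp) ? 0 : -1;
}
```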
Signed-off-by: Eric Paris <eparis@redhat.com>
This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events. This is done using the
new fsnotify secondary queue. No userspace interface is provided to actually
respond to or request these events.
Signed-off-by: Eric Paris <eparis@redhat.com>
Introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code. This hook is used to allow fsnotify groups to make access
control decisions about events on the system. We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.
Signed-off-by: Eric Paris <eparis@redhat.com>
Extern declarations in sysctl.c should be moved to their own header file,
which is then included in the relevant .c files.
Move inotify_table extern declaration to linux/inotify.h
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Move dir_notify_enable declaration to where it belongs -- dnotify.h .
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify was using char * when it passed around the d_name.name string
internally but it is actually an unsigned char *. This patch switches
fsnotify to use unsigned and should silence some pointer signedness warnings
which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify
code as there are still issues with kstrdup and strlen which would pop
out needless warnings.
Signed-off-by: Eric Paris <eparis@redhat.com>
Each group can define their own notification (and secondary_q) merge
function. Inotify does tail drop, fanotify does matching and drop which
can actually allocate a completely new event. But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue. This patch just implements a
void ** argument which is passed to the merge function. fanotify can use
this field to pass the new event back to higher layers.
Signed-off-by: Eric Paris <eparis@redhat.com>
This introduces an ordering to fsnotify groups. With purely asynchronous
notification based "things" implementing fsnotify (inotify, dnotify) ordering
isn't particularly important. But if people want to use fsnotify for the
basis of synchronous notification or blocking notification, ordering becomes
important.
For example, a Hierarchical Storage Management listener would need to get its event
before an AV scanner could get its event (since the HSM would need to
bring the data in for the AV scanner to scan.) Typically asynchronous notification
would want to run after the AV scanner made any relevant access decisions
so as to not send notification about an event that was denied.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify listeners may want to clear all marks. They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.
Signed-off-by: Eric Paris <eparis@redhat.com>
Some users may want to truly ignore an inode even if it has been modified.
Say you are watching a mount which contains a log file and you really don't
want any notification about that file. This patch allows the listener to
do that.
Signed-off-by: Eric Paris <eparis@redhat.com>
For some inodes a group may want to never hear about a set of events even if
the inode is modified. We add a new mark flag which indicates that these
marks should not have their ignored_mask cleared on modification.
Signed-off-by: Eric Paris <eparis@redhat.com>
Change the sys_fanotify_mark() system call so users can set ignored_masks
on inodes. Remember, if a user never sets a real mask, and only sets ignored
masks, the ignore will never be pinned in memory. Thus ignored_masks can
be lost under memory pressure and the user may again get events they
previously thought were ignored.
Signed-off-by: Eric Paris <eparis@redhat.com>
The ignored_mask is a new mask which is part of fsnotify marks. A group's
should_send_event() function can use the ignored mask to determine that
certain events are not of interest. In particular if a group registers a
mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
ignored_mask for individual inodes and not send open events for those
inodes.
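The FS_OPEN example above can be sketched as a small decision function. Everything here is a simplified model: struct mark_model, should_send_event_model() and the bit values are hypothetical and are not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FS_MODIFY 0x00000002u  /* illustrative value */
#define FS_OPEN   0x00000020u  /* illustrative value */

/* Hypothetical, stripped-down mark: events wanted and events to ignore. */
struct mark_model {
    uint32_t mask;
    uint32_t ignored_mask;
};

/* Decide whether 'event' is of interest, given an optional vfsmount mark
 * and an optional inode mark. A bit in any ignored_mask vetoes the event. */
static bool should_send_event_model(const struct mark_model *vfsmount_mark,
                                    const struct mark_model *inode_mark,
                                    uint32_t event)
{
    uint32_t wanted = 0, ignored = 0;

    assert(event != 0);
    if (vfsmount_mark != NULL) {
        wanted |= vfsmount_mark->mask;
        ignored |= vfsmount_mark->ignored_mask;
    }
    if (inode_mark != NULL) {
        wanted |= inode_mark->mask;
        ignored |= inode_mark->ignored_mask;
    }
    return (event & wanted & ~ignored) != 0;
}
```

In this model a mount-wide FS_OPEN watch still reports opens on every inode except those whose own mark lists FS_OPEN in ignored_mask.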
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify marks must pin inodes in core. dnotify doesn't technically need to
since they are closed when the directory is closed. fanotify also needs to
pin inodes in core as it works today. But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest. I claim that these should be liberally sent to the
kernel and should not pin the inode in core. If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.
This patch lays the groundwork for non-pinning inode marks by using lazy
inode pinning. We do not pin a mark until it has a non-zero mask entry. If a
listener never sets a mask we never pin the inode.
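Lazy pinning as described can be modelled in a few lines. The types and function names below are hypothetical, a plain counter stands in for the inode reference, and what happens when a mask later returns to zero is not described above, so it is left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode: the counter stands in for "pinned in core". */
struct inode_model {
    int pin_count;
};

/* Hypothetical mark attached to an inode. */
struct lazy_mark {
    struct inode_model *inode;
    uint32_t mask;
    uint32_t ignored_mask;
    bool pinned;
};

/* Setting a real (non-zero) mask pins the inode, but only the first time. */
static void lazy_mark_set_mask(struct lazy_mark *m, uint32_t mask)
{
    assert(m != NULL && m->inode != NULL);
    m->mask = mask;
    if (mask != 0 && !m->pinned) {
        m->inode->pin_count++;
        m->pinned = true;
    }
}

/* Setting only an ignored mask never pins the inode. */
static void lazy_mark_set_ignored_mask(struct lazy_mark *m, uint32_t ignored)
{
    assert(m != NULL);
    m->ignored_mask = ignored;
}
```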
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify_mark_validate functions are all needlessly declared in headers as
static inlines. Instead just do the checks where they are needed for code
readability.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
The term 'vfsmount' isn't sensible to userspace. Instead call it 'mount'.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Create a new fanotify_mark flag which indicates we should attach the mark
to the vfsmount holding the object referenced by dfd and pathname rather
than the inode itself.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently should_send_event in fanotify only cares about marks on inodes.
This patch extends that interface to indicate that it cares about events
that happened on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>
Per-mount watches allow groups to listen to fsnotify events on an entire
mount. This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode. They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c
Signed-off-by: Eric Paris <eparis@redhat.com>
Pass the process identifiers of the triggering processes to fanotify
listeners: this information is useful for event filtering and logging.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
Send events to userspace by reading the file descriptor from fanotify_init().
One will get blocks of data which look like:
	struct fanotify_event_metadata {
		__u32 event_len;
		__u32 vers;
		__s32 fd;
		__u64 mask;
		__s64 pid;
		__u64 cookie;
	} __attribute__ ((packed));
Simple code to retrieve and deal with events is below:
	while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
		struct fanotify_event_metadata *metadata;
		metadata = (void *)buf;
		while (FAN_EVENT_OK(metadata, len)) {
			[PROCESS HERE!!]
			if (metadata->fd >= 0 && close(metadata->fd) != 0)
				goto fail;
			metadata = FAN_EVENT_NEXT(metadata, len);
		}
	}
Signed-off-by: Eric Paris <eparis@redhat.com>
NAME
fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object
SYNOPSIS
int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
int dfd, const char *pathname)
DESCRIPTION
fanotify_mark() is used to add, remove, or modify a mark on a filesystem
object. Marks are used to indicate that the fanotify group is
interested in events which occur on that object. At this point in
time marks may only be added to files and directories.
fanotify_fd must be a file descriptor returned by fanotify_init()
The flags field must contain exactly one of the following:
FAN_MARK_ADD - bitwise OR the bits in mask and ignored mask into the mark
FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mask
from the mark
The following values can be OR'd into the flags field:
FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)
dfd may be any of the following:
AT_FDCWD: the object will be looked up based on pathname similar
to open(2)
file descriptor of a directory: if pathname is not NULL the
object to modify will be looked up similar to openat(2)
file descriptor of the final object: if pathname is NULL the
object to modify will be the object referenced by dfd
The mask is the bitwise OR of the set of events of interest such as:
FAN_ACCESS - object was accessed (read)
FAN_MODIFY - object was modified (write)
FAN_CLOSE_WRITE - object was writable and was closed
FAN_CLOSE_NOWRITE - object was read only and was closed
FAN_OPEN - object was opened
FAN_EVENT_ON_CHILD - interested in events that happen to
children. Only relevant when the object
is a directory
FAN_Q_OVERFLOW - event queue overflowed (not implemented)
RETURN VALUE
On success, this system call returns 0. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL An invalid value was specified in mask.
EINVAL An invalid value was specified in ignored_mask.
EINVAL fanotify_fd is not a file descriptor as returned by
fanotify_init()
EBADF fanotify_fd is not a valid file descriptor
EBADF dfd is not a valid file descriptor and path is NULL.
ENOTDIR dfd is not a directory and path is not NULL
EACCES no search permissions on some part of the path
ENOENT file not found
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
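A call matching the synopsis above might look like the following sketch. It assumes the C library wrapper and the FAN_* constants from <sys/fanotify.h>; the helper name mark_path_for_open_close() and the chosen event set are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/fanotify.h>

/* Ask for open and close events on 'path', resolved relative to the current
 * working directory. fan_fd must come from fanotify_init().
 * Returns 0 on success, -1 with errno set on error. */
static int mark_path_for_open_close(int fan_fd, const char *path)
{
    uint64_t mask = FAN_OPEN | FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE;

    assert(path != NULL);
    return fanotify_mark(fan_fd, FAN_MARK_ADD, mask, AT_FDCWD, path);
}
```

Passing FAN_MARK_REMOVE with the same mask would undo the registration, per the flags description above.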
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch simply declares the new sys_fanotify_mark syscall
int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
int dfd, const char *pathname)
Signed-off-by: Eric Paris <eparis@redhat.com>
NAME
fanotify_init - initialize an fanotify group
SYNOPSIS
int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);
DESCRIPTION
fanotify_init() initializes a new fanotify instance and returns a file
descriptor associated with the new fanotify event queue.
The following values can be OR'd into the flags field:
FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
Using this flag saves extra calls to fcntl(2) to achieve the same
result.
FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
See the description of the O_CLOEXEC flag in open(2) for reasons why
this may be useful.
The event_f_flags argument is unused and must be set to 0
The priority argument is unused and must be set to 0
RETURN VALUE
On success, this system call returns a new file descriptor. On error, -1 is
returned, and errno is set to indicate the error.
ERRORS
EINVAL An invalid value was specified in flags.
EINVAL A non-zero value was passed in event_f_flags or in priority
ENFILE The system limit on the total number of file descriptors has been reached.
ENOMEM Insufficient kernel memory is available.
CONFORMING TO
These system calls are Linux-specific.
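A minimal sketch of creating a group. Note the assumption: the wrapper declared in present-day <sys/fanotify.h> takes two arguments (flags and event_f_flags) and has no priority argument, unlike the three-argument synopsis above. O_RDONLY is zero on Linux, so it satisfies the "must be set to 0" rule for event_f_flags. The helper name open_fanotify_group() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* O_RDONLY is 0, so it is compatible with "event_f_flags must be 0". */
static_assert(O_RDONLY == 0, "O_RDONLY is expected to be zero");

/* Create a new fanotify group with the given init flags, for example
 * FAN_CLOEXEC | FAN_NONBLOCK. Returns the new fd, or -1 with errno set. */
static int open_fanotify_group(unsigned int flags)
{
    return fanotify_init(flags, O_RDONLY);
}
```

The call typically needs privileges, so a caller should expect -1 and check errno when running unprivileged.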
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch defines a new syscall fanotify_init() of the form:
int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)
This syscall is used to create an fanotify group. This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question. This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification. These are useful for on-access scanners or hierarchical storage
management schemes.
This patch just implements the basics of the fsnotify functions.
Signed-off-by: Eric Paris <eparis@redhat.com>
sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
something unique.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This is a new f_mode which can only be set by the kernel. It indicates
that the fd was opened by fanotify and should not cause future fanotify
events. This is needed to prevent fanotify livelock. An example of
obvious livelock is from fanotify close events.
Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?
The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify. Thus when that file is used it will not generate future
events.
This patch simply defines the bit.
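How such a bit breaks the loop can be shown with a toy model. The bit value, struct file_model and close_generates_event() are all hypothetical; the sketch only captures the rule that files opened by the kernel on behalf of a listener are skipped when events are generated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit value; the text above does not give the number. */
#define FMODE_NONOTIFY_MODEL 0x01000000u

/* Hypothetical stand-in for struct file: only f_mode matters here. */
struct file_model {
    unsigned int f_mode;
};

/* Would closing this file queue a close event for listeners? */
static bool close_generates_event(const struct file_model *file)
{
    assert(file != NULL);
    /* Files opened for a listener carry the bit and stay silent, so the
     * listener closing its own fd cannot trigger another event. */
    return (file->f_mode & FMODE_NONOTIFY_MODEL) == 0;
}
```

In the livelock example, Process A's close still produces one event, but Listener X closing the fd it was handed produces none.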
Signed-off-by: Eric Paris <eparis@redhat.com>
Previously I used mark_entry when talking about marks on inodes. The
_entry is pretty useless. Just use "mark" instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Some fsnotify operations send a struct file. This is more information than
we technically need. We now send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)
Signed-off-by: Eric Paris <eparis@redhat.com>
vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Signed-off-by: Eric Paris <eparis@redhat.com>
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.) This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given mount point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.
The reason we want a second group list and mask is that the inode based
notification should_send_event support makes each group look for a mark
on the given inode. With one vfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation. By separating vfsmount from inode listeners only when
there is an inode listener will the inode groups have to look for their
mark and take the inode lock. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Signed-off-by: Eric Paris <eparis@redhat.com>
Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event. This patch implements the replace functionality of that
process.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller. Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created. Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
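The two merge policies can be contrasted with a small queue model. All names here are hypothetical, the queue is a fixed array rather than the kernel's list, and the fanotify-style merge updates the queued event in place for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 16

/* Hypothetical event: which object it concerns and which event bits. */
struct event_model {
    int object_id;
    uint32_t mask;
};

struct queue_model {
    struct event_model ev[QUEUE_CAP];
    size_t len;
};

/* A merge callback returns true if new_ev was absorbed by the queue. */
typedef bool (*merge_fn)(struct queue_model *q, struct event_model new_ev);

/* inotify-style: drop new_ev only if it is identical to the last event. */
static bool merge_tail_only(struct queue_model *q, struct event_model new_ev)
{
    const struct event_model *tail;

    if (q->len == 0)
        return false;
    tail = &q->ev[q->len - 1];
    return tail->object_id == new_ev.object_id && tail->mask == new_ev.mask;
}

/* fanotify-style: fold new_ev into any queued event for the same object
 * by ORing the masks together. */
static bool merge_any_or_mask(struct queue_model *q, struct event_model new_ev)
{
    size_t i;

    for (i = 0; i < q->len; i++) {
        if (q->ev[i].object_id == new_ev.object_id) {
            q->ev[i].mask |= new_ev.mask;
            return true;
        }
    }
    return false;
}

/* Generic queueing code: try the group's merge callback, else append.
 * Returns false only if the queue is full. */
static bool queue_add(struct queue_model *q, struct event_model new_ev,
                      merge_fn merge)
{
    assert(q != NULL && merge != NULL);
    if (merge(q, new_ev))
        return true;
    if (q->len == QUEUE_CAP)
        return false;
    q->ev[q->len++] = new_ev;
    return true;
}
```

Keeping the policy behind a callback lets the generic queue code stay unaware of which notification system it serves.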
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify, the upcoming notification system, actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>
Simply switch audit_trees from using inotify to using fsnotify for its
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch allows a task to add a second fsnotify mark to an inode for the
same group. This mark will be added to the end of the inode's list and
this will never be found by the standard fsnotify_find_mark() function. This
is useful if a user wants to add a new mark before removing the old one.
Signed-off-by: Eric Paris <eparis@redhat.com>
Simply copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.
Signed-off-by: Eric Paris <eparis@redhat.com>
Audit currently uses inotify to pin inodes in core and to detect when
watched inodes are deleted or unmounted. This patch uses fsnotify instead
of inotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Add a new kernel API to attach a task to current task's cgroup
in all the active hierarchies.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
This patch adds DMA drivers for the DMA controllers in the Langwell chipset
of the Intel(R) Moorestown platform and the DMA controllers in Penwell of
the Intel(R) Medfield platform.
This patch adds support for the Moorestown DMAC1 and DMAC2 controllers.
It also adds support for the Medfield GP DMA and DMAC1 controllers.
These controllers support memory to peripheral and peripheral to
memory transfers. They support only single block transfers.
This driver is based on the kernel DMA engine.
Anyone who wishes to use this controller should use the DMA engine APIs.
This controller exposes DMA_SLAVE capabilities and notifies the client drivers
of DMA transaction completion.
The config option CONFIG_INTEL_MID_DMAC=y must be enabled.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds support for RX and TX DMA via the DMA API;
this is only supported when the KS8842 is accessed via timberdale.
There is no support for DMA on the generic bus interface itself;
a state machine inside the FPGA handles RX and TX transfers to/from
buffers in the FPGA. The host CPU can do DMA to and from these buffers.
The FPGA has to handle the RX interrupts, so these must be enabled in
the ks8842 but not in the FPGA. The driver must not disable the RX interrupt,
as that would mean that the data transfers into the FPGA buffers would stop.
The host shall not enable TX interrupts since TX is handled by the FPGA;
the host is notified by DMA callbacks when transfers are finished.
The DMA channels to use are added as parameters in the platform data struct.
Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
s2io: fixing DBG_PRINT() macro
ath9k: fix dma direction for map/unmap in ath_rx_tasklet
net: dev_forward_skb should call nf_reset
net sched: fix race in mirred device removal
tun: avoid BUG, dump packet on GSO errors
bonding: set device in RLB ARP packet handler
wimax/i2400m: Add PID & VID for Intel WiMAX 6250
ipv6: Don't add routes to ipv6 disabled interfaces.
net: Fix skb_copy_expand() handling of ->csum_start
net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c
macvtap: Limit packet queue length
ixgbe/igb: catch invalid VF settings
bnx2x: Advance a module version
bnx2x: Protect statistics ramrod and sequence number
bnx2x: Protect a SM state change
wireless: use netif_rx_ni in ieee80211_send_layer2_update
Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed. That means
the aio_complete calls need to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.
This makes a bit of a mess out of dio_complete and makes the ->end_io
callback prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
__GFP_NOFAIL is going away, so add our own retry loop. Also add
jbd2__journal_start() and jbd2__journal_restart() which take a gfp
mask, so that file systems can optionally (re)start transaction
handles using GFP_KERNEL. If they do this, then they need to be
prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
to reflect that error up to userspace.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
To properly handle clocksources that change frequencies
at the clocksource->enable() point, this patch adds
a method that will update the clocksource's mult/shift and
max_idle_ns values.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-12-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes xtime and wall_to_monotonic static, as planned in
Documentation/feature-removal-schedule.txt. This will allow for
further cleanups to the timekeeping core.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-10-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Provides an accessor function to replace hrtimer.c's
direct access of wall_to_monotonic.
This will allow wall_to_monotonic to be made static as
planned in Documentation/feature-removal-schedule.txt
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
update_vsyscall() did not provide the wall_to_monotonic offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After accidentally misusing timespec_add_safe, I wanted to make sure
we don't accidentally trip over that issue again, so I created a simple
timespec_add() function which we can use to replace the instances
of timespec_add_safe() that don't want the overflow detection.
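As a rough illustration only, a userspace sketch of such a helper might
look like the following. It assumes plain addition followed by nanosecond
normalization, with no overflow detection on tv_sec. The name
ts_add_sketch is hypothetical and is not the kernel function itself.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC_SKETCH 1000000000L

/*
 * Hypothetical stand-in for a simple timespec addition helper:
 * add two normalized timespecs and renormalize the result.
 * Unlike an overflow-checking variant, tv_sec overflow is not detected.
 */
static struct timespec ts_add_sketch(struct timespec lhs, struct timespec rhs)
{
	struct timespec sum;

	/* Assumption: inputs are normalized, 0 <= tv_nsec < 1 second. */
	assert(lhs.tv_nsec >= 0 && lhs.tv_nsec < NSEC_PER_SEC_SKETCH);
	assert(rhs.tv_nsec >= 0 && rhs.tv_nsec < NSEC_PER_SEC_SKETCH);

	sum.tv_sec = lhs.tv_sec + rhs.tv_sec;
	sum.tv_nsec = lhs.tv_nsec + rhs.tv_nsec;

	/* At most one carry is possible because both inputs are normalized. */
	if (sum.tv_nsec >= NSEC_PER_SEC_SKETCH) {
		sum.tv_nsec -= NSEC_PER_SEC_SKETCH;
		sum.tv_sec++;
	}
	return sum;
}
```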
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-3-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Implementation of the ST-Ericsson baudrate extension in the PL011
block. In this modified variant it is possible to change the
sampling factor from 16 to 8, and thanks to this we can get higher
baudrates while still using the same peripheral clock.
Also use DIV_ROUND_CLOSEST() to determine the baud divisor rather
than a simple integer division.
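To illustrate why rounding matters here, a userspace sketch follows. It
assumes a divisor with 6 fractional bits and an oversampling factor of
16 or 8. The macro is a simplified unsigned-only copy of the rounding
idea, and baud_divisor_sketch() is a hypothetical helper, not the
driver's code.

```c
#include <assert.h>

/*
 * Simplified, unsigned-only version of the rounding idea:
 * add half the divisor before dividing so the result is the
 * closest integer rather than the truncated one.
 */
#define DIV_ROUND_CLOSEST_U(x, d) (((x) + ((d) / 2)) / (d))

/*
 * Hypothetical helper: compute a baud divisor scaled by 64
 * (6 fractional bits, an assumption of this sketch) for a given
 * oversampling factor.
 *
 *   divisor * 64 = uartclk * 64 / (oversample * baud)
 */
static unsigned long baud_divisor_sketch(unsigned long uartclk,
					 unsigned long baud,
					 unsigned int oversample)
{
	assert(baud != 0);
	assert(oversample == 8 || oversample == 16);

	return DIV_ROUND_CLOSEST_U(uartclk * (64 / oversample), baud);
}
```

With an assumed 48 MHz clock, 115200 baud and 16x oversampling, the exact
scaled divisor is about 1666.67. Truncation gives 1666, while rounding to
the closest value gives 1667.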
Cc: Alessandro Rubini <rubini@unipv.it>
Cc: Jerzy Kasenberg <jerzy.kasenberg@tieto.com>
Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In the ST-Ericsson version of the PL011 the TX and RX have different
control registers.
Cc: Alessandro Rubini <rubini@unipv.it>
Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed. That means
the aio_complete calls need to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.
This makes a bit of a mess out of dio_complete and makes the ->end_io
callback prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alex Elder <aelder@sgi.com>
This patch allows exporting GPIO pins not used by the keypad itself
to be accessible from elsewhere.
Signed-off-by: Xiaolong Chen <xiao-long.chen@motorola.com>
Signed-off-by: Yuanbo Ye <yuan-bo.ye@motorola.com>
Signed-off-by: Tao Hu <taohu@motorola.com>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This inserts a sanity check that refuses to mount a filesystem with
an unsupported block size.
Previously, the nilfs kernel code checked only the limitations of the
device, even though mkfs.nilfs2 limits the range of block sizes; there was
no check to prevent rec_len overflow with larger block sizes.
With this change, block sizes larger than 64KB or smaller than 1KB
will be rejected explicitly by the kernel.
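A minimal userspace sketch of such a range check is shown below. The
function and macro names are hypothetical, and the sketch only covers
the 1KB to 64KB bounds stated above, not any other validation the real
mount path may perform.

```c
#include <assert.h>
#include <errno.h>

#define MIN_BLOCK_SIZE_SKETCH 1024UL	/* 1KB lower bound */
#define MAX_BLOCK_SIZE_SKETCH 65536UL	/* 64KB upper bound */

/*
 * Hypothetical mount-time check: return 0 if the block size is in
 * the supported range, or -EINVAL so the mount can be refused.
 */
static int check_block_size_sketch(unsigned long blocksize)
{
	if (blocksize < MIN_BLOCK_SIZE_SKETCH ||
	    blocksize > MAX_BLOCK_SIZE_SKETCH)
		return -EINVAL;

	/* Anything accepted must still be expressible as a 64KB-limited length. */
	assert(blocksize <= (1UL << 16));
	return 0;
}
```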
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
With a 64KB blocksize, a directory entry can have a size of 64KB, which
does not fit into the 16 bits we have for entry length. So this patch stores
0xffff instead and converts the value when read from / written to disk.
Nilfs derives its directory implementation from ext2 filesystem, and
this draws upon the corresponding change on ext2.
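The conversion can be sketched in userspace roughly as follows. The
names are hypothetical and byte-order handling is omitted. The sketch
also assumes that real record lengths are always aligned, so 0xffff
never occurs as a genuine length and is free to stand for 64KB.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REC_LEN_SKETCH ((1u << 16) - 1)	/* 0xffff, the on-disk marker */

/* Hypothetical: encode an in-memory record length into 16 bits. */
static uint16_t rec_len_to_disk_sketch(unsigned int len)
{
	/* Lengths above 64KB cannot be represented at all. */
	assert(len <= (1u << 16));

	if (len == (1u << 16))
		return MAX_REC_LEN_SKETCH;	/* 64KB is stored as 0xffff */
	return (uint16_t)len;
}

/* Hypothetical: decode the 16-bit on-disk value back to a length. */
static unsigned int rec_len_from_disk_sketch(uint16_t dlen)
{
	if (dlen == MAX_REC_LEN_SKETCH)
		return 1u << 16;		/* 0xffff means 64KB */
	return dlen;
}
```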
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Move frags[] to the end of struct skb_shared_info, and make
pskb_expand_head() copy only the used part of it instead of the whole array.
This should avoid kmemcheck warnings and speed up pskb_expand_head() as
well, avoiding a lot of cache misses.
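The idea can be sketched in userspace as below. The structures are
simplified, hypothetical stand-ins (they are not the real
skb_shared_info layout); the point is only that with the array as the
last member, the header and the used entries form one contiguous
prefix that can be copied with a single memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS_SKETCH 18

/* Hypothetical fragment descriptor. */
struct frag_sketch {
	void *page;
	unsigned int offset;
	unsigned int size;
};

/* Hypothetical shared-info structure with the array placed last. */
struct shinfo_sketch {
	unsigned short nr_frags;
	unsigned short other_state;
	struct frag_sketch frags[MAX_FRAGS_SKETCH];	/* must stay last */
};

/* Number of bytes covering the header plus the used frags[] entries. */
static size_t shinfo_used_bytes(const struct shinfo_sketch *s)
{
	assert(s->nr_frags <= MAX_FRAGS_SKETCH);
	return offsetof(struct shinfo_sketch, frags) +
	       (size_t)s->nr_frags * sizeof(struct frag_sketch);
}

/* Copy only the used prefix; unused frags[] slots in dst are untouched. */
static void shinfo_copy_used(struct shinfo_sketch *dst,
			     const struct shinfo_sketch *src)
{
	memcpy(dst, src, shinfo_used_bytes(src));
}
```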
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add addr_assign_type to struct net_device and expose it via sysfs.
This new attribute has the purpose of giving user-space the ability to
distinguish between different assignment types of MAC addresses.
For example user-space can treat NICs with randomly generated MAC
addresses differently than NICs that have permanent (locally assigned)
MAC addresses.
For the former udev could write a persistent net rule by matching the
device path instead of the MAC address.
There's also the case of devices that 'steal' MAC addresses from slave
devices. In that case it is also beneficial for user-space to be aware
of the fact.
This patch also introduces a helper function to assist adoption of
drivers that generate MAC addresses randomly.
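As a sketch of how user-space might consume the attribute, the
following reads it from sysfs. The path
/sys/class/net/<ifname>/addr_assign_type and the numeric values
(0 permanent, 1 random, 2 stolen) are assumptions of this example, and
both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map the assumed numeric values to human-readable names. */
static const char *addr_assign_type_name(int type)
{
	switch (type) {
	case 0:
		return "permanent";
	case 1:
		return "random";
	case 2:
		return "stolen";
	default:
		return "unknown";
	}
}

/*
 * Read the attribute for one interface.
 * Returns the value on success, or -1 if it cannot be read.
 */
static int read_addr_assign_type(const char *ifname)
{
	char path[256];
	FILE *f;
	int type = -1;
	int n;

	assert(ifname != NULL);

	n = snprintf(path, sizeof(path),
		     "/sys/class/net/%s/addr_assign_type", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}
```

A udev-like tool could then decide to match on the device path rather
than the MAC address when addr_assign_type_name() reports "random".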
Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code to save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work. To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This list was used by only two platforms, with all other platforms defining
their own list of valid bus ids to pass to of_platform_bus_probe. This patch:
i) copies the default list to the two platforms that depended on it (powerpc)
ii) removes the usage of of_default_bus_ids in of_platform_bus_probe
iii) removes the definition of the list from all architectures that defined it
Passing a NULL 'matches' parameter to of_platform_bus_probe is still valid; the
function returns no error in that case as the NULL value is equivalent to an
empty list.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
[grant.likely@secretlab.ca: added __initdata annotations, warn on and return error on missing match table, and fix whitespace errors]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
of_device is currently just a #define alias to platform_device until it
gets removed entirely. This patch removes references to it from the
include directories and the core drivers/of code.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
It is mostly unused now. Sparc has a few defines left in it, but they
can be moved to other headers. Removing this header means that new
architectures adding CONFIG_OF support don't need to also add this
header file.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Only thing left in it is of_instantiate_rtc() which can be moved to
asm/prom.h on PowerPC and is unused in microblaze.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Both of_bus_type and of_platform_bus_type are just #define aliases
for the platform bus. This patch removes all references to them and
switches to the of_register_platform_driver()/of_unregister_platform_driver()
API for registering.
Subsequent patches will convert each user of of_register_platform_driver()
into plain platform_drivers without the of_platform_driver shim, at which
point the of_register_platform_driver()/of_unregister_platform_driver()
functions can be removed.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
of_platform_bus was being used in the same manner as the platform_bus.
The only difference being that of_platform_bus devices are generated
from data in the device tree, and platform_bus devices are usually
statically allocated in platform code. Having them separate causes
the problem of device drivers having to be registered twice if it
was possible for the same device to appear on either bus.
This patch removes of_platform_bus_type and registers all of_platform
bus devices and drivers on the platform bus instead. A previous patch
made the of_device structure an alias for the platform_device structure,
and a shim is used to adapt of_platform_drivers to the platform bus.
After all of the of_platform_bus drivers are converted to be normal platform
drivers, the shim code can be removed.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
usleep[_range] are finer precision implementations of msleep
and are designed to be drop-in replacements for udelay where
a precise sleep / busy-wait is unnecessary. They also allow
an easy interface to specify slack when a precise (ish)
wakeup is unnecessary to help minimize wakeups.
Signed-off-by: Patrick Pannuto <ppannuto@codeaurora.org>
Cc: akinobu.mita@gmail.com
Cc: sboyd@codeaurora.org
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <4C44CDD2.1070708@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We should copy the initial value to userspace for iptables-save and
to allow removal of specific quota rules.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In both the ieee1394 stack and the firewire stack, the core treats
kernelspace drivers better than userspace drivers when it comes to
CSR address range allocation: The former may request a register to be
placed automatically at a free spot anywhere inside a specified address
range. The latter may only request a register at a fixed offset.
Hence, userspace drivers which do not require a fixed offset potentially
need to implement a retry loop with incremented offset in each retry
until the kernel does not fail allocation with EBUSY. This awkward
procedure is not fundamentally necessary as the core already provides a
superior allocation API to kernelspace drivers.
Therefore change the ioctl() ABI by addition of a region_end member in
the existing struct fw_cdev_allocate. Userspace and kernelspace APIs
work the same way now.
There is a small cost to pay by clients though: If client source code
is required to compile with older kernel headers too, then any use of
the new member fw_cdev_allocate.region_end needs to be enclosed by
#ifdef/#endif directives. However, any client program that seriously
wants to use address range allocations will require a kernel of cdev ABI
version >= 4 at runtime and a linux/firewire-cdev.h header of >= 4
anyway. This is because v4 brings FW_CDEV_EVENT_REQUEST2. The only
client program in which build-time compatibility with struct
fw_cdev_allocate as found in older kernel headers makes sense is
libraw1394.
(libraw1394 uses the older broken FW_CDEV_EVENT_REQUEST to implement a
makeshift, incorrect transaction responder that does at least work
somewhat in many simple scenarios, relying on guesswork by libraw1394
and by libraw1394 based applications. Plus, address range allocation
and transaction responder is only one of many features that libraw1394
needs to provide, and these other features need to work with kernel and
kernel-headers as old as possible. Any new linux/firewire-cdev.h based
client that implements a transaction responder should never attempt to
do it like libraw1394; instead it should make a header and kernel of v4
or later a hard requirement.)
While we are at it, update the struct fw_cdev_allocate documentation to
better reflect the recent fw_cdev_event_request2 ABI addition.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This extends the FW_CDEV_IOC_SEND_PHY_PACKET ioctl() for /dev/fw* to be
useful for ping time measurements. One application for it would be gap
count optimization in userspace that is based on ping times rather than
hop count. (The latter is implemented in firewire-core itself but is
not applicable to beta PHYs that act as repeaters.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Add an FW_CDEV_IOC_RECEIVE_PHY_PACKETS ioctl() and
FW_CDEV_EVENT_PHY_PACKET_RECEIVED poll()/read() event for /dev/fw*.
This can be used to get information from remote PHYs by remote access
PHY packets.
This is also the 2nd half of the functionality (the receive part) to
support a userspace implementation of a VersaPHY transaction layer.
Safety considerations:
- PHY packets are generally broadcasts, hence some kind of elevated
privileges should be required of a process to be able to listen in
on PHY packets. This implementation assumes that a process that is
allowed to open the /dev/fw* of a local node does have this
privilege.
There was an inconclusive discussion about introducing POSIX
capabilities as a means to check for user privileges for these
kinds of operations.
Other limitations:
- PHY packet reception may be switched on by ioctl() but cannot be
switched off again. It would be trivial to provide an off switch,
but this is not worth the code. The client should simply close()
the fd then, or just ignore further events.
- For sake of simplicity of API and kernel-side implementation, no
filter per packet content is provided.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Add an FW_CDEV_IOC_SEND_PHY_PACKET ioctl() for /dev/fw* which can be
used to implement bus management related functionality in userspace.
This is also half of the functionality (the transmit part) that is
needed to support a userspace implementation of a VersaPHY transaction
layer.
Safety considerations:
- PHY packets are generally broadcasts and may have interesting
effects on PHYs and the bus, e.g. make asynchronous arbitration
impossible due to too low gap count. Hence some kind of elevated
privileges should be required of a process to be able to send
PHY packets. This implementation assumes that a process that is
allowed to open the /dev/fw* of a local node does have this
privilege.
There was an inconclusive discussion about introducing POSIX
capabilities as a means to check for user privileges for these
kinds of operations.
- The kernel does not check integrity of the supplied packet data.
That would be far too much code, considering the many kinds of
PHY packets. A process which got the privilege to send these
packets is trusted to do it correctly.
Just like with the other "send packet" ioctls, a non-blocking API is
chosen; i.e. the ioctl may return even before AT DMA started. After
transmission, an event for poll()/read() is enqueued. Most users are
going to need a blocking API, but a blocking userspace wrapper is easy
to implement, and the second of the two existing libraw1394 calls
raw1394_phy_packet_write() and raw1394_start_phy_packet_write() can be
better supported that way.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Response events:
- are generated on more occasions than their documentation claimed.
CSR allocation:
- An already occupied CSR can be determined from errno==EBUSY.
Bus resets:
- Note that FW_CDEV_IOC_INITIATE_BUS_RESET is nonblocking and that the
client is not required to observe a grace period since kernels
2.6.36+ will enforce it now (commit 02d37bed).
- The possible values of fw_cdev_initiate_bus_reset.type are listed in
the kerneldoc comment already.
- Clarify that an application that uses FW_CDEV_IOC_ADD_DESCRIPTOR and
FW_CDEV_IOC_REMOVE_DESCRIPTOR does not have to issue a bus reset.
Isochronous I/O contexts:
- At most one can be created per open file descriptor.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
core-transaction.c transmit_complete_callback() and close_transaction()
expect packet callback status to be an ACK or RCODE, and ACKs get
translated to RCODEs for transaction callbacks.
An old comment on the packet callback API (present since the initial
submission of the stack) and the dummy_driver implementation of
send_request/send_response deviated from this as they also included
-ERRNO in the range of status values.
Let's narrow status values down to ACK and RCODE to prevent surprises.
RCODE_CANCELLED is chosen as the dummy_driver's RCODE as its meaning of
"transaction timed out" comes closest to what happens when a transaction
coincides with card removal.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
In some situations a CPU match permits a better spreading of
connections, or selects targets only for a given cpu.
With Remote Packet Steering or a multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session.
(all RX packets for a given flow are handled by a given cpu)
Since some legacy applications are not SMP friendly, one way to scale a
server is to run multiple copies of them.
Instead of randomly choosing an instance, we can use the cpu number as a
key so that the softirq handler for a whole instance is running on a single
cpu, maximizing cache effects in TCP/UDP stacks.
Using NAT for example, a four-way machine might run four copies of a
server application, using a separate listening port for each instance,
but still presenting a unique external port:
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
-j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
-j REDIRECT --to-port 8081
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
-j REDIRECT --to-port 8082
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
-j REDIRECT --to-port 8083
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Quota code never touches file data. It just modifies i_blocks + i_bytes
of inodes and inode flags of quota files. So use mark_inode_dirty_sync
instead of mark_inode_dirty.
Signed-off-by: Jan Kara <jack@suse.cz>
This implements the kernel-space side of the netfilter matcher xt_ipvs.
[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[ Patrick: added xt_ipvs.h to Kbuild ]
Signed-off-by: Patrick McHardy <kaber@trash.net>