* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (31 commits)
[MIPS] Remove duplicate ISA DMA code for 0 DMA channel case.
[MIPS] Remove unused definition of cpu_to_lelongp()
[MIPS] Remove userspace proofing from <asm/bitops.h>.
[MIPS] Remove old junk left from old atomic_lock.
[MIPS] Use conditional traps for BUG_ON on MIPS II and better.
[MIPS] mips HPT cleanup: make clocksource_mips public
[MIPS] do_IRQ cleanup
[MIPS] Avoid dupliate D-cache flush on R400C / R4400 SC and MC variants.
[MIPS] Remove redundant r4k_blast_icache() calls
[MIPS] Work around bogus gcc warnings.
[MIPS] Fix double inclusions
[MIPS] use generic_handle_irq, handle_level_irq, handle_percpu_irq
[MIPS] IRQ cleanups
[MIPS] mips hpt cleanup: get rid of mips_hpt_init
[MIPS] PB1200: Remove duplicate definitions
[MIPS] Fix alignment hole in struct cache_desc; shrink struct.
[MIPS] Oprofile: kernel support for the R10000.
[MIPS] Remove unused R10000 performance counter definitions.
[MIPS] Add support for kexec
[MIPS] Don't print presence of WAIT instruction on bootup.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6: (103 commits)
usbcore: remove unused argument in autosuspend
USB: keep count of unsuspended children
USB hub: simplify remote-wakeup handling
USB: struct usb_device: change flag to bitflag
OHCI: make autostop conditional on CONFIG_PM
USB: Add autosuspend support to the hub driver
EHCI: Fix root-hub and port suspend/resume problems
USB: create a new thread for every USB device found during the probe sequence
USB: add driver for the USB debug devices
USB: added dynamic major number for USB endpoints
USB: pegasus error path not resetting task's state
USB: endianness fix for asix.c
USB: build the appledisplay driver
USB serial: replace kmalloc+memset with kzalloc
USB: hid-core: canonical defines for Apple USB device IDs
USB: idmouse cleanup
USB: make drivers/usb/core/driver.c:usb_device_match() static
USB: lh7a40x_udc remove double declaration
USB: pxa2xx_udc recognizes ixp425 rev b0 chip
usbtouchscreen: add support for DMC TSC-10/25 devices
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
Driver core: show drivers in /sys/module/
Documentation/driver-model/platform.txt update/rewrite
Driver core: platform_driver_probe(), can save codespace
driver core: Use klist_remove() in device_move()
driver core: Introduce device_move(): move a device to a new parent.
Driver core: make drivers/base/core.c:setup_parent() static
driver core: Introduce device_find_child().
sysfs: sysfs_write_file() writes zero terminated data
cpu topology: consider sysfs_create_group return value
Driver core: Call platform_notify_remove later
ACPI: Change ACPI to use dev_archdata instead of firmware_data
Driver core: add dev_archdata to struct device
Driver core: convert sound core to use struct device
Driver core: change mem class_devices to be real devices
Driver core: convert fb code to use struct device
Driver core: convert firmware code to use struct device
Driver core: convert mmc code to use struct device
Driver core: convert ppdev code to use struct device
Driver core: convert PPP code to use struct device
Driver core: convert cpuid code to use struct device
...
Show the drivers, which belong to the module:
$ ls -l /sys/module/usbcore/drivers/
hub -> ../../../bus/usb/drivers/hub
usb -> ../../../bus/usb/drivers/usb
usbfs -> ../../../bus/usb/drivers/usbfs
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This defines a new platform_driver_probe() method allowing the driver's
probe() method, and its support code+data, to safely live in __init
sections for typical system configurations.
Many system-on-chip processors could benefit from this API, to the tune
of recovering hundreds to thousands of bytes per driver. That's memory
which is currently wasted holding code which can never be called after
system startup, yet can not be removed. It can't be removed because of
the linkage requirement that pointers to init section code (like, ideally,
probe support) must not live in other sections (like driver method tables)
after those pointers would be invalid.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Provide a function device_move() to move a device to a new parent device. Add
auxilliary functions kobject_move() and sysfs_move_dir().
kobject_move() generates a new uevent of type KOBJ_MOVE, containing the
previous path (DEVPATH_OLD) in addition to the usual values. For this, a new
interface kobject_uevent_env() is created that allows to add further
environmental data to the uevent at the kobject layer.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change ACPI to use dev_archdata instead of firmware_data
This patch changes ACPI to use the new dev_archdata on i386, x86_64
and ia64 (is there any other arch using ACPI ?) to store it's
acpi_handle.
It also removes the firmware_data field from struct device as this
was the only user.
Only build-tested on x86
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add arch specific dev_archdata to struct device
Adds an arch specific struct dev_arch to struct device. This enables
architecture to add specific fields to every device in the system, like
DMA operation pointers, NUMA node ID, firmware specific data, etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andi Kleen <ak@suse.de>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Converts from using struct "class_device" to "struct device" making
everything show up properly in /sys/devices/ with symlinks from the
/sys/class directory.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Converts from using struct "class_device" to "struct device" making
everything show up properly in /sys/devices/ with symlinks from the
/sys/class directory.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Converts from using struct "class_device" to "struct device" making
everything show up properly in /sys/devices/ with symlinks from the
/sys/class directory.
Also fixes up the isdn drivers that were putting something in the class
device's directory.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This also ment that some of the misc drivers had to also be fixed
up as they were assuming the device was a class_device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I finally did as you suggested and added the notifier to the struct
bus_type itself. There are still problems to be expected is something
attaches to a bus type where the code can hook in different struct
device sub-classes (which is imho a big bogosity but I won't even try to
argue that case now) but it will solve nicely a number of issues I've
had so far.
That also means that clients interested in registering for such
notifications have to do it before devices are added and after bus types
are registered. Fortunately, most bus types that matter for the various
usage scenarios I have in mind are registerd at postcore_initcall time,
which means I have a really nice spot at arch_initcall time to add my
notifiers.
There are 4 notifications provided. Device being added (before hooked to
the bus) and removed (failure of previous case or after being unhooked
from the bus), along with driver being bound to a device and about to be
unbound.
The usage I have for these are:
- The 2 first ones are used to maintain a struct device_ext that is
hooked to struct device.firmware_data. This structure contains for now a
pointer to the Open Firmware node related to the device (if any), the
NUMA node ID (for quick access to it) and the DMA operations pointers &
iommu table instance for DMA to/from this device. For bus types I own
(like IBM VIO or EBUS), I just maintain that structure directly from the
bus code when creating the devices. But for bus types managed by generic
code like PCI or platform (actually, of_platform which is a variation of
platform linked to Open Firmware device-tree), I need this notifier.
- The other two ones have a completely different usage scenario. I have
cases where multiple devices and their drivers depend on each other. For
example, the IBM EMAC network driver needs to attach to a MAL DMA engine
which is a separate device, and a PHY interface which is also a separate
device. They are all of_platform_device's (well, about to be with my
upcoming patches) but there is no say in what precise order the core
will "probe" them and instanciate the various modules. The solution I
found for that is to have the drivers for emac to use multithread_probe,
and wait for a driver to be bound to the target MAL and PHY control
devices (the device-tree contains reference to the MAL and PHY interface
nodes, which I can then match to of_platform_devices). Right now, I've
been polling, but with that notifier, I can more cleanly wait (with a
timeout of course).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This updated patch adds the Intel ICH9 LPC and SMBus Controller DID's.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changes the pci_{enable,disable}_device() functions to work in a
nested basis, so that eg, three calls to enable_device() require three
calls to disable_device().
The reason for this is to simplify PCI drivers for
multi-interface/capability devices. These are devices that cram more
than one interface in a single function. A relevant example of that is
the Wireless [USB] Host Controller Interface (similar to EHCI) [see
http://www.intel.com/technology/comms/wusb/whci.htm].
In these kind of devices, multiple interfaces are accessed through a
single bar and IRQ line. For that, the drivers map only the smallest
area of the bar to access their register banks and use shared IRQ
handlers.
However, because the order at which those drivers load cannot be known
ahead of time, the sequence in which the calls to pci_enable_device()
and pci_disable_device() cannot be predicted. Thus:
1. driverA starts pci_enable_device()
2. driverB starts pci_enable_device()
3. driverA shutdown pci_disable_device()
4. driverB shutdown pci_disable_device()
between steps 3 and 4, driver B would loose access to it's device,
even if it didn't intend to.
By using this modification, the device won't be disabled until all the
callers to enable() have called disable().
This is implemented by replacing 'struct pci_dev->is_enabled' from a
bitfield to an atomic use count. Each caller to enable increments it,
each caller to disable decrements it. When the count increments from 0
to 1, __pci_enable_device() is called to actually enable the
device. When it drops to zero, pci_disable_device() actually does the
disabling.
We keep the backend __pci_enable_device() for pci_default_resume() to
use and also change the sysfs method implementation, so that userspace
enabling/disabling the device doesn't disable it one time too much.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Support a shadowed ROM when running with an ACPI capable PROM.
Define a new dev.resource flag IORESOURCE_ROM_BIOS_COPY to
describe the case of a BIOS shadowed ROM, which can then
be used to avoid pci_map_rom() making an unneeded call to
pci_enable_rom().
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Move some MSI-X #defines into pci_regs.h so they can be used
outside of drivers/pci.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as818b) simplifies autosuspend processing by keeping track
of the number of unsuspended children of each USB hub. This will
permit us to avoid a good deal of unnecessary work all the time; we
will no longer have to create a bunch of workqueue entries to carry
out autosuspend requests, only to have them fail because one of the
hub's children isn't suspended.
The basic idea is simple. There already is a usage counter in the
usb_device structure for preventing autosuspends. The patch just
increments that counter for every unsuspended child. There's only one
tricky part: When a device disconnects we need to remember whether it
was suspended at the time (leave the counter alone) or not (decrement
the counter).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as816) changes an existing flag in the usb_device
structure to a bitflag, preparing the way for more bitflags to come
in the future.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as814) adds usb_autopm_set_interface() to the autosuspend
API. It also provides convenient wrapper routines,
usb_autopm_enable() and usb_autopm_disable(), for drivers that want
to specify directly whether autosuspend should be allowed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have no benefits of having the usb_endpoint_* functions as functions,
but making them inline saves text and data segment sizes:
text data bss dec hex filename
14893634 3108770 1108840 19111244 1239d4c vmlinux.func
14893185 3108566 1108840 19110591 1239abf vmlinux.inline
This is the result of a 2.6.19-rc3 kernel compiled with GCC 4.1.1 without
CONFIG_MODULES, CONFIG_CC_OPTIMIZE_FOR_SIZE, CONFIG_REGPARM options set.
USB support is fully enabled (while most of the other drivers are not),
and that kernel has most of the USB code ported to use the endpoint
functions.
That happens because a call to those functions are expensive (in terms
of bytes), while the function's size is smaller or have the same 'size' of
the call.
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Current Wireless USB host hardware (Intel i1480 for example) allows up
to 22 devices to connect, thus bringing up the max number of children
in the WUSB Host Controller to 22 'fake' ports. Upcoming hardware
might raise that limit.
Makes almost no difference to go to 31, as the bit arrays are
byte-aligned (plus an extra bit in general), so 22 bits fit in 4 bytes
as 31 do.
As well, the only other array that depends on USB_MAXCHILDREN is
'struct usb_hub->indicator'. By declaring it 'u8' instead of 'enum
hub_led_mode', we reduce the size of each entry from 4 bytes (in i386)
to 1, which will add as we when are doubling USB_MAXCHILDREN
(with 16 the size of that array is 64 bytes, with 31 would be 128; by
using u8 that goes down to 31 bytes).
Signed-off-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Modern SD cards support a clock speed of 50 MHz. Make sure we test for
this capability and do the song and dance required to activate it.
Activating high speed support actually modifies the TRAN_SPEED field
of the CSD. But as the spec says that the cards must report 50 MHz,
we might as well skip re-reading the CSD.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
This change adds support for the mmc4 4-bit wide-bus mode.
The mmc4 spec defines 8-bit and 4-bit transfer modes. As we do not support
any 8-bit hardware, this patch only adds support for the 4-bit mode, but
it can easily be built upon when the time comes.
The 4-bit mode is electrically compatible with SD's 4-bit mode but the
procedure for turning it on is different. This patch implements only
the essential parts of the procedure as defined by the spec. Two additional
steps are recommended but not compulsory. I am documenting them here so
that there's a record.
1) A bus-test mechanism is implemented using dedicated mmc commands which allow
for testing the functionality of the data bus at the electrical level. This is
pretty paranoid and they way the commands work is not compatible with the mmc
subsystem (they don't set valid CRC values).
2) MMC v4 cards can indicate they would like to draw more than the default
amount of current in wide-bus modes. We currently will never switch the card
into a higher draw mode. Supposedly, allowing the card to draw more current
will let it perform better, but the specs seem to indicate that the card will
function correctly without the mode change. Empirical testing supports this
interpretation.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
This adds support for the high-speed modes defined by mmc v4
(assuming the host controller is up to it). On a TI sdhci controller,
it improves read speed from 1.3MBps to 2.3MBps. The TI controller can
only go up to 24MHz, but everything helps. Another person has taken
this basic patch and used it on a Nokia 770 to get a bigger boost
because that controller can run at 48MHZ.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
- ->init_queue() does not need the elevator passed in
- ->put_request() is a hot path and need not have the queue passed in
- cfq_update_io_seektime() does not need cfqd passed in
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c
users so that it supports requests larger than bio by chaining them together.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds a new timestamp message to blktrace, giving the timeofday when
we starting tracing. This helps user space correlate block trace events
with eg an application strace.
This requires a (compatible) update to blkparse. The changed blkparse
is still able to process traces generated by older kernels, and older
versions of blkparse should silently ignore the new records (because
they have a pid of 0).
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Changes persistant -> persistent. www.dictionary.com does not know
persistant (with an A), but should it be one of those things you can
spell in more than one correct way, let me know.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Fix various .c/.h typos in comments (no code changes).
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
jiffies.h includes a comment informing that jiffies_64 must be read with the
assistance of the xtime_lock seqlock. The comment text, however, calls
jiffies_64 "not volatile", which should probably read "not atomic".
Signed-off-by: Chase Venters <chase.venters@clientec.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
MAX_HEADER is either set to LL_MAX_HEADER or LL_MAX_HEADER + 48, and
this is controlled by a set of CONFIG_* ifdef tests.
It is trying to use LL_MAX_HEADER + 48 when any of the tunnels are
enabled which set hard_header_len like this:
dev->hard_header_len = LL_MAX_HEADER + sizeof(struct xxx);
The correct set of tunnel drivers which do this are:
ipip
ip_gre
ip6_tunnel
sit
so make the ifdef test match.
Noticed by Patrick McHardy and with help from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
You wouldn't think that doing an ALIGN() macro that aligns something up
to a power-of-two boundary would be likely to have bugs, would you?
But hey, in the wonderful world of mixing integer types, you have to be
careful. This just makes sure that the alignment is interpreted in the
same type as the thing to be aligned.
Thanks to Roland Dreier, who noticed that the amso1100 driver got broken
by the previous fix (that just extended the mask to "unsigned long", but
was still broken in "unsigned long long" - it just happened to be the
same on 64-bit architectures).
See commit 4c8bd7eeee for the history of
bugs here...
Acked-by: Roland Dreier <rdreier@cisco.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: David Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit ee3ce191e8, since it
broke on at least ARM, MIPS and PA-RISC due to complicated header file
dependencies.
Conflicts in include/linux/spinlock.h (due to the "nested" variety
fixes) fixed up by hand.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it break or warn if you pass to spin_lock_irqsave() and friends
something different from "unsigned long flags;". Suprisingly large amount
of these was caught by recent commit
c53421b18f and others.
Idea is largely from FRV typechecking. Suggestions from Andrew Morton.
All stupid typos in first version fixed.
Passes allmodconfig on i386, x86_64, alpha, arm as well as my usual config.
Note #1: checking with sparse is still needed, because a driver can save
and pass around flags or something. So far patch is very intrusive.
Note #2: techically, we should break only if
sizeof(flags) < sizeof(unsigned long),
however, the more pain for getting suspicious code into kernel,
the better.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.
For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.
To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.
Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).
However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().
In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).
Signed-Off-By: David Howells <dhowells@redhat.com>
Reclaim a word from the size of the work_struct by folding the pending bit and
the wq_data pointer together. This shouldn't cause misalignment problems as
all pointers should be at least 4-byte aligned.
Signed-Off-By: David Howells <dhowells@redhat.com>
Define a type for the work function prototype. It's not only kept in the
work_struct struct, it's also passed as an argument to several functions.
This makes it easier to change it.
Signed-Off-By: David Howells <dhowells@redhat.com>
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.
The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.
Signed-Off-By: David Howells <dhowells@redhat.com>
The IGMPV3_EXP() macro doesn't correctly shift the normalization bit, so
time-out values are longer than they should be.
Thanks to Dirk Ooms for finding the problem in IGMPv3 - MLDv2 had a
similar problem that was already fixed a year ago. :-(
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a quick hack to overcome the fact that SRCU currently does not
allow static initializers, and we need to sometimes initialize those
things before any other initializers (even "core" ones) can do so.
Currently we don't allow this at all for modules, and the only user that
needs is right now is cpufreq. As reported by Thomas Gleixner:
"Commit b4dfdbb3c7 ("[PATCH] cpufreq:
make the transition_notifier chain use SRCU breaks cpu frequency
notification users, which register the callback > on core_init
level."
Cc: Thomas Gleixner <tglx@timesys.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>,
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch has removed one too many semicolon in crypto.h.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Disable TSO on 5906 if CLKREQ is enabled.
[TCP]: Fix up sysctl_tcp_mem initialization.
[NETFILTER]: ip6_tables: use correct nexthdr value in ipv6_find_hdr()
[NETFILTER]: ip6_tables: fixed conflicted optname for getsockopt
[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue
[NETFILTER]: nfnetlink_log: fix byteorder of NFULA_SEQ_GLOBAL
[TG3]: Increase 5906 firmware poll time.
This adds fat_getattr() for setting stat->blksize. (FAT uses the size
of cluster for proper I/O)
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Due to hardware errata, TSO must be disabled if the PCI Express clock
request is enabled on 5906. The chip may hang when transmitting TSO
frames if CLKREQ is enabled.
Update version to 3.69.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
66 and 67 for getsockopt on IPv6 socket is doubly used for IPv6 Advanced
API and ip6tables. This moves numbers for ip6tables to 68 and 69.
This also kills XT_SO_* because {ip,ip6,arp}_tables doesn't have so much
common numbers now.
The old userland tools keep to behave as ever, because old kernel always
calls functions of IPv6 Advanced API for their numbers.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(David:)
If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.
But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it. That will eventually call down to the
non-hugepage version of unmap_page_range(). On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries. I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on.
(Hugh:)
prepare_hugepage_range() should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down. PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.
Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
is the check for too small a mapping); but even so, move up setting of
VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you call set_personality() with an expression such as:
set_personality(foo ? PERS_FOO1 : PERS_FOO2);
then this evaluates to:
((current->personality == foo ? PERS_FOO1 : PERS_FOO2) ? ...
which is obviously not the intended result. Add the missing parents
to ensure this gets evaluated as expected:
((current->personality == (foo ? PERS_FOO1 : PERS_FOO2)) ? ...
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- reorder 'struct vm_struct' to speedup lookups on CPUS with small cache
lines. The fields 'next,addr,size' should be now in the same cache line,
to speedup lookups.
- One minor cleanup in __get_vm_area_node()
- Bugfixes in vmalloc_user() and vmalloc_32_user() NULL returns from
__vmalloc() and __find_vm_area() were not tested.
[akpm@osdl.org: remove redundant BUG_ONs]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Cleaned up interrupt mapping a little by adding a helper
function which parses the irq out of the device-tree, and puts
it into a resource.
* Changed the arch/ppc platform files to specify PHY_POLL, instead of -1
* Changed the fixed phy to use PHY_IGNORE_INTERRUPT
* Added ethtool.h and mii.h to phy.h includes
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a variant of ht_create_irq __ht_create_irq that takes an
aditional parameter update that is a function that is called whenever we want
to write to a drivers htirq configuration registers.
This is needed to support the ipath_iba6110 because it's registers in the
proper location are not actually conected to the hardware that controlls
interrupt delivery.
[bos@serpentine.com: fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This refactoring actually optimizes the code a little by caching the value
that we think the device is programmed with instead of reading it back from
the hardware. Which simplifies the code a little and should speed things up a
bit.
This patch introduces the concept of a ht_irq_msg and modifies the
architecture read/write routines to update this code.
There is a minor consistency fix here as well as x86_64 forgot to initialize
the htirq as masked.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: <olson@pathscale.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some more errors from the IPMI send message command are retryable, but are not
being retried by the IPMI code. Make sure they get retried.
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Frederic Lelievre <Frederic.Lelievre@ca.kontron.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.
This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3. This is ugly. This will be fixed by later patches.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch takes the CTL_UNNUMBERD concept from NFS and makes it available to
all new sysctl users.
At the same time the sysctl binary interface maintenance documentation is
updated to mention and to describe what is needed to successfully maintain the
sysctl binary interface.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since it is becoming clear that there are just enough users of the binary
sysctl interface that completely removing the binary interface from the kernel
will not be an option for foreseeable future, we need to find a way to address
the sysctl maintenance issues.
The basic problem is that sysctl requires one central authority to allocate
sysctl numbers, or else conflicts and ABI breakage occur. The proc interface
to sysctl does not have that problem, as names are not densely allocated.
By not terminating a sysctl table until I have neither a ctl_name nor a
procname, it becomes simple to add sysctl entries that don't show up in the
binary sysctl interface. Which allows people to avoid allocating a binary
sysctl value when not needed.
I have audited the kernel code and in my reading I have not found a single
sysctl table that wasn't terminated by a completely zero filled entry. So
this change in behavior should not affect anything.
I think this mechanism eases the pain enough that combined with a little
disciple we can solve the reoccurring sysctl ABI breakage.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is needed on bigendian 64bit architectures.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a swsusp debugging mode. This does everything that's needed for a suspend
except for actually suspending. So we can look in the log messages and work
out a) what code is being slow and b) which drivers are misbehaving.
(1)
# echo testproc > /sys/power/disk
# echo disk > /sys/power/state
This should turn off the non-boot CPU, freeze all processes, wait for 5
seconds and then thaw the processes and the CPU.
(2)
# echo test > /sys/power/disk
# echo disk > /sys/power/state
This should turn off the non-boot CPU, freeze all processes, shrink
memory, suspend all devices, wait for 5 seconds, resume the devices etc.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Stefan Seyfried <seife@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
printk_ratelimit() has global state which makes it not useful for callers
which wish to perform ratelimiting at a particular frequency.
Add a printk_timed_ratelimit() which utilises caller-provided state storage to
permit more flexibility.
This function can in fact be used for things other than printk ratelimiting
and is perhaps poorly named.
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs2 fails to mount on x86_64, claiming bad magic. This is because
ufs_super_block_third's fs_un1 member is padded out by 4 bytes for 8-byte
alignment, pushing down the rest of the struct.
Forcing this to be packed solves it. I took a quick look over other
on-disk structures and didn't immediately find other problems. I was able
to mount & ls a populated ufs2 filesystem w/ this change.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ata_dev_revalidate() isn't used outside of libata core. Unexport it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix the last current kernel-doc warning:
Warning(/var/linsrc/linux-2619-rc3g5//include/linux/mtd/nand.h:416): No description found for parameter 'write_page'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
signal_struct is (mostly) protected by ->sighand->siglock, I think we don't
need ->taskstats_lock to protect ->stats. This also allows us to simplify the
locking in fill_tgid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
taskstats_tgid_free() is called on copy_process's error path. This is wrong.
IF (clone_flags & CLONE_THREAD)
We should not clear ->signal->taskstats, current uses it,
it probably has a valid accumulated info.
ELSE
taskstats_tgid_init() set ->signal->taskstats = NULL,
there is nothing to free.
Move the callsite to __exit_signal(). We don't need any locking, entire
thread group is exiting, nobody should have a reference to soon to be
released ->signal.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This means we can call it when the bitmap we want to fetch is declared
const.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If __vmalloc is called to allocate memory with GFP_ATOMIC in atomic
context, the chain of calls results in __get_vm_area_node allocating memory
for vm_struct with GFP_KERNEL, causing the 'sleeping from invalid context'
warning. This patch fixes it by passing the gfp flags along so
__get_vm_area_node allocates memory for vm_struct with the same flags.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The temp_priority field in zone is racy, as we can walk through a reclaim
path, and just before we copy it into prev_priority, it can be overwritten
(say with DEF_PRIORITY) by another reclaimer.
The same bug is contained in both try_to_free_pages and balance_pgdat, but
it is fixed slightly differently. In balance_pgdat, we keep a separate
priority record per zone in a local array. In try_to_free_pages there is
no need to do this, as the priority level is the same for all zones that we
reclaim from.
Impact of this bug is that temp_priority is copied into prev_priority, and
setting this artificially high causes reclaimers to set distress
artificially low. They then fail to reclaim mapped pages, when they are,
in fact, under severe memory pressure (their priority may be as low as 0).
This causes the OOM killer to fire incorrectly.
From: Andrew Morton <akpm@osdl.org>
__zone_reclaim() isn't modifying zone->prev_priority. But zone->prev_priority
is used in the decision whether or not to bring mapped pages onto the inactive
list. Hence there's a risk here that __zone_reclaim() will fail because
zone->prev_priority ir large (ie: low urgency) and lots of mapped pages end up
stuck on the active list.
Fix that up by decreasing (ie making more urgent) zone->prev_priority as
__zone_reclaim() scans the zone's pages.
This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created. It should be
possible to remove that now, and to just start out at DEF_PRIORITY?
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Consolidate page_cache_alloc
- Fix splice: only the pagecache pages and filesystem data need to use
mapping_gfp_mask.
- Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The multithreaded-probing code has a problem: after one initcall level (eg,
core_initcall) has been processed, we will then start processing the next
level (postcore_initcall) while the kernel threads which are handling
core_initcall are still executing. This breaks the guarantees which the
layered initcalls previously gave us.
IOW, we want to be multithreaded _within_ an initcall level, but not between
different levels.
Fix that up by causing the probing code to wait for all outstanding probes at
one level to complete before we start processing the next level.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds two functions to create and remove sysfs attributes and
attribute_group to all cpus. That allows to register sysfs attributes in
a subdirectory like: /sys/devices/system/cpu/cpuX/group_name/what_ever
This will be used by cbe_thermal to group all attributes dealing with
thermal support in one directory.
Signed-of-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Eliminate more __must_check madness.
The return code from device_for_each_child() depends on the values
which the helper function returns. If the helper function always
returns zero, it's utterly pointless to check the return code from
device_for_each_child().
The only code which knows if the return value should be checked is
the caller itself, so forcing the return code to always be checked
is silly. Hence, remove the __must_check annotation.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] libata-sff: Allow for wacky systems
[PATCH] ahci: readability tweak
[PATCH] libata: typo fix
[PATCH] ATA must depend on BLOCK
[PATCH] libata: use correct map_db values for ICH8
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86-64: Revert timer routing behaviour back to 2.6.16 state
[PATCH] x86-64: Overlapping program headers in physical addr space fix
[PATCH] x86-64: Put more than one cpu in TARGET_CPUS
[PATCH] x86: Revert new unwind kernel stack termination
[PATCH] x86-64: Use irq_domain in ioapic_retrigger_irq
[PATCH] i386: Disable nmi watchdog on all ThinkPads
[PATCH] x86-64: Revert interrupt backlink changes
[PATCH] x86-64: Fix ENOSYS in system call tracing
[PATCH] i386: Fix fake return address
[PATCH] x86-64: x86_64 add NX mask for PTE entry
[PATCH] x86-64: Speed up dwarf2 unwinder
[PATCH] x86: Use -maccumulate-outgoing-args
[PATCH] x86-64: fix page align in e820 allocator
[PATCH] x86-64: Fix for arch/x86_64/pci/Makefile CFLAGS
[PATCH] i386: fix .cfi_signal_frame copy-n-paste error
[PATCH] x86-64: typo in __assign_irq_vector when updating pos for vector and offset
[PATCH] x86-64: x86_64 hot-add memory srat.c fix
[PATCH] i386: Update defconfig
[PATCH] x86-64: Update defconfig
Mistyped an ifdef CONFIG_CPUSETS - fixed.
I doubt that anyone ever noticed. The impact of this typo was
that if someone:
1) was using MPOL_BIND to force off node allocations
2) while using cpusets to constrain memory placement
3) when that cpuset was migrating that jobs memory
4) while the tasks in that job were actively forking
then there was a rare chance that future allocations using
that MPOL_BIND policy would be node local, not off node.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reintroduce NODES_SPAN_OTHER_NODES for powerpc
Revert "[PATCH] Remove SPAN_OTHER_NODES config definition"
This reverts commit f62859bb68.
Revert "[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
This reverts commit a94b3ab7ea.
Also update the comments to indicate that this is still required
and where its used.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Kravetz <kravetz@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We seem to have lost the declaration of pci_get_device_reverse(), if we ever
had one.
Add a CONFIG_PCI=0 stub too.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
And a couple of bug fixes found by sparse.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Includes a couple of bugfixes found by sparse.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. so that you can use bitmaps with 32bit userspace on a 64 bit kernel.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Remove SUID when splicing into an inode
[PATCH] Add lockless helpers for remove_suid()
[PATCH] Introduce generic_file_splice_write_nolock()
[PATCH] Take i_mutex in splice_from_pipe()
This changes the dwarf2 unwinder to do a binary search for CIEs
instead of a linear work. The linker is unfortunately not
able to build a proper lookup table at link time, instead it creates
one at runtime as soon as the bootmem allocator is usable (so you'll continue
using the linear lookup for the first [hopefully] few calls).
The code should be ready to utilize a build-time created table once
a fixed linker becomes available.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (36 commits)
[Bluetooth] Fix HID disconnect NULL pointer dereference
[Bluetooth] Add missing entry for Nokia DTL-4 PCMCIA card
[Bluetooth] Add support for newer ANYCOM USB dongles
[NET]: Can use __get_cpu_var() instead of per_cpu() in loopback driver.
[IPV4] inet_peer: Group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS
[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err()
[NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layer
[NETPOLL]: initialize skb for UDP
[IPV6]: Fix route.c warnings when multiple tables are disabled.
[TG3]: Bump driver version and release date.
[TG3]: Add lower bound checks for tx ring size.
[TG3]: Fix set ring params tx ring size implementation
[NET]: reduce per cpu ram used for loopback stats
[IPv6] route: Fix prohibit and blackhole routing decision
[DECNET]: Fix input routing bug
[TCP]: Bound TSO defer time
[IPv4] fib: Remove unused fib_config members
[IPV6]: Always copy rt->u.dst.error when copying a rt6_info.
[IPV6]: Make IPV6_SUBTREES depend on IPV6_MULTIPLE_TABLES.
[IPV6]: Clean up BACKTRACK().
...
We are using NFS_REPLAY_ME as a special error value that is never leaked to
clients. That works fine; the only problem is mixing host- and network-
endian values in the same objects. Network-endian equivalent would work just
as fine; switch to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
don't use the same variable to store NFS and host error values
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
on-the-wire data is big-endian
[in large part pulled from Alexey's patch]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
svc_procfunc instances return __be32, not int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If invalidate_inode_pages2() fails, then it should in principle just be
because the current process was signalled. In that case, we just want to
ensure that the inode's page cache remains marked as invalid.
Also add a helper to allow the O_DIRECT code to simply mark the page cache as
invalid once it is finished writing, instead of calling
invalidate_inode_pages2() itself.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Despite mm.h is not being exported header, it does contain one thing
which is part of userspace ABI -- value disabling OOM killer for given
process. So,
a) create and export include/linux/oom.h
b) move OOM_DISABLE define there.
c) turn bounding values of /proc/$PID/oom_adj into defines and export
them too.
Note: mass __KERNEL__ removal will be done later.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
<linux/personality.h> contains the constants for personality(2) but also
some defintions that are useless or even harmful in userspace such as the
personality() macro.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Separate out the concept of "queue congestion" from "backing-dev congestion".
Congestion is a backing-dev concept, not a queue concept.
The blk_* congestion functions are retained, as wrappers around the core
backing-dev congestion functions.
This proper layering is needed so that NFS can cleanly use the congestion
functions, and so that CONFIG_BLOCK=n actually links.
Cc: "Thomas Maier" <balagi@justmail.de>
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Howells <dhowells@redhat.com>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Export the clear_queue_congested() and set_queue_congested() functions
located in ll_rw_blk.c
The functions are renamed to blk_clear_queue_congested() and
blk_set_queue_congested().
(needed in the pktcdvd driver's bio write congestion control)
Signed-off-by: Thomas Maier <balagi@justmail.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Right now users have to grab i_mutex before calling remove_suid(), in the
unlikely event that a call to ->setattr() may be needed. Split up the
function in two parts:
- One to check if we need to remove suid
- One to actually remove it
The first we can call lockless.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This allows file systems to manage their own i_mutex locking while
still re-using the generic_file_splice_write() logic.
OCFS2 in particular wants this so that it can order cluster locks within
i_mutex.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.
The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch limits the amount of time you will defer sending a TSO segment
to less than two clock ticks, or the time between two acks, whichever is
longer.
On slow links, deferring causes significant bursts. See attached plots,
which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue
for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness
causes significant jitter, tends to overflow queues early (bad for short
queues), and makes delay-based congestion control more difficult.
Deferring by a couple clock ticks I believe will have a relatively small
impact on performance.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a TIPC application to cancel an existing
topology service subscription by re-requesting the subscription
with the TIPC_SUB_CANCEL filter bit set. (All other bits of
the cancel request must match the original subscription request.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'ubuntu-updates' of master.kernel.org:/pub/scm/linux/kernel/git/bcollins/ubuntu-2.6:
[pci_ids] Add Quicknet XJ vendor/device ID's.
[valkyriefb] Ifdef for when CONFIG_NVRAM isn't enabled.
[platinumfb] Ifdef for when CONFIG_NVRAM isn't enabled.
[igafb] Add pci dev table for module auto loading.
[controlfb] Ifdef for when CONFIG_NVRAM isn't enabled.
[hid-core] TurboX Keyboard needs NOGET quirk.
[ixj] Add pci dev table for module auto loading.
[initio] Add pci dev table for module auto loading.
[fdomain] Add pci dev table for module auto loading.
[BusLogic] Add pci dev table for auto module loading.
[mv643xx] Add pci device table for auto module loading.
[alim7101] Add pci dev table for auto module loading.
This makes it possible to build pci hotplug drivers outside of the main
kernel tree, and Sam keeps telling me to move local header files to
their proper places...
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation. Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively. Many people have come to expect this naming. Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations). I also have reports that various Sun and HP servers
have similar behavior.
Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also). 2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.
On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1. If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.
A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.
-[0000:00]-+-00.0 Intel Corporation 5000P Chipset Memory Controller Hub
+-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
+-1c.0-[0000:01-02]----00.0-[0000:02]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.
Solution:
The solution can come in multiple steps.
Suggested fix#1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels. It adds two new
command line options:
pci=bfsort
pci=nobfsort
to force the sort order, or not, as you wish. It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".
Suggested fix#2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in. I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first. It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.
Suggested fix#3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order. This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers). I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.
Feedback appreciated. Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.
You'll also note I took some liberty and temporarily break the klist
abstraction to simplify and speed up the sort algorithm. I think
that's both safe and appropriate in this instance.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In order to finish converting to pci_get_* interfaces we need to add a couple
of bits of missing functionaility
pci_get_bus_and_slot() provides the equivalent to pci_find_slot()
(pci_get_slot is already taken as a name for something similar but not the
same)
pci_get_device_reverse() is the equivalent of pci_find_device_reverse but
refcounting
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When IO error happens on metadata buffer, buffer is freed from memory and
later fsync() is called, filesystems like ext2 fail to report EIO. We
solve the problem by introducing a pointer to associated address space into
the buffer_head. When a buffer is removed from a list of metadata buffers
associated with an address space, IO error is transferred from the buffer to
the address space, so that fsync can later report it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is possible for the ->fopen callback from lockd into nfsd to find that an
answer cannot be given straight away (an upcall is needed) and so the request
has to be 'dropped', to be retried later. That error status is not currently
propagated back.
So:
Change nlm_fopen to return nlm error codes (rather than a private
protocol) and define a new nlm_drop_reply code.
Cause nlm_drop_reply to cause the rpc request to get rpc_drop_reply
when this error comes back.
Cause svc_process to drop a request which returns a status of
rpc_drop_reply.
[akpm@osdl.org: fix warning storm]
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Unless someone reads the documentation for write_seqcount_{begin,end} it is
not obvious, that i_size_write() needs locking. Especially, that lack of such
locking can result in a system hang.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce desc->name and eliminate the handle_irq_name() hack. Add
set_irq_chip_and_handler_name() to set the flow type and name at once.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make net_random() more widely available by calling it random32
akpm: hopefully this will permit the removal of carta_random32. That needs
confirmation from Stephane - this code looks somewhat more computationally
expensive, and has a different (ie: callee-stateful) interface.
[akpm@osdl.org: lots of build fixes, cleanups]
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Unify the following functions:
acpi_ec_poll_read()
acpi_ec_poll_write()
acpi_ec_poll_query()
acpi_ec_intr_read()
acpi_ec_intr_write()
acpi_ec_intr_query()
into:
acpi_ec_poll_transaction()
acpi_ec_intr_transaction()
These new functions take as arguments an ACPI EC command, a few bytes
to write to the EC data register and a buffer for a few bytes to read
from the EC data register. The old _read(), _write(), _query() are
just special cases of these functions.
Then unified the code in acpi_ec_poll_transaction() and
acpi_ec_intr_transaction() a little more. Both functions are now just
wrappers around the new acpi_ec_transaction_unlocked() function. The
latter contains the EC access logic, the two original
function now just do their special way of locking and call the the
new function for the actual work.
This saves a lot of very similar code. The primary reason for doing
this, however, is that my driver for MSI 270 laptops needs to issue
some non-standard EC commands in a safe way. Due to this I added a new
exported function similar to ec_write()/ec_write() which is called
ec_transaction() and is essentially just a wrapper around
acpi_ec_{poll,intr}_transaction().
Signed-off-by: Lennart Poettering <mzxreary@0pointer.de>
Acked-by: Luming Yu <luming.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] block layer: ioprio_best function fix
[PATCH] ide-cd: fix breakage with internally queued commands
[PATCH] block layer: elv_iosched_show should get elv_list_lock
[PATCH] splice: fix pipe_to_file() ->prepare_write() error path
[PATCH] block layer: elevator_find function cleanup
[PATCH] elevator: elevator_type member not used
We still need to maintain a private PC style command, since it
isn't completely unified with REQ_TYPE_BLOCK_PC yet.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
elevator_type field in elevator_type structure is useless:
it isn't used anywhere in kernel sources.
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.
The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.
This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.
With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).
Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.
The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.
Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).
This patch: Fix the selinux side of things.
This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.
Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
The attached patch destroys all the dentries attached to a superblock in one go
by:
(1) Destroying the tree rooted at s_root.
(2) Destroying every entry in the anon list, one at a time.
(3) Each entry in the anon list has its subtree consumed from the leaves
inwards.
This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.
Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
functions added by this patch. This is because:
(1) at the point the filesystem calls generic_shutdown_super(), it is not
permitted to further touch the superblock's set of dentries, and nor may
it remove aliases from inodes;
(2) the dcache memory shrinker now skips dentries that are being unmounted;
and
(3) the superblock no longer has any external references through which the VFS
can reach it.
Given these points, the only locking we need to do is when we remove dentries
from the unused list and the name hashes, which we do a directory's worth at a
time.
We also don't need to guard against reference counts going to zero unexpectedly
and removing bits of the tree we're working on as nothing else can call dput().
A cut down version of dentry_iput() has been folded into
shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
things, it also doesn't need to check for inotify watches.
In this version of the patch, the complaint about a dentry still being in use
has been expanded from a single BUG_ON() and now gives much more information.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The nbd header uses __be32 and such types but doesn't actually include the
header that defines these things (linux/types.h); so let's include it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's nothing arch-specific about check_signature(), so move it to
<linux/io.h>. Use a cross between the Alpha and i386 implementations as
the generic one.
Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lib/bitmap.c:bitmap_parse() is a library function that received as input a
user buffer. This seemed to have originated from the way the write_proc
function of the /proc filesystem operates.
This has been reworked to not use kmalloc and eliminates a lot of
get_user() overhead by performing one access_ok before using __get_user().
We need to test if we are in kernel or user space (is_user) and access the
buffer differently. We cannot use __get_user() to access kernel addresses
in all cases, for example in architectures with separate address space for
kernel and user.
This function will be useful for other uses as well; for example, taking
input for /sysfs instead of /proc, so it was changed to accept kernel
buffers. We have this use for the Linux UWB project, as part as the
upcoming bandwidth allocator code.
Only a few routines used this function and they were changed too.
Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of HDIO IOCTLs are not yet handled and a few others are marked
as using a pointer rather than an unsigned long. The formers include:
HDIO_GET_WCACHE, HDIO_GET_ACOUSTIC, HDIO_GET_ADDRESS and
HDIO_GET_BUSSTATE. The latters are: HDIO_SET_MULTCOUNT,
HDIO_SET_UNMASKINTR, HDIO_SET_KEEPSETTINGS, HDIO_SET_32BIT,
HDIO_SET_NOWERR, HDIO_SET_DMA, HDIO_SET_PIO_MODE and HDIO_SET_NICE.
Additionally 0x330 used to be HDIO_GETGEO_BIG and may be issued by 32-bit
`hdparm' run on a 64-bit kernel making Linux complain loudly.
This is a fix for these issues.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Module taint flags listing in Oops/panic has a couple of issues:
* taint_flags() doesn't null-terminate the buffer after printing the flags
* per-module taints are only set if the kernel is not already tainted
(with that particular flag) => only the first offending module gets its
taint info correctly updated
Some additional changes:
* 'license_gplok' is no longer needed - equivalent to !(taints &
TAINT_PROPRIETARY_MODULE) - so we can drop it from struct module *
exporting module taint info via /proc/module:
pwc 88576 0 - Live 0xf8c32000
evilmod 6784 1 pwc, Live 0xf8bbf000 (PF)
Signed-off-by: Florin Malita <fmalita@gmail.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a follow-up patch based on the review for perfmon2. This patch
adds the carta_random32() library routine + carta_random32.h header file.
This is fast, simple, and efficient pseudo number generator algorithm. We
use it in perfmon2 to randomize the sampling periods. In this context, we
do not need any fancy randomizer.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: David Mosberger <david.mosberger@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the epoll_pwait system call, that extend the event wait mechanism
with the same logic ppoll and pselect do. The definition of epoll_pwait
is:
int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
int timeout, const sigset_t *sigmask, size_t sigsetsize);
The difference between the vanilla epoll_wait and epoll_pwait is that the
latter allows the caller to specify a signal mask to be set while waiting
for events. Hence epoll_pwait will wait until either one monitored event,
or an unmasked signal happen. If sigmask is NULL, the epoll_pwait system
call will act exactly like epoll_wait. For the POSIX definition of
pselect, information is available here:
http://www.opengroup.org/onlinepubs/009695399/functions/select.html
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the lock debug checks below the page reserved checks. Also, having
debug_check_no_locks_freed in kernel_map_pages is wrong.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move '_hi' bits of block numbers in the larger part of the
block group descriptor structure
Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Similar to ext4, change blocks in JBD2 from sector_t to unsigned long long.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change ext4 in-kernel block type (ext4_fsblk_t) from sector_t to unsigned
long long. Remove ext4 block type string micro E3FSBLK, replaced with "%llu"
[akpm@osdl.org: build fix]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As we are planning to support 48-bit block numbers for ext4, we need to
support 48-bit block numbers for extended attributes. In the short term, we
can do this by reuse (on-disk) 16-bit padding (linux2.i_pad1 currently used
only by "hurd") as high order bits for xattr. This patch basically does that.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD layer in-kernel block varibles type fixes to support >32 bit block number
and convert to sector_t type.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the patch to JBD to handle 64 bit block numbers, originally from Zach
Brown. This patch is useful only after adding support for 64-bit block
numbers in the filesystem.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make it possible to add file preallocation support in future as an RO_COMPAT
feature by recognizing uninitialized extents as holes and limiting extent
length to keep the top bit of ee_len free for marking uninitialized extents.
Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Redefine ext3 in-kernel filesystem block type (ext3_fsblk_t) from unsigned
long to sector_t, to allow kernel to handle >32 bit ext3 blocks.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On disk extents format:
/*
* this is extent on-disk structure
* it's used at the bottom of the tree
*/
struct ext3_extent {
__le32 ee_block; /* first logical block extent covers */
__le16 ee_len; /* number of blocks covered by extent */
__le16 ee_start_hi; /* high 16 bits of physical block */
__le32 ee_start; /* low 32 bigs of physical block */
};
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To allow ext4 to build during the transition from jbd to jbd2, we have both
ext4_jbd.h and ext4_jbd2.h in the tree. We no longer need the former.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao originally did this work, and Shaggy reproduced it using some
scripts from her.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a simple copy of the files in fs/jbd to fs/jbd2 and
/usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Originally part of a patch from Mingming Cao and Randy Dunlap. Reorganized
by Shaggy.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao originally did this work, and Shaggy reproduced it using some
scripts from her.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Start of the ext4 patch series. See Documentation/filesystems/ext4.txt for
details.
This is a simple copy of the files in fs/ext3 to fs/ext4 and
/usr/incude/linux/ext3* to /usr/include/ex4*
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
commit fe1668ae5b causes kernel to oops with
libhugetlbfs test suite. The problem is that hugetlb pages can be shared
by multiple mappings. Multiple threads can fight over page->lru in the
unmap path and bad things happen. We now serialize __unmap_hugepage_range
to void concurrent linked list manipulation. Such serialization is also
needed for shared page table page on hugetlb area. This patch will fixed
the bug and also serve as a prepatch for shared page table.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This annotation makes it possible to assign a subclass on lock init. This
annotation is meant to reduce the _nested() annotations by assigning a
default subclass.
One could do without this annotation and rely on lockdep_set_class()
exclusively, but that would require a manual stack of struct lock_class_key
objects.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves code out of fs/xattr.c:listxattr into a new function -
vfs_listxattr. The code for vfs_listxattr was originally submitted by Bill
Nottingham <notting@redhat.com> to Unionfs.
Sorry about that. The reason for this submission is to make the
listxattr code in fs/xattr.c a little cleaner (as well as to clean up
some code in Unionfs.)
Currently, Unionfs has vfs_listxattr defined in its code. I think
that's very ugly, and I'd like to see it (re)moved. The logical place
to put it, is along side of all the other vfs_*xattr functions.
Overall, I think this patch is benefitial for both kernel.org kernel and
Unionfs.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in a NFS
message.
In either case, it is not possible for both the request and the reply to be
this large. One of the request or reply may only be one page long, which
fits nicely with NFS.
So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.
max_mesg is the largest single message that can be sent or received. It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead. Only one of the request and reply may be
this size. The other must be at most one page.
Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SD cards extend the protocol by allowing the host to query a card how many
blocks were successfully stored on the medium. This allows us to safely write
chunks of blocks at once.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a kerneldoc warning and reorderd the description for is_init().
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial typo fix in the "syntax error if percpu macros are incorrectly
used" patch. I misspelled "identifier" in all places. D'Oh!
Thanks to Dirk Mueller to point this out.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a tickadj compatibility define for archs still using it.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a way for a no_page() handler to request a retry of the faulting
instruction. It goes back to userland on page faults and just tries again
in get_user_pages(). I added a cond_resched() in the loop in that later
case.
The problem I have with signal and spufs is an actual bug affecting apps and I
don't see other ways of fixing it.
In addition, we are having issues with infiniband and 64k pages (related to
the way the hypervisor deals with some HV cards) that will require us to muck
around with the MMU from within the IB driver's no_page() (it's a pSeries
specific driver) and return to the caller the same way using NOPAGE_REFAULT.
And to add to this, the graphics folks have been following a new approach of
memory management that involves transparently swapping objects between video
ram and main meory. To do that, they need installing PTEs from a no_page()
handler as well and that also requires returning with NOPAGE_REFAULT.
(For the later, they are currently using io_remap_pfn_range to install one PTE
from no_page() which is a bit racy, we need to add a check for the PTE having
already been installed afer taking the lock, but that's ok, they are only at
the proof-of-concept stage. I'll send a patch adding a "clean" function to do
that, we can use that from spufs too and get rid of the sparsemem hacks we do
to create struct page for SPEs. Basically, that provides a generic solution
for being able to have no_page() map hardware devices, which is something that
I think sound driver folks have been asking for some time too).
All of these things depend on having the NOPAGE_REFAULT exit path from
no_page() handlers.
Signed-off-by: Benjamin Herrenchmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
Typedef the IRQ handler function type.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)
Typedef the IRQ flow handler function type.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)
* 'for-2.6.19' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Document bi_sector and sector_t
[PATCH] helper function for retrieving scsi_cmd given host based block layer tag
This was necessitated by the need for a function to get back
to a scsi_cmnd, when an hba the posts its (corresponding) completion
interrupt with a block layer tag as its reference.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>