Commit Graph

34589 Commits

Author SHA1 Message Date
Eric W. Biederman b099ce2602 net: Batch inet_twsk_purge
This function walks the whole hashtable so there is no point in
passing it a network namespace.  Instead I purge all timewait
sockets from dead network namespaces that I find.  If the namespace
is one of the once I am trying to purge I am guaranteed no new timewait
sockets can be formed so this will get them all.  If the namespace
is one I am not acting for it might form a few more but I will
call inet_twsk_purge again and  shortly to get rid of them.  In
any even if the network namespace is dead timewait sockets are
useless.

Move the calls of inet_twsk_purge into batch_exit routines so
that if I am killing a bunch of namespaces at once I will just
call inet_twsk_purge once and save a lot of redundant unnecessary
work.

My simple 4k network namespace exit test the cleanup time dropped from
roughly 8.2s to 1.6s.  While the time spent running inet_twsk_purge fell
to about 2ms.  1ms for ipv4 and 1ms for ipv6.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:23:47 -08:00
Eric W. Biederman e9c5158ac2 net: Allow fib_rule_unregister to batch
Refactor the code so fib_rules_register always takes a template instead
of the actual fib_rules_ops structure that will be used.  This is
required for network namespace support so 2 out of the 3 callers already
do this, it allows the error handling to be made common, and it allows
fib_rules_unregister to free the template for hte caller.

Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu
to allw multiple namespaces to be cleaned up in the same rcu grace
period.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:55 -08:00
Eric W. Biederman d79d792ef9 net: Allow xfrm_user_net_exit to batch efficiently.
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code.  Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL.  This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:03 -08:00
Eric W. Biederman 72ad937abd net: Add support for batching network namespace cleanups
- Add exit_list to struct net to support building lists of network
  namespaces to cleanup.

- Add exit_batch to pernet_operations to allow running operations only
  once during a network namespace exit.  Instead of once per network
  namespace.

- Factor opt ops_exit_list and ops_exit_free so the logic with cleanup
  up a network namespace does not need to be duplicated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:01 -08:00
Patrick McHardy 8153a10c08 ipv4 05/05: add sysctl to accept packets with local source addresses
commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:16:35 2009 +0100

    ipv4: add sysctl to accept packets with local source addresses

    Change fib_validate_source() to accept packets with a local source address when
    the "accept_local" sysctl is set for the incoming inet device. Combined with the
    previous patches, this allows to communicate between multiple local interfaces
    over the wire.

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:38 -08:00
Patrick McHardy 1b038a5e60 net 03/05: fib_rules: add oif classification
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:25 2009 +0100

    net: fib_rules: add oif classification

    Support routing table lookup based on the flow's oif. This is useful to
    classify packets originating from sockets bound to interfaces differently.

    The route cache already includes the oif and needs no changes.

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:36 -08:00
Patrick McHardy 491deb24bf net 02/05: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:23 2009 +0100

    net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME

    The next patch will add oif classification, rename interface related members
    and attributes to reflect that they're used for iif classification.

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:36 -08:00
Patrick McHardy d285834001 net 01/05: fib_rules: rearrange struct fib_rule
commit b8952893d5d86f69c4e499d191b98c6658f64b0f
Author: Patrick McHardy <kaber@trash.net>
Date:   Thu Dec 3 12:05:22 2009 +0100

    net: fib_rules: rearrange struct fib_rule

    The ifname member is only used to resolve interface names and is not needed
    during rule lookups. The target and ctarget members however are used during
    rule lookups and are currently located in a second cacheline.

    Move ifname further to the end to make sure both target and ctarget are
    located in the same cacheline as other members used during rule lookups.

    The layout on 64 bit changes from:

    struct fib_rule {
    	...
            u32                        table;                /*    56     4 */
            u8                         action;               /*    60     1 */

            /* XXX 3 bytes hole, try to pack */

            /* --- cacheline 1 boundary (64 bytes) --- */
            u32                        target;               /*    64     4 */

            /* XXX 4 bytes hole, try to pack */

            struct fib_rule *          ctarget;              /*    72     8 */
            struct rcu_head            rcu;                  /*    80    16 */
            struct net *               fr_net;               /*    96     8 */
    };

    to:

    struct fib_rule {
    	...
            u32                        table;                /*    40     4 */
            u8                         action;               /*    44     1 */

            /* XXX 3 bytes hole, try to pack */

            u32                        target;               /*    48     4 */

            /* XXX 4 bytes hole, try to pack */

            struct fib_rule *          ctarget;              /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            char                       ifname[16];           /*    64    16 */
            struct rcu_head            rcu;                  /*    80    16 */
            struct net *               fr_net;               /*    96     8 */

    };

    Signed-off-by: Patrick McHardy <kaber@trash.net>

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:14:34 -08:00
Bartlomiej Zolnierkiewicz 95514fd8ff libata: add private driver field to struct ata_device
This brings struct ata_device in-line with struct ata_{port,host}.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 14:36:18 -05:00
Alan Cox 8e182a90f9 pata_piccolo: Driver for old Toshiba chipsets
We were never able to get docs for this out of Toshiba for years. Dave
Barnes produced a NetBSD driver however and from that we can fill in the
needed tables.

As we correct the PCI identifiers a bit also update the old ide generic driver
at the same time so it stays compiling.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 14:35:31 -05:00
Ingo Molnar 26fb20d008 Merge branch 'perf/mce' into perf/core
Merge reason: It's ready for v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:06 +01:00
Gustavo F. Padovan 4ec10d9720 Bluetooth: Implement RejActioned flag
RejActioned is used to prevent retransmission when a entity is on the
WAIT_F state, i.e., waiting for a frame with F-bit set due local busy
condition or a expired retransmission timer. (When these two events raise
they send a frame with the Poll bit set and enters in the WAIT_F state to
wait for a frame with the Final bit set.)
The local entity doesn't send I-frames(the data frames) until the receipt
of a frame with F-bit set. When that happens it also set RejActioned to false.
RejActioned is a mandatory feature of ERTM spec.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:24 +01:00
Gustavo F. Padovan 9f121a5a80 Bluetooth: Fix sending ReqSeq on I-frames
As specified by ERTM spec an ERTM channel can acknowledge received
I-frames(the data frames) by sending an I-frame with the proper ReqSeq
value (i.e. ReqSeq is set to BufferSeq).  Until now we aren't setting the
ReqSeq value on I-frame control bits. That way we can save sending
S-frames(Supervise frames) only to acknowledge receipt of I-frames. It
is very helpful to the full-duplex channel.
ReqSeq is the packet sequence number sent in an acknowledgement frame to
acknowledge receipt of frames up to (ReqSeq - 1).
BufferSeq controls the receiver buffer, it is used to delay
acknowledgement of new frames to not cause buffer overflow. BufferSeq
value is not increased until frames are pulled by reassembly function.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:23 +01:00
Marcel Holtmann c78ae28314 Bluetooth: Unobfuscate tasklet_schedule usage
The tasklet schedule function helpers are just an obfuscation. So remove
them and call the schedule functions directly.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:21 +01:00
Marcel Holtmann 76bca88012 Bluetooth: Turn hci_recv_frame into an exported function
For future simplification it is important that the hci_recv_frame
function is no longer an inline function. So move it into the module
itself and export it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03 19:34:20 +01:00
Vivek Goyal 31e4c28d95 blkio: Introduce blkio controller cgroup interface
o This is basic implementation of blkio controller cgroup interface. This is
  the common interface visible to user space and should be used by different
  IO control policies as we implement those.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:51 +01:00
Wu Fengguang b17621fed6 writeback: introduce wbc.for_background
It will lower the flush priority for NFS, and maybe more in future.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 13:54:25 +01:00
Jens Axboe 220d0b1dbf Merge branch 'master' into for-2.6.33 2009-12-03 13:49:39 +01:00
Steven Whitehouse 0ab7d13fcb GFS2: Tag all metadata with jid
There are two spare field in the header common to all GFS2
metadata. One is just the right size to fit a journal id
in it, and this patch updates the journal code so that each
time a metadata block is modified, we tag it with the journal
id of the node which is performing the modification.

The reason for this is that it should make it much easier to
debug issues which arise if we can tell which node was the
last to modify a particular metadata block.

Since the field is updated before the block is written into
the journal, each journal should only contain metadata which
is tagged with its own journal id. The one exception to this
is the journal header block, which might have a different node's
id in it, if that journal was recovered by another node in the
cluster.

Thus each journal will contain a record of which nodes recovered
it, via the journal header.

The other field in the metadata header could potentially be
used to hold information about what kind of operation was
performed, but for the time being we just zero it on each
transaction so that if we use it for that in future, we'll
know that the information (where it exists) is reliable.

I did consider using the other field to hold the journal
sequence number, however since in GFS2's journaling we write
the modified data into the journal and not the original
data, this gives no information as to what action caused the
modification, so I think we can probably come up with a better
use for those 64 bits in the future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 11:58:47 +00:00
Steven Whitehouse 86e931a35e VFS: Export dquot_send_warning
Sending a message to userspace in a generic format to warn
of events (e.g. quota exceeded) in the quota subsystem is
a generically useful feature. This patch makes some minor
changes to the send_message function from dquot.c renaming
it quota_send_message, moving it to quota.c and exporting it
for use by filesystems which do not use the dquot code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-12-03 11:53:02 +00:00
Steven Whitehouse 796bd95247 VFS: Add forget_all_cached_acls()
This is required for cluster filesystems which want to use
cached ACLs so that they can invalidate the cache when
required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
2009-12-03 11:43:23 +00:00
Martin K. Petersen 98262f2762 block: Allow devices to indicate whether discarded blocks are zeroed
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down.  However, not all devices return zeroed
blocks after a discard.  Some drives return stale data, potentially
containing old superblocks.  It is therefore important to know whether
discarded blocks are properly zeroed.

Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation.  Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 09:24:48 +01:00
Christoph Hellwig 18f0f97850 libata: add translation for SCSI WRITE SAME (aka TRIM support)
Add support for the ATA TRIM command in libata.  We translate a WRITE SAME 16
command with the unmap bit set into an ATA TRIM command and export enough
information in READ CAPACITY 16 and the block limits EVPD page so that the new
SCSI layer discard support will driver this for us.

Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
merged we can fix it up to properly use the symbolic name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Tejun Heo 6013efd886 libata: retry failed FLUSH if device didn't fail it
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.

However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.

This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination.  Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode.  This was reported in the following bugzilla bug.

  http://bugzilla.kernel.org/show_bug.cgi?id=14543

This patch makes libata EH retry FLUSH if it wasn't failed by the
device.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Carsten Otte d7b0b5eb30 KVM: s390: Make psw available on all exits, not just a subset
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.

The userspace ABI is broken by this, however there are no applications
in the wild using this.  A capability check is provided so users can
verify the updated API exists.

Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka 3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity 65ac726404 KVM: VMX: Report unexpected simultaneous exceptions as internal errors
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults.  If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Avi Kivity a9c7399d6c KVM: Allow internal errors reported to userspace to carry extra data
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Jan Kiszka c54d2aba27 KVM: Reorder IOCTLs in main kvm.h
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:24 +02:00
Glauber Costa afbcf7ab8d KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:19 +02:00
Ed Swierk ffde22ac53 KVM: Xen PV-on-HVM guest support
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:18 +02:00
Zhai, Edwin d255f4f2ba KVM: introduce kvm_vcpu_on_spin
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.

Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Alexander Graf 10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Avi Kivity bfd99ff5d4 KVM: Move assigned device code to own file
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:09 +02:00
Gleb Natapov 136bdfeee7 KVM: Move irq ack notifier list to arch independent code
Mask irq notifier list is already there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov 3e71f88bc9 KVM: Maintain back mapping from irqchip/pin to gsi
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov 46e624b95c KVM: Change irq routing table to use gsi indexed array
Use gsi indexed array instead of scanning all entries on each interrupt
injection.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov 1a6e4a8c27 KVM: Move irq sharing information to irqchip level
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Avi Kivity 58988b07cf Merge remote branch 'tip/x86/entry' into kvm-updates/2.6.33
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:30:06 +02:00
James Morris c84d6efd36 Merge branch 'master' into next 2009-12-03 12:03:40 +05:30
Andrew Morton 7cff7ce94a include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
Maybe 4.1.0 doesn't too, but this fixed it for me.

Caused by:

 4a31276: x86: Turn the copy_from_user check into an (optional) compile time warning
 9f0cf4a: x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <200910090724.n997OQl6013538@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 07:29:17 +01:00
Ilpo Järvinen 8818a9d884 tcp: clear hints to avoid a stale one (nfs only affected?)
Eric Dumazet mentioned in a context of another problem:

"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"

I've not check under which conditions that actually happens but
if true, we need to make sure we don't accidently leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).

...At least it compiles.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:24:02 -08:00
William Allen Simpson 4957faade1 TCPCT part 1g: Responder Cookie => Initiator
Parse incoming TCP_COOKIE option(s).

Calculate <SYN,ACK> TCP_COOKIE option.

Send optional <SYN,ACK> data.

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

Requires:
   TCPCT part 1a: add request_values parameter for sending SYNACK
   TCPCT part 1b: generate Responder Cookie secret
   TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
   TCPCT part 1d: define TCP cookie option, extend existing struct's
   TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
   TCPCT part 1f: Initiator Cookie => Responder

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:26 -08:00
William Allen Simpson 435cf559f0 TCPCT part 1d: define TCP cookie option, extend existing struct's
Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration).  There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.

This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

The principle difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data.  This is more flexible and
less subject to user configuration error.  Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.

    "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
    http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html

    "Re: what a new TCP header might look like", May 12, 1998.
    ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail

These functions will also be used in subsequent patches that implement
additional features.

Requires:
   TCPCT part 1a: add request_values parameter for sending SYNACK
   TCPCT part 1b: generate Responder Cookie secret
   TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:25 -08:00
William Allen Simpson 519855c508 TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.

Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.

Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).

This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

These functions will also be used in subsequent patches that implement
additional features.

Requires:
   net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:24 -08:00
William Allen Simpson da5c78c826 TCPCT part 1b: generate Responder Cookie secret
Define (missing) hash message size for SHA1.

Define hashing size constants specific to TCP cookies.

Add new function: tcp_cookie_generator().

Maintain global secret values for tcp_cookie_generator().

This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.

Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.

These functions will also be used in subsequent patches that implement
additional features.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:23 -08:00
William Allen Simpson e6b4d11367 TCPCT part 1a: add request_values parameter for sending SYNACK
Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission.  Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.

Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 22:07:23 -08:00
Alexander Duyck c81c2d9544 skbuff: remove skb_dma_map/unmap
The two functions skb_dma_map/unmap are unsafe to use as they cause
problems when packets are cloned and sent to multiple devices while a HW
IOMMU is enabled.  Due to this it is best to remove the code so it is not
used by any other network driver maintainters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02 19:57:15 -08:00
Linus Torvalds 56f3f55cf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Correct WM831X_MAX_ISEL_VALUE
2009-12-02 15:41:49 -08:00
Hidetoshi Seto 0cf55e1ec0 sched, cputime: Introduce thread_group_times()
This is a real fix for problem of utime/stime values decreasing
described in the thread:

   http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

 - {u,s}time in task_struct are increased every time when the thread
   is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal->{u,s}time,
   after adjusted by task_times().

 - When all threads in a thread_group exits, accumulated {u,s}time
   (and also c{u,s}time) in signal struct are added to c{u,s}time
   in signal struct of the group's parent.

So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.

And accounted values are used by:

 - task_times(), to get cputime of a thread:
   This function returns adjusted values that originates from raw
   {u,s}time and scaled by sum_exec_runtime that accounted by CFS.

 - thread_group_cputime(), to get cputime of a thread group:
   This function returns sum of all {u,s}time of living threads in
   the group, plus {u,s}time in the signal struct that is sum of
   adjusted cputimes of all exited threads belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it for
every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify thread's exit (= __exit_signal()) not to use task_times().
   It means {u,s}time in signal struct accumulates raw values instead
   of adjusted values.  As the result it makes thread_group_cputime()
   to return pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime)
   that converts "raw" values of thread_group_cputime() to "adjusted"
   values, in same calculation procedure as task_times().

 - Modify group's exit (= wait_task_zombie()) to use this introduced
   thread_group_times().  It make c{u,s}time in signal struct to
   have adjusted values like before this patch.

 - Replace some thread_group_cputime() by thread_group_times().
   This replacements are only applied where conveys the "adjusted"
   cputime to users, and where already uses task_times() near by it.
   (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)

This patch have a positive side effect:

 - Before this patch, if a group contains many short-life threads
   (e.g. runs 0.9ms and not interrupted by ticks), the group's
   cputime could be invisible since thread's cputime was accumulated
   after adjusted: imagine adjustment function as adj(ticks, runtime),
     {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
   After this patch it will not happen because the adjustment is
   applied after accumulated.

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 17:32:40 +01:00
Hidetoshi Seto d99ca3b977 sched, cputime: Cleanups related to task_times()
- Remove if({u,s}t)s because no one call it with NULL now.
- Use cputime_{add,sub}().
- Add ifndef-endif for prev_{u,s}time since they are used
  only when !VIRT_CPU_ACCOUNTING.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 17:32:39 +01:00
Hiroshi Shimamoto fa1452e808 locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
Reorder task_struct field for TRACE_IRQFLAGS to remove padding
on 64-bit.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4B135F50.8070302@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 10:24:37 +01:00
Frederic Weisbecker 6b62fe019e tracing/syscalls: Make syscall events print callbacks static
enter_syscall_print_##sname and exit_syscall_print_##sname don't
need to have a global scope. Make them static.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1259734990-9034-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 09:59:02 +01:00
Tejun Heo 8592e6486a sched: Revert 498657a478
498657a478 incorrectly assumed
that preempt wasn't disabled around context_switch() and thus
was fixing imaginary problem.  It also broke KVM because it
depended on ->sched_in() to be called with irq enabled so that
it can do smp calls from there.

Revert the incorrect commit and add comment describing different
contexts under with the two callbacks are invoked.

Avi: spotted transposed in/out in the added comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 09:55:33 +01:00
David S. Miller ff9c38bba3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/mac80211/ht.c
2009-12-01 22:13:38 -08:00
Eric W. Biederman 65c0cfafce net: remove [un]register_pernet_gen_... and update the docs.
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:16:00 -08:00
Eric W. Biederman f875bae065 net: Automatically allocate per namespace data.
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code.  When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.

To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core.  The core now frees the data only after
all of the network namespace exit routines have run.

Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.

This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary.  For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric W. Biederman 2b035b3997 net: Batch network namespace destruction.
It is fairly common to kill several network namespaces at once.  Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments.  As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.

To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch.  For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.

In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric W. Biederman dcbccbd4f1 net: Implement for_each_netdev_reverse.
I will need this shortly to implement network namespace shutdown
batching.  For sanity sake network devices should be removed in
the reverse order they were created in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:50 -08:00
Eric W. Biederman a5ee155136 net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.

For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.

Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation.  So I have restored the original location of the final
synchronize_net.

Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:50 -08:00
Lai Jiangshan 3bbe84e9d3 trace_syscalls: Simplify syscall profile
use only one prof_sysenter_enable() instead of
prof_sysenter_enable_##sname()

use only one prof_sysenter_disable() instead of
prof_sysenter_disable_##sname()

use only one prof_sysexit_enable() instead of
prof_sysexit_enable_##sname()

use only one prof_sysexit_disable() instead of
prof_sysexit_disable_##sname()

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A1.8060304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:30 +01:00
Lai Jiangshan a1301da099 trace_syscalls: Remove duplicate init_enter_##sname()
use only one init_syscall_trace instead of
many init_enter_##sname()/init_exit_##sname()

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D29B.6090708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:30 +01:00
Lai Jiangshan c252f65793 trace_syscalls: Add syscall_nr field to struct syscall_metadata
Add syscall_nr field to struct syscall_metadata,
it helps us to get syscall number easier.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D293.6090800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:29 +01:00
Lai Jiangshan fcc19438dd trace_syscalls: Remove enter_id exit_id
use ->enter_event->id instead of ->enter_id
use ->exit_event->id instead of ->exit_id

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D288.7030001@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:29 +01:00
Lai Jiangshan 31c16b1334 trace_syscalls: Set event_enter_##sname->data to its metadata
Set event_enter_##sname->data to its metadata,
it makes codes simpler.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D282.7050709@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:28 +01:00
Lai Jiangshan bf56a4ea9f trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
fix event_enter_##sname->event
fix event_exit_##sname->event

remove unused event_syscall_enter and event_syscall_exit

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D278.4090209@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-01 17:33:28 +01:00
David Howells f13a48bd79 SLOW_WORK: Move slow_work's proc file to debugfs
Move slow_work's debugging proc file to debugfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-01 08:20:31 -08:00
Takashi Iwai b00615d163 Merge branch 'topic/pcm-dma-fix' into topic/core-change 2009-12-01 15:58:15 +01:00
Takashi Iwai 75639e7ee1 Merge branch 'topic/beep-rename' into topic/core-change 2009-12-01 15:58:10 +01:00
Takashi Iwai 980f31c46b Merge branch 'topic/ice1724-quartet' into topic/hda 2009-12-01 15:57:01 +01:00
Mark Brown 7716977b6a mfd: Correct WM831X_MAX_ISEL_VALUE
There was confusion between the array size and the highest ISEL
value possible.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2009-12-01 11:24:19 +01:00
Haojian Zhuang b3a8549593 backlight: da903x_bl: control WLED output current in da9034
Update WLED output current source before changing brightness.

Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2009-12-01 09:02:34 +08:00
Jun Nie c41562b162 pxa168fb: remove useless vsync/hsync invert flag
fb_var_screeninfo.var has already encoded this information.

Signed-off-by: Jun Nie <njun@marvell.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2009-12-01 09:02:32 +08:00
Linus Torvalds 29e553631b Merge branch 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
* 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  mac80211: fix spurious delBA handling
  mac80211: fix two remote exploits
2009-11-30 16:47:16 -08:00
Linus Torvalds ed9fd93e9a Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] fix crash when disconnecting usb storage
  [SCSI] fix async scan add/remove race resulting in an oops
  [SCSI] sd: Return correct error code for DIF
2009-11-30 15:21:50 -08:00
Linus Torvalds cd79bf7b1c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  b44: Fix wedge when using netconsole.
  wan: cosa: drop chan->wsem on error path
  ep93xx-eth: check for zero MAC address on probe, not on device open
  NET: smc91x: Fix irq flags
  smsc9420: prevent BUG() if ethtool is called with interface down
  r8169: restore mac addr in rtl8169_remove_one and rtl_shutdown
  ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr OOPS
  e100: Use pci pool to work around GFP_ATOMIC order 5 memory allocation failure
  sctp: on T3_RTX retransmit all the in-flight chunks
  pktgen: Fix netdevice unregister
  macvlan: fix gso_max_size setting
  rfkill: fix miscdev ops
  ath9k: set ps_default as false
  hso: fix soft-lockup
  hso: fix debug routines
  pktgen: Fix device name compares
  stmmac: do not fail when the timer cannot be used.
  stmmac: fixed a compilation error when use the external timer
  netfilter: xt_limit: fix invalid return code in limit_mt_check()
  Au1x00: fix crash when trying register_netdev()
  ...
2009-11-30 14:01:36 -08:00
Linus Torvalds 6e80133f7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
2009-11-30 13:33:48 -08:00
Johannes Berg 827d42c9ac mac80211: fix spurious delBA handling
Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was

  commit d75636ef9c
  Author: Johannes Berg <johannes@sipsolutions.net>
  Date:   Tue Feb 10 21:25:53 2009 +0100

    mac80211: RX aggregation: clean up stop session

and other parts were already present in the original

  commit d92684e660
  Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
  Date:   Mon Jan 28 14:07:22 2008 +0200

      mac80211: A-MPDU Tx add delBA from recipient support

The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.

The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.

For this case, there are two things necessary to fix it:
 1) don't process spurious delBA frames, and be more careful
    about the session state; don't drop the lock

 2) HT drivers need to be prepared to handle a session stop
    even before the session was really started -- this is
    true for all drivers (that support aggregation) but
    iwlwifi which can be fixed easily. The other HT drivers
    (ath9k and ar9170) are behaving properly already.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-30 13:55:51 -05:00
Avi Kivity 8e7cac7980 core: Fix user return notifier on fork()
fork() clones all thread_info flags, including
TIF_USER_RETURN_NOTIFY; if the new task is first scheduled on a cpu
which doesn't have user return notifiers set, this causes user
return notifiers to trigger without any way of clearing itself.

This is easy to trigger with a forky workload on the host in
parallel with kvm, resulting in a cpu in an endless loop on the
verge of returning to userspace.

Fix by dropping the TIF_USER_RETURN_NOTIFY immediately after fork.

Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1259505288-16559-1-git-send-email-avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-29 22:03:04 +01:00
David S. Miller 9b963e5d0e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/ieee802154/fakehard.c
	drivers/net/e1000e/ich8lan.c
	drivers/net/e1000e/phy.c
	drivers/net/netxen/netxen_nic_init.c
	drivers/net/wireless/ath/ath9k/main.c
2009-11-29 00:57:15 -08:00
PJ Waskiewicz 5789d290cd ethtool: Add Direct Attach support to connector port reporting
This patch allows a base driver to specify Direct Attach as the
type of port through the ethtool interface.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:34:00 -08:00
andrew hendry 2f5517aefc X25: Move SYSCTL ifdefs into header
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:24:59 -08:00
David S. Miller 5656b6ca19 Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev 2009-11-29 00:16:22 -08:00
Andrei Pelinescu-Onciul 5fdd4baef6 sctp: on T3_RTX retransmit all the in-flight chunks
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
 b6157d8e03.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29 00:14:02 -08:00
Samuel Ortiz 67fbb16be6 nl80211: PMKSA caching support
This is an interface to set, delete and flush PMKIDs through nl80211.
Main users would be fullmac devices which firmwares are capable of
generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-28 15:05:05 -05:00
Johannes Berg 8c0c709eea mac80211: move cmntr flag out of rx flags
The RX flags should soon be used only for flags
that cannot change within an a-MPDU, so move the
cooked monitor flag into the RX status flags.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-28 15:05:01 -05:00
Dominik Brodowski 5fa9167a1b pcmcia: rework the irq_req_t typedef
Most of the irq_req_t typedef'd struct can be re-worked quite
easily:

(1) IRQInfo2 was unused in any case, so drop it.

(2) IRQInfo1 was used write-only, so drop it.

(3) Instance (private data to be passed to the IRQ handler):
	Most PCMCIA drivers using pcmcia_request_irq() to actually
	register an IRQ handler set the "dev_id" to the same pointer
	as the "priv" pointer in struct pcmcia_device. Modify the two
	exceptions (ipwireless, ibmtr_cs) to also work this waym and
	set the IRQ handler's "dev_id" to p_dev->priv unconditionally.

(4) Handler is to be of type irq_handler_t.

(5) Handler != NULL already tells whether an IRQ handler is present.
	Therefore, we do not need the IRQ_HANDLER_PRESENT flag in
	irq_req_t.Attributes.

CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-ide@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: alsa-devel@alsa-project.org
CC: Jaroslav Kysela <perex@perex.cz>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Karsten Keil <isdn@linux-pingi.de>
for the Bluetooth parts: Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:03:14 +01:00
Dominik Brodowski dd2e5a1565 pcmcia: remove deprecated handle_to_dev() macro
Update remaining users and remove deprecated handle_to_dev() macro

CC: Harald Welte <laforge@gnumonks.org>
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: linux-serial@vger.kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:03:10 +01:00
Dominik Brodowski 6838b03fc6 pcmcia: pcmcia_request_window() doesn't need a pointer to a pointer
pcmcia_request_window() only needs a pointer to struct pcmcia_device, not
a pointer to a pointer.

CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: Jiri Kosina <jkosina@suse.cz>
Acked-by: Karsten Keil <keil@b1-systems.de> (for ISDN)
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:02:58 +01:00
Dominik Brodowski 82f88e3600 pcmcia: remove unused "window_t" typedef
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:02:52 +01:00
Magnus Damm 0bdf9b3dd3 pcmcia: Change window_handle_t logic to unsigned long
Logic changes based on top of the other patches:

This set of patches changed window_handle_t from being a pointer to an
unsigned long. The unsigned long is now a simple index into socket->win[].
Going from a pointer to unsigned long should leave the user space interface
unchanged unless I'm mistaken.

This change results in code that is less error prone and a user space
interface which is much cleaner and safer. A nice side effect is that we
are also are able to remove all members except one from window_t.

[ linux@dominikbrodowski.net:
	Update to 2.6.31. Also, a plain "index" to socket->win[] does not
	work, as several codepaths rely on "window_handle_t" being
	non-zero if used. Therefore, set the window_handle_t to the
	socket->win[] index + 1. ]

CC: netdev@vger.kernel.org
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:02:50 +01:00
Magnus Damm 16456ebabf pcmcia: Pass struct pcmcia_socket to pcmcia_get_mem_page()
No logic changes, just pass struct pcmcia_socket to pcmcia_get_mem_page()

[linux@dominikbrodowski.net: update to 2.6.31]
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:02:49 +01:00
Magnus Damm 868575d1e8 pcmcia: Pass struct pcmcia_device to pcmcia_map_mem_page()
No logic changes, just pass struct pcmcia_device to pcmcia_map_mem_page()

[linux@dominikbrodowski.net: update to 2.6.31]
CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: Jiri Kosina <jkosina@suse.cz>
Acked-by: Karsten Keil <keil@b1-systems.de> (for ISDN)
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:02:13 +01:00
Magnus Damm f5560da549 pcmcia: Pass struct pcmcia_device to pcmcia_release_window()
No logic changes, just pass struct pcmcia_device to pcmcia_release_window().

[linux@dominikbrodowski.net: update to 2.6.31]
CC: netdev@vger.kernel.org
CC: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2009-11-28 18:01:26 +01:00
Thomas Kunze 9ca3dc805c add gpiolib support to ucb1x00
The old access methods to the gpios will be removed when
all users has been converted. (mainly ucb1x00-ts)
2009-11-27 21:07:21 +01:00
Thomas Kunze c8602edf3f move drivers/mfd/*.h to include/linux/mfd
So drivers like collie_battery driver can use
those files easier.
2009-11-27 21:07:18 +01:00
Frederic Weisbecker dd1853c3f4 hw-breakpoints: Use struct perf_event_attr to define kernel breakpoints
Kernel breakpoints are created using functions in which we pass
breakpoint parameters as individual variables: address, length
and type.

Although it fits well for x86, this just does not scale across
architectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.

Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 06:22:59 +01:00
Frederic Weisbecker 5fa10b28e5 hw-breakpoints: Use struct perf_event_attr to define user breakpoints
In-kernel user breakpoints are created using functions in which
we pass breakpoint parameters as individual variables: address,
length and type.

Although it fits well for x86, this just does not scale across
archictectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.

Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 06:22:58 +01:00
Patrick McHardy 5e75659305 vlan: support "loose binding" to the underlying network device
Currently the UP/DOWN state of VLANs is synchronized to the state of the
underlying device, meaning all VLANs are set down once the underlying
device is set down. This causes all routes to the VLAN devices to vanish.

Add a flag to specify a "loose binding" mode, in which only the operstate
is transfered, but the VLAN device state is independant.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 16:00:36 -08:00
Arnd Bergmann 27c0b1a850 macvlan: export macvlan mode through netlink
In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.

This depends on a matching patch to iproute2
in order to become accessible in user land.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 15:53:10 -08:00