Commit Graph

13481 Commits

Author SHA1 Message Date
Vlad Yasevich f68b2e05f3 sctp: Fix SCTP_MAXSEG socket option to comply to spec.
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:21:00 -04:00
Vlad Yasevich cb95ea32a4 sctp: Don't do NAGLE delay on large writes that were fragmented small
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich b29e790728 sctp: Nagle delay should be based on path mtu
The decision to delay due to Nagle should be based on the path mtu
and future packet size.  We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation.  This
actuall allows situations where a user can set low 'frag_point',
and then send small messages without delay.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich d4d6fb5787 sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.
We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in an hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
closing.  The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing.  The fix is to use the peers old rwnd and add our flight
size to it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich 4d3c46e683 sctp: drop a_rwnd to 0 when receive buffer overflows.
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:59 -04:00
Vlad Yasevich 33ce828131 sctp: Clear fast_recovery on the transport when T3 timer expires.
If T3 timer expires, we are retransmitting data due to timeout any
any fast recovery is null and void.  We can clear the fast recovery
flag.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Vlad Yasevich b9f8478682 sctp: Fix error count increments that were results of HEARTBEATS
SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors agains a given transport or endpoint.  As such, we
should increment the error counts for only for unacknowledged
HB, otherwise we detect failure too soon.  This goes for both
the overall error count and the path error count.

Now, there is a difference in how the detection is done
between the two.  The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold.  The overall error detection
is done _BEFORE_ the increment.  Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.

Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Alexey Dobriyan d71a09ed55 sctp: use proc_create()
create_proc_entry() is deprecated (not formally, though).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Wei Yongjun dadb50cc1a sctp: fix check the chunk length of received HEARTBEAT-ACK chunk
The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
  sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

A badly formatted HB-ACK chunk, it is possible that we may access
invalid memory.  We should really make sure that the chunk format
is what we expect, before attempting to touch the data.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:58 -04:00
Wei Yongjun a2f36eec56 sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN
If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:57 -04:00
Vlad Yasevich 9c5c62be2f sctp: Send user messages to the lower layer as one
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:57 -04:00
Vlad Yasevich 5d7ff261ef sctp: Try to encourage SACK bundling with DATA.
If the association has a SACK timer pending and now DATA queued
to be send, we'll try to bundle the SACK with the next application send.
As such, try encourage bundling by accounting for SACK in the size
of the first chunk fragment.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich e83963b769 sctp: Generate SACKs when actually sending outbound DATA
We are now trying to bundle SACKs when we have outbound
DATA to send.  However, there are situations where this
outbound DATA will not be sent (due to congestion or 
available window).  In such cases it's ok to wait for the
timer to expire.  This patch refactors the sending code
so that betfore attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.

Based on eirlier works for Doug Graham <dgraham@nortel.com> and
Wei Youngjun <yjwei@cn.fujitsu.com>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich 3e62abf92f sctp: Fix data segmentation with small frag_size
Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation.   Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.

What we need to do is track the largest possbile DATA chunk that
can fit into the mtu.  Then if the fragment size specified is
bigger then this maximum length, we'll shrink it down.  Otherwise,
we just use the smaller segment size without changing it further.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Vlad Yasevich bec9640bb0 sctp: Disallow new connection on a closing socket
If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down.  If this was a result of a close() call, there will be no socket
and will cause a memory leak.  We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:56 -04:00
Doug Graham af87b823ca sctp: Fix piggybacked ACKs
This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet.  The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
following paragraph from section 6.2 had not been implemented correctly:

   Before an endpoint transmits a DATA chunk, if any received DATA
   chunks have not been acknowledged (e.g., due to delayed ack), the
   sender should create a SACK and bundle it with the outbound DATA
   chunk, as long as the size of the final SCTP packet does not exceed
   the current MTU.  See Section 6.2.

When about to send a DATA chunk, the code now checks to see if the SACK
timer is running.  If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer.  For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning the
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request.  Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms).  This is wasteful of bandwidth, and can also have a
major negative impact on performance due the interaction of delayed ACKs
with the Nagle algorithm.

Signed-off-by: Doug Graham <dgraham@nortel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:55 -04:00
Vlad Yasevich 40187886bc sctp: release cached route when the transport goes down.
When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything.  This way, if a better route
or source address is available, we'll try to use it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:55 -04:00
Wei Yongjun 3cd9749c0b sctp: update the route for non-active transports after addresses are added
Update the route and saddr entries for the non-active transports as some
of the added addresses can be used as better source addresses, or may
be there is a better route.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:55 -04:00
Wei Yongjun 44e65c1ef1 sctp: check the unrecognized ASCONF parameter before access it
This patch fix to check the unrecognized ASCONF parameter before
access it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:54 -04:00
Wei Yongjun 425e0f6852 sctp: avoid overwrite the return value of sctp_process_asconf_ack()
The return value of sctp_process_asconf_ack() may be
overwritten while process parameters with no error.
This patch fixed the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04 18:20:54 -04:00
Cosmin Ratiu a8fdf2b331 ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header
Here is a patch which fixes an issue observed when using TCP over IPv6
and AH from IPsec.

When a connection gets closed the 4-way method and the last ACK from
the server gets dropped, the subsequent FINs from the client do not
get ACKed because tcp_v6_send_response does not set the transport
header pointer. This causes ah6_output to try to allocate a lot of
memory, which typically fails, so the ACKs never make it out of the
stack.

I have reproduced the problem on kernel 2.6.7, but after looking at
the latest kernel it seems the problem is still there.

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 20:44:38 -07:00
Eric Dumazet 1a123a3168 vlan: adds drops accounting
Its hard to tell if vlans are dropping frames, since
every frame given to vlan_???_start_xmit() functions
is accounted as fully transmitted by lower device.

We can test dev_queue_xmit() return values to
properly account for dropped frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 20:02:17 -07:00
Eric Dumazet 55f9d6786d net: Remove debugging code
Remove a debugging aid I accidently left in previous 'cleanup' patch

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 05:17:20 -07:00
Eric Dumazet 2f8bc32b7a vlan: enable multiqueue xmits
vlan_dev_hard_start_xmit() & vlan_dev_hwaccel_hard_start_xmit()
select txqueue number 0, instead of using index provided by
skb_get_queue_mapping().

This is not correct after commit 2e59af3dcb
[vlan: multiqueue vlan device] because
txq->tx_packets  & txq->tx_bytes changes are performed on
a single location, and not the right locking.

Fix is to take the appropriate struct netdev_queue pointer

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 02:19:58 -07:00
Eric Dumazet d1b19dff91 net: net/core/dev.c cleanups
Pure style cleanup patch before surgery :)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 01:29:39 -07:00
Karl Hiramoto 137742cf97 atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() when we can send packets again.
This patch removes the call to dev_kfree_skb() when the atm device is busy.
Calling dev_kfree_skb() causes heavy packet loss then the device is under
heavy load, the more correct behavior should be to stop the upper layers,
then when the lower device can queue packets again wake the upper layers.

Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:46:10 -07:00
Wu Fengguang aa1330766c tcp: replace hard coded GFP_KERNEL with sk_allocation
This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:45:45 -07:00
Ajit Khaparde 05c6a8d7a7 net/ethtool: Add support for the ethtool feature to flash firmware image from a specified file.
This patch adds support to flash a firmware image to a device using ethtool.
The driver gets the filename of the firmware image and flashes the image
using the request firmware path.

The region "on the chip" to be flashed can be specified by an option.
It is upto the device driver to enumerate the region number passed by ethtool,
to the region to be flashed.

The default behavior is to flash all the regions on the chip.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:07:39 -07:00
Eric Dumazet 16ebb5e0b3 tc: Fix unitialized kernel memory leak
Three bytes of uninitialized kernel memory are currently leaked to user

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 22:55:17 -07:00
Eric Dumazet 6ce9e7b5fe ip: Report qdisc packet drops
Christoph Lameter pointed out that packet drops at qdisc level where not
accounted in SNMP counters. Only if application sets IP_RECVERR, drops
are reported to user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after an UDP tx flood
# netstat -s 
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048


send() syscalls, do however still return an OK status, to not
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 18:05:33 -07:00
Eric Dumazet 2e59af3dcb vlan: multiqueue vlan device
vlan devices are currently not multi-queue capable.

We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from real device.

register_vlan_device() is also handled.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 18:03:00 -07:00
Neil Horman 5848cc096a net: drop_monitor: make last_rx timestamp private
It was recently pointed out to me that the last_rx field of the
net_device structure wasn't updated regularly.  In fact only the
bonding driver really uses it currently.  Since the drop_monitor code
relies on the last_rx field to detect drops on recevie in hardware, We
need to find a more reliable way to rate limit our drop checks (so
that we don't check for drops on every frame recevied, which would be
inefficient.  This patch makes a last_rx timestamp that is private to
the drop monitor code and is updated for every device that we track.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 14:37:45 -07:00
David S. Miller 3f968de276 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-09-02 14:18:09 -07:00
Bob Copeland fcc6cb0c13 cfg80211: fix looping soft lockup in find_ie()
The find_ie() function uses a size_t for the len parameter, and
directly uses len as a loop variable.  If any received packets
are malformed, it is possible for the decrease of len to overflow,
and since the result is unsigned, the loop will not terminate.
Change it to a signed int so the loop conditional works for
negative values.

This fixes the following soft lockup:

[38573.102007] BUG: soft lockup - CPU#0 stuck for 61s! [phy0:2230]
[38573.102007] Modules linked in: aes_i586 aes_generic fuse af_packet ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter ip_tables x_tables acpi_cpufreq binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod kvm_intel kvm uinput i915 arc4 ecb drm snd_hda_codec_idt ath5k snd_hda_intel hid_apple mac80211 usbhid appletouch snd_hda_codec snd_pcm ath cfg80211 snd_timer i2c_algo_bit ohci1394 video snd processor ieee1394 rfkill ehci_hcd sg sky2 backlight snd_page_alloc uhci_hcd joydev output ac thermal button battery sr_mod applesmc cdrom input_polldev evdev unix [last unloaded: scsi_wait_scan]
[38573.102007] irq event stamp: 2547724535
[38573.102007] hardirqs last  enabled at (2547724534): [<c1002ffc>] restore_all_notrace+0x0/0x18
[38573.102007] hardirqs last disabled at (2547724535): [<c10038f4>] apic_timer_interrupt+0x28/0x34
[38573.102007] softirqs last  enabled at (92950144): [<c103ab48>] __do_softirq+0x108/0x210
[38573.102007] softirqs last disabled at (92950274): [<c1348e74>] _spin_lock_bh+0x14/0x80
[38573.102007]
[38573.102007] Pid: 2230, comm: phy0 Tainted: G        W  (2.6.31-rc7-wl #8) MacBook1,1
[38573.102007] EIP: 0060:[<f8ea2d50>] EFLAGS: 00010292 CPU: 0
[38573.102007] EIP is at cmp_ies+0x30/0x180 [cfg80211]
[38573.102007] EAX: 00000082 EBX: 00000000 ECX: ffffffc1 EDX: d8efd014
[38573.102007] ESI: ffffff7c EDI: 0000004d EBP: eee2dc50 ESP: eee2dc3c
[38573.102007]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[38573.102007] CR0: 8005003b CR2: d8efd014 CR3: 01694000 CR4: 000026d0
[38573.102007] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[38573.102007] DR6: ffff0ff0 DR7: 00000400
[38573.102007] Call Trace:
[38573.102007]  [<f8ea2f8d>] cmp_bss+0xed/0x100 [cfg80211]
[38573.102007]  [<f8ea33e4>] cfg80211_bss_update+0x84/0x410 [cfg80211]
[38573.102007]  [<f8ea3884>] cfg80211_inform_bss_frame+0x114/0x180 [cfg80211]
[38573.102007]  [<f97255ff>] ieee80211_bss_info_update+0x4f/0x180 [mac80211]
[38573.102007]  [<f972b118>] ieee80211_rx_bss_info+0x88/0xf0 [mac80211]
[38573.102007]  [<f9739297>] ? ieee802_11_parse_elems+0x27/0x30 [mac80211]
[38573.102007]  [<f972b224>] ieee80211_rx_mgmt_probe_resp+0xa4/0x1c0 [mac80211]
[38573.102007]  [<f972bc59>] ieee80211_sta_rx_queued_mgmt+0x919/0xc50 [mac80211]
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c105ffd0>] ? mark_held_locks+0x60/0x80
[38573.102007]  [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70
[38573.102007]  [<c134baa5>] ? sub_preempt_count+0x85/0xc0
[38573.102007]  [<c1348bce>] ? _spin_unlock_irqrestore+0x3e/0x70
[38573.102007]  [<c12c1c0f>] ? skb_dequeue+0x4f/0x70
[38573.102007]  [<f972c021>] ieee80211_sta_work+0x91/0xb80 [mac80211]
[38573.102007]  [<c1009707>] ? sched_clock+0x27/0xa0
[38573.102007]  [<c134baa5>] ? sub_preempt_count+0x85/0xc0
[38573.102007]  [<c10479af>] worker_thread+0x18f/0x320
[38573.102007]  [<c104794e>] ? worker_thread+0x12e/0x320
[38573.102007]  [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70
[38573.102007]  [<f972bf90>] ? ieee80211_sta_work+0x0/0xb80 [mac80211]
[38573.102007]  [<c104cbb0>] ? autoremove_wake_function+0x0/0x50
[38573.102007]  [<c1047820>] ? worker_thread+0x0/0x320
[38573.102007]  [<c104c854>] kthread+0x84/0x90
[38573.102007]  [<c104c7d0>] ? kthread+0x0/0x90
[38573.102007]  [<c1003ab7>] kernel_thread_helper+0x7/0x10

Cc: stable@kernel.org
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-02 15:29:04 -04:00
Luis R. Rodriguez abd8ea22c2 wireless: remove mac80211 rate selection extra menu
We can just display this upon enabling mac80211 with an
'if MAC80211 != n' check.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-02 15:29:03 -04:00
Luis R. Rodriguez 253850c10d wireless: update reg debug kconfig entry
Refer to the wireless wiki for more information.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-02 15:29:03 -04:00
Stephen Hemminger 5ca1b998d3 net: file_operations should be const
All instances of file_operations should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:53 -07:00
Stephen Hemminger 3b401a81c0 inet: inet_connection_sock_af_ops const
The function block inet_connect_sock_af_ops contains no data
make it constant.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:49 -07:00
Stephen Hemminger b2e4b3debc tcp: MD5 operations should be const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:43 -07:00
Stephen Hemminger 98147d527a net: seq_operations should be const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 01:03:39 -07:00
David S. Miller 6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
Eric Dumazet 0625491493 ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS
qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 18:37:16 -07:00
Xiao Guangrong f2798eb4e0 drop_monitor: fix trace_napi_poll_hit()
The net_dev of backlog napi is NULL, like below:

__get_cpu_var(softnet_data).backlog.dev == NULL

So, we should check it in napi tracepoint's probe function

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 18:18:12 -07:00
David S. Miller 2fbd3da387 pkt_sched: Revert tasklet_hrtimer changes.
These are full of unresolved problems, mainly that conversions don't
work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
tasklets can't be killed from softirq context.

And when a qdisc gets reset, that's exactly what we need to do here.

We'll work this out in the net-next-2.6 tree and if warranted we'll
backport that work to -stable.

This reverts the following 3 changesets:

a2cb6a4dd4
("pkt_sched: Fix bogon in tasklet_hrtimer changes.")

38acce2d79
("pkt_sched: Convert CBQ to tasklet_hrtimer.")

ee5f9757ea
("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:59:25 -07:00
Jarek Poplawski d66ee0587c net: sk_free() should be allowed right after sk_alloc()
After commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
sk_free() frees socks conditionally and depends
on sk_wmem_alloc being set e.g. in sock_init_data(). But in some
cases sk_free() is called earlier, usually after other alloc errors.

Fix is to move sk_wmem_alloc initialization from sock_init_data()
to sk_alloc() itself.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:49:00 -07:00
Stephen Hemminger 89d69d2b75 net: make neigh_ops constant
These tables are never modified at runtime. Move to read-only
section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:57 -07:00
Alexey Dobriyan 86393e52c3 netns: embed ip6_dst_ops directly
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
	I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:31 -07:00
Patrick McHardy 8a56df0ae1 netfilter: ebt_ulog: fix checkentry return value
Commit 19eda87 (netfilter: change return types of check functions for
Ebtables extensions) broke the ebtables ulog module by missing a return
value conversion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-09-01 14:34:01 +02:00
Damian Lukowski 6fa12c8503 Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
in number of allowed retransmissions.

For any desired threshold R2 (by means of time) one can specify tcp_retries2
(by means of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case, because the RTO schedule follows a fixed
pattern, namely exponential backoff.

However, the RTO behaviour is not predictable any more if RTO backoffs can be
reverted, as it is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).

In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.

This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.

Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used, instead.

The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:47 -07:00
Damian Lukowski f1ecd5d9e7 Revert Backoff [v3]: Revert RTO on ICMP destination unreachable
Here, an ICMP host/network unreachable message, whose payload fits to
TCP's SND.UNA, is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.

This patch reverts one RTO backoff, if an ICMP host/network unreachable
message, whose payload fits to TCP's SND.UNA, arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or - if the revert clocked out the timer - a retransmission
is sent out immediately.
Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if
there have been retransmissions and reversible backoffs, already.

Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:42 -07:00
Damian Lukowski 4d1a2d9ec1 Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()
This supplementary patch renames skb to icmp_skb in tcp_v4_err() in order to
disambiguate from another sk_buff variable, which will be introduced
in a separate patch.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:38 -07:00
Yi Zou 579496865c dcbnl: Add implementations of dcbnl setapp/getapp commands
Implements the dcbnl netlink setapp/getapp pair. When a setapp/getapp
is received, dcbnl would just pass on to dcbnl_rtnl_op.setapp/getapp
that are supposed to be implemented by the low level drivers.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:24:36 -07:00
Yi Zou 6fa382af61 dcbnl: Add netlink attributes for setapp/getapp to dcbnl
Add defines for dcbnl netlink attributes to support netlink message passing of
setapp/getapp in dcbnl.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:24:33 -07:00
Yi Zou 0af46d997f vlan: Add support for net_devices_ops.ndo_fcoe_enable/_disable to VLAN
This adds implementation of the net_devices_ops.ndo_fcoe_enable/_disable to
the VLAN driver. It checks if the real_dev has support for ndo_fcoe_enable/
ndo_fcoe_disable and if so, passes on to call the associated real_dev.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:24:23 -07:00
Stephen Hemminger d0cf9c0dad wireless: convert drivers to netdev_tx_t
Mostly just simple conversions:
  * ray_cs had bogus return of NET_TX_LOCKED but driver
    was not using NETIF_F_LLTX
  * hostap and ipw2x00 had some code that returned value
    from a called function that also had to change to return netdev_tx_t

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:14:04 -07:00
Stephen Hemminger 424efe9caf netdev: convert pseudo drivers to netdev_tx_t
These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:40 -07:00
Stephen Hemminger 6518bbb803 irda: convert to netdev_tx_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:38 -07:00
Stephen Hemminger 36e4d64a82 convert hamradio drivers to netdev_txreturnt_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:12 -07:00
Stephen Hemminger 3c805a22a3 convert ATM drivers to netdev_tx_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:10 -07:00
Stephen Hemminger 6fef4c0c8e netdev: convert pseudo-devices to netdev_tx_t
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:07 -07:00
Julius Volz 94b265514a IPVS: Add handling of incoming ICMPV6 messages
Add handling of incoming ICMPv6 messages.
This follows the handling of IPv4 ICMP messages.

Amongst ther things this problem allows IPVS to behave sensibly
when an ICMPV6_PKT_TOOBIG message is received:

This message is received when a realserver sends a packet >PMTU to the
client. The hop on this path with insufficient MTU will generate an
ICMPv6 Packet Too Big message back to the VIP. The LVS server receives
this message, but the call to the function handling this has been
missing. Thus, IPVS fails to forward the message to the real server,
which then does not adjust the path MTU. This patch adds the missing
call to ip_vs_in_icmp_v6() in ip_vs_in() to handle this situation.

Thanks to Rob Gallagher from HEAnet for reporting this issue and for
testing this patch in production (with direct routing mode).

[horms@verge.net.au: tweaked changelog]
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Tested-by: Rob Gallagher <robert.gallagher@heanet.ie>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31 16:22:23 +02:00
Patrick McHardy 4889086969 netfilter: ip6t_eui: fix read outside array bounds
Use memcmp() instead of open coded comparison that reads one byte past
the intended end.

Based on patch from Roel Kluin <roel.kluin@gmail.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31 15:30:31 +02:00
Alexey Dobriyan ee254fa44d netfilter: nf_conntrack: netns fix re reliable conntrack event delivery
Conntracks in netns other than init_net dying list were never killed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31 14:23:15 +02:00
Simon Horman 1e66dafc75 ipvs: Use atomic operations atomicly
A pointed out by Shin Hong, IPVS doesn't always use atomic operations
in an atomic manner. While this seems unlikely to be manifest in
strange behaviour, it seems appropriate to clean this up.

Cc: shin hong <hongshin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31 14:18:48 +02:00
Krishna Kumar a453e0689a pkt_sched: Fix resource limiting in pfifo_fast
pfifo_fast_enqueue has this check:
        if (skb_queue_len(list) < qdisc_dev(qdisc)->tx_queue_len) {

which allows each band to enqueue upto tx_queue_len skbs for a
total of 3*tx_queue_len skbs. I am not sure if this was the
intention of limiting in qdisc.

Patch compiled and 32 simultaneous netperf testing ran fine. Also:
# tc -s qdisc show dev eth2
qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 16835026752 bytes 373116 pkt (dropped 0, overlimits 0 requeues 25) 
 rate 0bit 0pps backlog 0b 0p requeues 25 

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-30 22:20:28 -07:00
Krishna Kumar 03a9a447d2 net: convert remaining non-symbolic return values in dev_queue_xmit
Patch compiled and 32 simultaneous netperf testing ran fine.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-30 22:16:57 -07:00
Oliver Hartkopp 6ca8b990e0 can: use correct NET_RX_ return values
Dropped skb's should be documented by an appropriate return value.
Use the correct NET_RX_DROP and NET_RX_SUCCESS values for that reason.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-30 22:13:18 -07:00
David S. Miller b9caaabb99 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6 2009-08-30 21:30:39 -07:00
roel kluin b3df9a514f tipc: fix test of bearer_priority range in tipc_register_media()
For the bearer_priority to be less than TIPC_MIN_LINK_PRI and greater than
TIPC_MAX_LINK_PRI is logically impossible.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:19:42 -07:00
Alexey Dobriyan ea00b8e222 can: switch to seq_file
create_proc_read_entry() is going to be removed soon.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:19:38 -07:00
John Dykstra 9a7030b76a tcp: Remove redundant copy of MD5 authentication key
Remove the copy of the MD5 authentication key from tcp_check_req().
This key has already been copied by tcp_v4_syn_recv_sock() or
tcp_v6_syn_recv_sock().

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:19:25 -07:00
Krishna Kumar fd3ae5e8fc Speed-up pfifo_fast lookup using a private bitmap
Maintain a per-qdisc bitmap for pfifo_fast giving  availability
of skbs for each band. This allows faster lookup for a skb when
there are no high priority skbs. Also, it helps in (rare) cases
when there are no skbs on the list, where an immediate lookup is
faster than iterating through the three bands.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:19:21 -07:00
David Ward 31ce8c71a3 ipv6: Update Neighbor Cache when IPv6 RA is received on a router
When processing a received IPv6 Router Advertisement, the kernel
creates or updates an IPv6 Neighbor Cache entry for the sender --
but presently this does not occur if IPv6 forwarding is enabled
(net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
cases processing of the Router Advertisement has already halted.

This patch allows the Neighbor Cache to be updated in these cases,
while still avoiding any modification to routes or link parameters.

This continues to satisfy RFC 4861, since any entry created in the
Neighbor Cache as the result of a received Router Advertisement is
still placed in the STALE state.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:04:09 -07:00
Octavian Purdila 80a1096bac tcp: fix premature termination of FIN_WAIT2 time-wait sockets
There is a race condition in the time-wait sockets code that can lead
to premature termination of FIN_WAIT2 and, subsequently, to RST
generation when the FIN,ACK from the peer finally arrives:

Time     TCP header
0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
0.000008 http > 30755 aSYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0

Say twdr->slot = 1 and we are running inet_twdr_hangman and in this
instance inet_twdr_do_twkill_work returns 1. At that point we will
mark slot 1 and schedule inet_twdr_twkill_work. We will also make
twdr->slot = 2.

Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
is called which will create a new FIN_WAIT2 time-wait socket and will
place it in the last to be reached slot, i.e. twdr->slot = 1.

At this point say inet_twdr_twkill_work will run which will start
destroying the time-wait sockets in slot 1, including the just added
TCP_FIN_WAIT2 one.

To avoid this issue we increment the slot only if all entries in the
slot have been purged.

This change may delay the slots cleanup by a time-wait death row
period but only if the worker thread didn't had the time to run/purge
the current slot in the next period (6 seconds with default sysctl
settings). However, on such a busy system even without this change we
would probably see delays...

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-29 00:00:35 -07:00
Jens Låås 80b71b80df fib_trie: resize rework
Here is rework and cleanup of the resize function.

Some bugs we had. We were using ->parent when we should use 
node_parent(). Also we used ->parent which is not assigned by
inflate in inflate loop.

Also a fix to set thresholds to power 2 to fit halve 
and double strategy.

max_resize is renamed to max_work which better indicates
it's function.

Reaching max_work is not an error, so warning is removed. 
max_work only limits amount of work done per resize.
(limits CPU-usage, outstanding memory etc).

The clean-up makes it relatively easy to add fixed sized 
root-nodes if we would like to decrease the memory pressure
on routers with large routing tables and dynamic routing.
If we'll need that...

Its been tested with 280k routes.

Work done together with Robert Olsson.

Signed-off-by: Jens Låås <jens.laas@its.uu.se>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:57:15 -07:00
Sascha Hlusiak 8945a808f7 sit: allow ip fragmentation when using nopmtudisc to fix package loss
if tunnel parameters have frag_off set to IP_DF, pmtudisc on the ipv4 link
will be performed by deriving the mtu from the ipv4 link and setting the
DF-Flag of the encapsulating IPv4 Header. If fragmentation is needed on the
way, the IPv4 pmtu gets adjusted, the ipv6 package will be resent eventually,
using the new and lower mtu and everyone is happy.

If the frag_off parameter is unset, the mtu for the tunnel will be derived
from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4
pmtu. In that case we must allow the fragmentation of the IPv4 packet because
the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
frequent icmp_frag_needed and package loss on the IPv6 layer.

This patch allows fragmentation when tunnel was created with parameter
nopmtudisc, like in ipip/gre tunnels.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:53:53 -07:00
Eric Dumazet 30038fc61a net: ip_rt_send_redirect() optimization
While doing some forwarding benchmarks, I noticed
ip_rt_send_redirect() is rather expensive, even if send_redirects is
false for the device.

Fix is to avoid two atomic ops, we dont really need to take a
reference on in_dev

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:52:01 -07:00
Eric Dumazet df19a62677 tcp: keepalive cleanups
Introduce keepalive_probes(tp) helper, and use it, like 
keepalive_time_when(tp) and keepalive_intvl_when(tp)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:48:54 -07:00
Eric Dumazet 3d1427f870 ipv4: af_inet.c cleanups
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:45:21 -07:00
Alexey Dobriyan 2975315b79 pktgen: use proc_create_data()
It looks like after rename device proc entry is unusable,
because of no ->read_proc or ->proc_fops.

And create_proc_entry() is deprecated.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:43 -07:00
Stephen Hemminger c3d2f52dd4 pktgen: increase version
Increase module version, and cleanup module info.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:39 -07:00
Stephen Hemminger 63adc6fb8a pktgen: cleanup checkpatch warnings
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:37 -07:00
Stephen Hemminger 64e8ff5ef2 pktgen: use common idle routine
Simpler to have one place that spins and accounts for delays,
this will also make the last packet be detected faster for more
repeatable timing.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:36 -07:00
Stephen Hemminger 2bc481cf43 pktgen: spin using hrtimer
This changes how the pktgen thread spins/waits between
packets if delay is configured. It uses a high res timer to
wait for time to arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:29 -07:00
Stephen Hemminger fd29cf7262 pktgen: convert to use ktime_t
The kernel ktime_t is a nice generic infrastructure for mananging
high resolution times, as is done in pktgen.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:32:12 -07:00
Stephen Hemminger 5c9d191c16 pktgen: avoid calling gettimeofday
If not using delay then no need to update next_tx after
each packet sent. This allows pktgen to send faster especially
on systems with slower clock sources.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:32:09 -07:00
Stephen Hemminger 5b8db2f568 pktgen: reorganize transmit loop
Handle standard (and non-standard) return values in a switch.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:32:07 -07:00
Stephen Hemminger e470757d61 pktgen: use netdev_alloc_skb
netdev_alloc_skb is NUMA node aware.
Also, don't exhaust atomic emergency pool. Don't want pktgen
to cause OOM behaviour.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:32:04 -07:00
Stephen Hemminger 7d7bb1cf0e pktgen: cleanup clone count test
The if statement to test for "should a new packet be used"
can be simplified.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:32:00 -07:00
Stephen Hemminger 3791decb5a pktgen: xmit logic reorganization
Do some reorganization of transmit logic path:
   * move transmit queue full idle to separate routine
   * add a cpu_relax()
   * eliminate some of the uneeded goto's
   * if queue is still stopped, go back to main thread loop.
   * don't give up transmitting if quantum is exhausted (be greedy)

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:31:55 -07:00
Stephen Hemminger 3bda06a3d7 pktgen: stop_device cleanup
All the callers were freeing skb after stopping device.
Remove unneeded forward decl.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:31:53 -07:00
Stephen Hemminger 65c5b786a3 pktgen: mark read-only/mostly variables
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:31:51 -07:00
Stephen Hemminger 475ac1e409 pktgen: change inlining
Don't force inlining where not needed. Gcc does better job
of deciding to inline local functions.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:31:47 -07:00
Stephen Hemminger 648fda7404 pktgen: minor cleanup
A couple of minor functions can be written more compactly.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:31:45 -07:00
Trond Myklebust 2574cc9f4f SUNRPC: Fix rpc_task_force_reencode
This patch fixes the bug that was reported in
  http://bugzilla.kernel.org/show_bug.cgi?id=14053

If we're in the case where we need to force a reencode and then resend of
the RPC request, due to xprt_transmit failing with a networking error, then
we _must_ retransmit the entire request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-28 19:35:56 -10:00
Jouni Malinen 5bf6fcc2bb mac80211: Check pending scan request after having processed mgd work
When the queued management work items are processed in
ieee80211_sta_work() an item could be removed. This could change the
anybusy from true to false, so we better check whether we can start a
new scan only after having processed the pending work first.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:46 -04:00
Johannes Berg 15db0b7fd8 mac80211: fix scan cancel on ifdown
When an interface is taken down while a scan is
pending -- i.e. a scan request was accepted but
not yet acted upon due to other work being in
progress -- we currently do not properly cancel
that scan and end up getting stuck. Fix this by
doing better checks when an interface is taken
down.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:45 -04:00
Arnd Hannemann eadac6bf95 mac80211: Fix output of minstrels rc_stats
An integer overflow in the minstrel debug code prevented the
throughput to be displayed correctly. This patch fixes that,
by permutating operations like proposed by Pavel Roskin.

Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:43 -04:00
Johannes Berg 77a980dc6c mac80211: fix RX skb leaks
In mac80211's RX path some of the warnings that
warn about drivers passing invalid status values
leak the skb, fix that by refactoring the code.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:41 -04:00
Roel Kluin 0448b5fc03 nl80211: jump to out_err upon unsupported iftype
Jump to out_err when the iftype is not supported.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:41 -04:00
Arnd Hannemann 1c4e9ab3f1 mac80211: Remove unnused throughput field from minstrel_rate.
I noticed that the throughput field of the minstrel_rate struct is never used,
so remove it.

Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:34 -04:00
Johannes Berg ea77f12f2c mac80211: remove tasklet enable/disable
Due to the way the tasklets work in mac80211 there's
no need to ever disable them.

However, we need to clear the pending packets when
taking down the last interface because otherwise
the tx_pending_tasklet might be queued if the
driver mucks with the queues (which it shouldn't).

I've had a situation occasionally with ar9170 in
which ksoftirq was using 100% CPU time because
a disabled tasklet was scheduled, and I think that
was due to ar9170 receiving a packet while the
tasklet was disabled. That's strange and it really
should not do that for other reasons, but there's
no need to waste that much CPU time over it, it
should just warn instead.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:34 -04:00
Johannes Berg 3d54d25515 cfg80211: clean up properly on interface type change
When the interface type changes while connected, and the
driver does not require the interface to be down for a
type change, it is currently possible to get very strange
results unless the driver takes special care, which it
shouldn't have to.

To fix this, take care to disconnect/leave IBSS when
changing the interface type -- even if the driver may fail
the call. Also process all events that may be pending to
avoid running into a situation where an event is reported
but only processed after the type has already changed,
which would lead to missing events and warnings.

A side effect of this is that you will have disconnected
or left the IBSS even if the mode change ultimately fails,
but since the intention was to change it and thus leave or
disconnect, this is not a problem.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:31 -04:00
Johannes Berg f7969969f4 cfg80211: make spurious warnings less likely, configurable
Bob reported that he got warnings in IBSS mode about
the ssid_len being zero on a joined event, but only
when kmemcheck was enabled. This appears to be due
to a race condition between drivers and userspace,
when the driver reports joined but the user in the
meantime decided to leave the IBSS again, the warning
would trigger. This was made more likely by kmemcheck
delaying the code that does the check and sends the
event.

So first, make the warning trigger closer to the
driver, which means it's not locked, but since only
the warning depends on it that's ok.

And secondly, users will not want to have spurious
warnings at all, so make those that are known to be
racy in such a way configurable.

Reported-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:30 -04:00
John W. Linville 103bf9f7d3 mac80211: remove ieee80211_rx namespace hack
With the libipw naming scheme change, it is no longer necessary for
mac80211 to avoid the ieee80211_rx name clash.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:29 -04:00
Johannes Berg 01a0ac417c cfg80211: check lost scans later, fix bug
When we lose a scan, cfg80211 tries to clean up after
the driver. However, it currently does this too early,
it does this in GOING_DOWN already instead of DOWN, so
it may happen with mac80211. Besides fixing this, also
make it more robust by leaking the scan request so if
the driver later actually finishes the scan, it won't
crash. Also check in ___cfg80211_scan_done whether a
scan request is still pending and exit if not.

Reported-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:25 -04:00
Johannes Berg 84f6a01ce0 mac80211: fix configure_filter invocation after stop
Since configure_filter can sleep now, any multicast
configuration needed to be postponed to a work struct.
This, however, lead to a problem that we could queue
the work, stop the device and then afterwards invoke
configure_filter which may lead to driver hangs and is
a bug. To fix this, we can just cancel the filter work
since it's unnecessary to do after stopping the hw.

Since there are various places that call drv_stop, and
two of them do very similar things, the code for them
can be put into a shared function at the same time.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Tested-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:25 -04:00
Javier Cardona 9e03fdfd05 mac80211: Update mesh config IE to 11s draft 3.02
The mesh config information element has changed significantly since draft 1.08
This patch brings it up to date.

Thanks to Sam Leffler and Rui Paulo for identifying this.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-28 14:40:24 -04:00
Wei Yongjun b0401d7253 sunrpc: move the close processing after do recvfrom method
sunrpc: "Move close processing to a single place"
(d7979ae4a0) moved the
close processing before the recvfrom method. This may
cause the close processing never to execute. So this
patch moves it to the right place.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-27 17:18:38 -04:00
Linus Torvalds cf481442f2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation pointers
  9p: remove unnecessary v9fses->options which duplicates the mount string
  net/9p: insulate the client against an invalid error code sent by a 9p server
  9p: Add missing cast for the error return value in v9fs_get_inode
  9p: Remove redundant inode uid/gid assignment
  9p: Fix possible regressions when ->get_sb fails.
  9p: Fix v9fs show_options
  9p: Fix possible memleak in v9fs_inode_from fid.
  9p: minor comment fixes
  9p: Fix possible inode leak in v9fs_get_inode.
  9p: Check for error in return value of v9fs_fid_add
2009-08-27 12:24:08 -07:00
Julien TINNES 788d908f28 ipv4: make ip_append_data() handle NULL routing table
Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.

Signed-off-by: Julien Tinnes <julien@cr0.org>
Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-27 12:23:43 -07:00
Gustavo F. Padovan 7e7430908c Bluetooth: Add support for L2CAP 'Send RRorRNR' action
When called, 'Send RRorRNR' should send a RNR frame if local device is
busy or a RR frame otherwise.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-26 00:12:20 -07:00
Gustavo F. Padovan 2246b2f1b4 Bluetooth: Handle L2CAP case when the remote receiver is busy
Implement all issues related to RemoteBusy in the RECV state table.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-26 00:12:20 -07:00
Gustavo F. Padovan ca42a613c9 Bluetooth: Acknowledge L2CAP packets when receiving RR-frames (F-bit=1)
Implement the Recv ReqSeqAndFBit event when a RR frame with F bit set is
received.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-26 00:12:20 -07:00
Linus Torvalds f415c413f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  irda/sa1100_ir: fix broken netdev_ops conversion
  irda/au1k_ir: fix broken netdev_ops conversion
  pkt_sched: Fix bogon in tasklet_hrtimer changes.
2009-08-25 21:24:49 -07:00
Wei Yongjun eac81736e6 sunrpc: reply AUTH_BADCRED to RPCSEC_GSS with unknown service
When an RPC message is received with RPCSEC_GSS with an unknown service
(not RPC_GSS_SVC_NONE, RPC_GSS_SVC_INTEGRITY, or RPC_GSS_SVC_PRIVACY),
svcauth_gss_accept() returns AUTH_BADCRED, but svcauth_gss_release()
subsequently drops the response entirely, discarding the error.

Fix that so the AUTH_BADCRED error is returned to the client.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-25 17:39:43 -04:00
Ryusei Yamaguchi ed2d8aed52 knfsd: Replace lock_kernel with a mutex in nfsd pool stats.
lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.

Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).

Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Greg Banks <gnb@fmeh.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-25 12:39:37 -04:00
Patrick McHardy 3993832464 netfilter: nfnetlink: constify message attributes and headers
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 16:07:58 +02:00
Patrick McHardy 3a6c2b419b netlink: constify nlmsghdr arguments
Consitfy nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 16:07:40 +02:00
Patrick McHardy 74f7a6552c netfilter: nf_conntrack: log packets dropped by helpers
Log packets dropped by helpers using the netfilter logging API. This
is useful in combination with nfnetlink_log to analyze those packets
in userspace for debugging.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25 15:33:08 +02:00
David S. Miller a2cb6a4dd4 pkt_sched: Fix bogon in tasklet_hrtimer changes.
Reported by Stephen Rothwell, luckily it's harmless:

net/sched/sch_api.c: In function 'qdisc_watchdog':
net/sched/sch_api.c:460: warning: initialization from incompatible pointer type
net/sched/sch_cbq.c: In function 'cbq_undelay':
net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-24 19:37:05 -07:00
Randy Dunlap cbe86b98a6 Bluetooth: Add missing selection of CONFIG_CRC16 for L2CAP layer
Fix net/bluetooth/l2cap.c build errors:

l2cap.c:(.text+0x126035): undefined reference to `crc16'
l2cap.c:(.text+0x126323): undefined reference to `crc16'
l2cap.c:(.text+0x12668e): undefined reference to `crc16'
l2cap.c:(.text+0x12683b): undefined reference to `crc16'
l2cap.c:(.text+0x126956): undefined reference to `crc16'
net/built-in.o:l2cap.c:(.text+0x129041): more undefined references to `crc16' follow

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-24 16:34:35 -07:00
Alexandros Batsakis 8f55f3c0a0 nfsd41: sunrpc: svc_tcp_recv_record()
Factor functionality out of svc_tcp_recvfrom() to simplify routine

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-08-24 18:13:10 -04:00
Linus Torvalds 1cac6ec9b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  smc91x: let smc91x work well under netpoll
  pxaficp-ir: remove incorrect net_device_ops
  NET: llc, zero sockaddr_llc struct
  drivers/net: fixed drivers that support netpoll use ndo_start_xmit()
  netpoll: warning for ndo_start_xmit returns with interrupts enabled
  net: Fix Micrel KSZ8842 Kconfig description
  netfilter: xt_quota: fix wrong return value (error case)
  ipv6: Fix commit 63d9950b08 (ipv6: Make v4-mapped bindings consistent with IPv4)
  E100: fix interaction with swiotlb on X86.
  pkt_sched: Convert CBQ to tasklet_hrtimer.
  pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer
  rtl8187: always set MSR_LINK_ENEDCA flag with RTL8187B
  ibm_newemac: emac_close() needs to call netif_carrier_off()
  net: fix ks8851 build errors
  net: Rename MAC platform driver for w90p910 platform
  yellowfin: Fix buffer underrun after dev_alloc_skb() failure
  orinoco: correct key bounds check in orinoco_hw_get_tkip_iv
  mac80211: fix todo lock
2009-08-24 12:25:03 -07:00
Eric Dumazet f3abc9b963 netfilter: bridge: refcount fix
commit f216f082b2
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.

Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed
by Patrick.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24 19:35:38 +02:00
Maximilian Engelhardt cce5a5c302 netfilter: nf_nat: fix inverted logic for persistent NAT mappings
Kernel 2.6.30 introduced a patch [1] for the persistent option in the
netfilter SNAT target. This is exactly what we need here so I had a quick look
at the code and noticed that the patch is wrong. The logic is simply inverted.
The patch below fixes this.

Also note that because of this the default behavior of the SNAT target has
changed since kernel 2.6.30 as it now ignores the destination IP in choosing
the source IP for nating (which should only be the case if the persistent
option is set).

[1] http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=98d500d66cb7940747b424b245fc6a51ecfbf005

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24 19:24:54 +02:00
Jan Engelhardt 35aad0ffdf netfilter: xtables: mark initial tables constant
The inputted table is never modified, so should be considered const.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24 14:56:30 +02:00
Gustavo F. Padovan 1b7bf4edca Bluetooth: Use proper *_unaligned_le{16,32} helpers for L2CAP
Simplify more conversions to the right endian with the proper helpers.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-24 01:05:05 -07:00
Gustavo F. Padovan e686219a64 Bluetooth: Add locking scheme to L2CAP timeout callbacks
Avoid race conditions when accessing the L2CAP socket from within the
timeout handlers.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-24 01:05:05 -07:00
Jiri Slaby 28e9fc592c NET: llc, zero sockaddr_llc struct
sllc_arphrd member of sockaddr_llc might not be changed. Zero sllc
before copying to the above layer's structure.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 22:55:51 -07:00
Dongdong Deng 79b1bee888 netpoll: warning for ndo_start_xmit returns with interrupts enabled
WARN_ONCE for ndo_start_xmit() enable interrupts in netpoll_send_skb(),
because the NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb().

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:50:59 -07:00
David S. Miller 9409172262 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-08-23 19:19:30 -07:00
Andy Grover f2c449320d RDS: Add a debug message suggesting to load transport modules
Now that RDS transports are no longer compiled-in to RDS core,
there is now the possibility that they will not be loaded. This
adds a helpful suggestion when rds_bind() fails to find a transport.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:13:14 -07:00
Andy Grover 335776bd69 RDS: Track transports via an array, not a list
Now that transports can be loaded in arbitrary order,
it is important for rds_trans_get_preferred() to look
for them in a particular order, instead of walking the list
until it finds a transport that works for a given address.
Now, each transport registers for a specific transport slot,
and these are ordered so that preferred transports come first,
and then if they are not loaded, other transports are queried.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:13:12 -07:00
Andy Grover 40d866095d RDS: Modularize RDMA and TCP transports
Enable the building of transports as modules.

Also, improve consistency of Kconfig messages in relation to other
protocols, and move build dependency on IB from the RDS core code
to the rds_rdma module.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:13:09 -07:00
Andy Grover 616b757ae1 RDS: Export symbols from core RDS
Now that rdma and tcp transports will be modularized,
we need to export a number of functions so they can call them.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:13:07 -07:00
Andy Grover 70041088e3 RDS: Add TCP transport to RDS
This code allows RDS to be tunneled over a TCP connection.

RDMA operations are disabled when using TCP transport,
but this frees RDS from the IB/RDMA stack dependency, and allows
it to be used with standard Ethernet adapters, or in a VM.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:13:02 -07:00
Patrick McHardy 2149f66f49 netfilter: xt_quota: fix wrong return value (error case)
Success was indicated on a memory allocation failure, thereby causing
a crash due to a later NULL deref.
(Affects v2.6.30-rc1 up to here.)

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:09:23 -07:00
Bruno Prémont ca6982b858 ipv6: Fix commit 63d9950b08 (ipv6: Make v4-mapped bindings consistent with IPv4)
Commit 63d9950b08
  (ipv6: Make v4-mapped bindings consistent with IPv4)
changes behavior of inet6_bind() for v4-mapped addresses so it should
behave the same way as inet_bind().

During this change setting of err to -EADDRNOTAVAIL got lost:

af_inet.c:469 inet_bind()
	err = -EADDRNOTAVAIL;
	if (!sysctl_ip_nonlocal_bind &&
	    !(inet->freebind || inet->transparent) &&
	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
	    chk_addr_ret != RTN_LOCAL &&
	    chk_addr_ret != RTN_MULTICAST &&
	    chk_addr_ret != RTN_BROADCAST)
		goto out;


af_inet6.c:463 inet6_bind()
	if (addr_type == IPV6_ADDR_MAPPED) {
		int chk_addr_ret;

		/* Binding to v4-mapped address on a v6-only socket                         
		 * makes no sense                                                           
		 */
		if (np->ipv6only) {
			err = -EINVAL;
			goto out; 
		}

		/* Reproduce AF_INET checks to make the bindings consitant */               
		v4addr = addr->sin6_addr.s6_addr32[3];                                      
		chk_addr_ret = inet_addr_type(net, v4addr);                                 
		if (!sysctl_ip_nonlocal_bind &&                                             
		    !(inet->freebind || inet->transparent) &&                               
		    v4addr != htonl(INADDR_ANY) &&
		    chk_addr_ret != RTN_LOCAL &&                                            
		    chk_addr_ret != RTN_MULTICAST &&                                        
		    chk_addr_ret != RTN_BROADCAST)
			goto out;
	} else {


Signed-off-by Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23 19:06:28 -07:00
David S. Miller 38acce2d79 pkt_sched: Convert CBQ to tasklet_hrtimer.
This code expects to run in softirq context, and bare hrtimers
run in hw IRQ context.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-23 18:52:38 -07:00
David S. Miller ee5f9757ea pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer
None of this stuff should execute in hw IRQ context, therefore
use a tasklet_hrtimer so that it runs in softirq context.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-22 18:09:17 -07:00
Luiz Augusto von Dentz 9e726b1742 Bluetooth: Fix rejected connection not disconnecting ACL link
When using DEFER_SETUP on a RFCOMM socket, a SABM frame triggers
authorization which when rejected send a DM response. This is fine
according to the RFCOMM spec:

    the responding implementation may replace the "proper" response
    on the Multiplexer Control channel with a DM frame, sent on the
    referenced DLCI to indicate that the DLCI is not open, and that
    the responder would not grant a request to open it later either.

But some stacks doesn't seems to cope with this leaving DLCI 0 open after
receiving DM frame.

To fix it properly a timer was introduced to rfcomm_session which is used
to set a timeout when the last active DLC of a session is unlinked, this
will give the remote stack some time to reply with a proper DISC frame on
DLCI 0 avoiding both sides sending DISC to each other on stacks that
follow the specification and taking care of those who don't by taking
down DLCI 0.

Signed-off-by: Luiz Augusto von Dentz <luiz.dentz@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:05:58 -07:00
Gustavo F. Padovan ef54fd937f Bluetooth: Full support for receiving L2CAP SREJ frames
Support for receiving of SREJ frames as specified by the state table.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:03:43 -07:00
Gustavo F. Padovan 8f17154f1f Bluetooth: Add support for L2CAP SREJ exception
When L2CAP loses an I-frame we send a SREJ frame to the transmitter side
requesting the lost packet. This patch implement all Recv I-frame events
on SREJ_SENT state table except the ones that deal with SendRej (the REJ
exception at receiver side is yet not implemented).

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 15:01:25 -07:00
Gustavo F. Padovan fcc203c30d Bluetooth: Add support for FCS option to L2CAP
Implement CRC16 check for L2CAP packets. FCS is used by Streaming Mode and
Enhanced Retransmission Mode and is a extra check for the packet content.

Using CRC16 is the default, L2CAP won't use FCS only when both side send
a "No FCS" request.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:59:49 -07:00
Gustavo F. Padovan 6840ed0770 Bluetooth: Enable Streaming Mode for L2CAP
Streaming Mode is helpful for the Bluetooth streaming based profiles, such
as A2DP. It doesn't have any error control or flow control.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:57:58 -07:00
Gustavo F. Padovan e90bac061b Bluetooth: Add support for Retransmission and Monitor Timers
L2CAP uses retransmission and monitor timers to inquiry the other side
about unacked I-frames. After sending each I-frame we (re)start the
retransmission timer. If it expires, we start a monitor timer that send a
S-frame with P bit set and wait for S-frame with F bit set. If monitor
timer expires, try again, at a maximum of L2CAP_DEFAULT_MAX_TX.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:56:15 -07:00
Gustavo F. Padovan 30afb5b2aa Bluetooth: Initial support for retransmission of packets with REJ frames
When receiving an I-frame with unexpected txSeq, receiver side start the
recovery procedure by sending a REJ S-frame to the transmitter side. So
the transmitter can re-send the lost I-frame.

This patch just adds a basic support for retransmission, it doesn't
mean that ERTM now has full support for packet retransmission.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:55:20 -07:00
Gustavo F. Padovan c74e560cd0 Bluetooth: Add support for Segmentation and Reassembly of SDUs
ERTM should use Segmentation and Reassembly to break down a SDU in many
PDUs on sending data to the other side.

On sending packets we queue all 'segments' until end of segmentation and
just the add them to the queue for sending. On receiving we create a new
SKB with the SDU reassembled.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:53:58 -07:00
Gustavo F. Padovan 1c2acffb76 Bluetooth: Add initial support for ERTM packets transfers
This patch adds support for ERTM transfers, without retransmission, with
txWindow up to 63 and with acknowledgement of packets received. Now the
packets are queued before call l2cap_do_send(), so packets couldn't be
sent at the time we call l2cap_sock_sendmsg(). They will be sent in
an asynchronous way on later calls of l2cap_ertm_send(). Besides if an
error occurs on calling l2cap_do_send() we disconnect the channel.

Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-08-22 14:53:01 -07:00