Commit Graph

22952 Commits

Author SHA1 Message Date
Szymon Janc 16cde9931b Bluetooth: Fix missing break in hci_cmd_complete_evt
The command complete event for HCI_OP_USER_PASSKEY_NEG_REPLY would also
result in calling the handler function for HCI_OP_LE_SET_SCAN_PARAM. This
could result in undefined behaviour.
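The sketch below shows how a missing break in a switch makes one command-complete handler fall through into the next. The opcodes, handler names and call counters are hypothetical stand-ins for the Bluetooth ones, not the actual hci_event.c code.

```c
#include <assert.h>

/* Hypothetical opcodes standing in for HCI_OP_USER_PASSKEY_NEG_REPLY
 * and HCI_OP_LE_SET_SCAN_PARAM. */
enum { OP_PASSKEY_NEG_REPLY = 1, OP_LE_SET_SCAN_PARAM = 2 };

static int passkey_calls, scan_param_calls;

static void cc_passkey_neg_reply(void) { passkey_calls++; }
static void cc_le_set_scan_param(void) { scan_param_calls++; }

/* Buggy dispatcher: without a break, the first case falls through. */
static void cmd_complete_buggy(int opcode)
{
	switch (opcode) {
	case OP_PASSKEY_NEG_REPLY:
		cc_passkey_neg_reply();
		/* missing break: execution continues into the next case */
	case OP_LE_SET_SCAN_PARAM:
		cc_le_set_scan_param();
		break;
	}
}

/* Fixed dispatcher: every case ends with a break. */
static void cmd_complete_fixed(int opcode)
{
	switch (opcode) {
	case OP_PASSKEY_NEG_REPLY:
		cc_passkey_neg_reply();
		break;
	case OP_LE_SET_SCAN_PARAM:
		cc_le_set_scan_param();
		break;
	}
}
```

With the buggy version, a single passkey negative-reply event also runs the scan-parameter handler, which then interprets a reply it was never meant to see.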

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-04-24 11:38:41 -03:00
Paul Gortmaker 872f24dbc6 tipc: remove inline instances from C source files.
Untie gcc's hands and let it do what it wants within the
individual source files.  There are two files, node.c and
port.c -- only the latter effectively changes (gcc-4.5.2).
Objdump shows gcc deciding to not inline port_peernode().

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:41:03 -04:00
Eric Dumazet bfb253c9b2 af_netlink: drop_monitor/dropwatch friendly
Need to consume_skb() instead of kfree_skb() in netlink_dump() and
netlink_unicast_kernel() to avoid false dropwatch positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 658cb354ed af_netlink: cleanups
Move netlink_destroy_callback() to avoid a forward reference.

CodingStyle cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 38ba0a65fa net: skb_can_coalesce returns a boolean
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:18:02 -04:00
Peter Huang (Peng) a881e963c7 set fake_rtable's dst to NULL to avoid kernel Oops
bridge: set fake_rtable's dst to NULL to avoid kernel Oops

When the bridge is deleted before the tap/vif device is deleted, the kernel
may encounter an oops because of a NULL reference to fake_rtable's dst.
Setting fake_rtable's dst to NULL before sending packets out solves
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
  implementation -DaveM ]
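DaveM's note refers to a common C idiom for no-op macros. A minimal sketch, assuming a hypothetical config switch (HAVE_FAKE_RTABLE), packet type (struct pkt) and helper (drop_fake_rtable) in place of the real bridge code, shows why "do { } while (0)" is safer than a bare "{}".

```c
#include <assert.h>

/* Hypothetical config switch; 0 models the option being disabled. */
#define HAVE_FAKE_RTABLE 0

struct pkt { int dst; };

#if HAVE_FAKE_RTABLE
#define drop_fake_rtable(p) do { (p)->dst = 0; } while (0)
#else
/* A bare "{}" would leave "{};" before the else below, which is a
 * syntax error; "do { } while (0)" absorbs the trailing semicolon. */
#define drop_fake_rtable(p) do { } while (0)
#endif

static int forward(struct pkt *p, int strip)
{
	if (strip)
		drop_fake_rtable(p);
	else
		p->dst = -1;
	return p->dst;
}
```

Both variants behave like a single statement, so callers can use the helper in an unbraced if/else regardless of configuration.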

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:16:24 -04:00
Eric Dumazet 783c175f90 tcp: tcp_try_coalesce returns a boolean
This clarifies code intention, as suggested by David.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:36:58 -04:00
Eric Dumazet d7ccf7c0a0 net: make spd_fill_page() linear argument a bool
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:35:04 -04:00
David S. Miller f24001941c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")

The former moved around the sysctl register/unregister calls, the
latter simply removed them.

With help from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:15:17 -04:00
David S. Miller a108d5f35a net: Use bool and remove inline in skb_splice_bits() code.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:06:11 -04:00
Eric Dumazet 41c73a0d44 net: speedup skb_splice_bits()
Commit 35f3d14db (pipe: add support for shrinking and growing pipes)
added a slowdown for splice(socket -> pipe), as we might grow the spd
used in skb_splice_bits() for each skb we process in splice() syscall.

It's not needed since skb lengths are capped. The default on-stack arrays
are more than enough.

Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable
limit per skb.

Add coalescing support to help splice GRO skbs built from linear
skbs (linked into frag_list).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:01:35 -04:00
Eric Dumazet 1402d36601 tcp: introduce tcp_try_coalesce
Commit c8628155ec (tcp: reduce out_of_order memory use) took care of
coalescing tcp segments provided by legacy devices (linear skbs).

We extend this idea to fragged skbs, as their truesize can be heavy.

ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.

Use this coalescing strategy for receive queue too.

This helps reduce the number of tcp collapses at minimal cost, and
reduces memory overhead and packet drops.
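A minimal sketch of the coalescing idea, assuming a hypothetical, heavily simplified segment type with a linear buffer. It models only the "copy into tail room" case; the real tcp_try_coalesce() also merges page fragments and adjusts truesize accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical simplified segment: a linear buffer plus its sequence range. */
struct seg {
	unsigned int seq, end_seq;   /* payload covers [seq, end_seq) */
	unsigned int len, cap;       /* bytes used / buffer capacity */
	unsigned char data[64];
};

/* Try to append 'from' to the tail of 'to'. This only works when the
 * segments are contiguous in sequence space and 'to' has enough tail
 * room. On success the caller can free 'from', so its memory no longer
 * counts against the socket. Returns true if 'from' was merged. */
static bool try_coalesce(struct seg *to, const struct seg *from)
{
	if (from->seq != to->end_seq)
		return false;
	if (to->cap - to->len < from->len)
		return false;
	memcpy(to->data + to->len, from->data, from->len);
	to->len += from->len;
	to->end_seq = from->end_seq;
	return true;
}
```

Returning a bool lets the receive path choose between freeing the merged segment and queueing it as usual.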

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:42:49 -04:00
Eric Dumazet da882c1f2e tcp: sk_add_backlog() is too agressive for TCP
While investigating TCP performance problems on 10Gb+ links, we found a
TCP sender was dropping a lot of incoming ACKs because of the sk_rcvbuf
limit in sk_add_backlog(), especially if the receiver doesn't use GRO/LRO
and sends one ACK every two MSS segments.

A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default
value (87380), allowing too small a backlog.

A TCP ACK, even though small, can consume nearly the same truesize as
outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and
is fast to compute.
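A sketch of the backlog check with the limit passed in as a parameter, as the companion patch f545a38f74 does. The socket model and function names are hypothetical simplifications; the sndbuf value in the usage is an arbitrary example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical socket model holding only the fields the check needs. */
struct sock_model {
	unsigned int backlog_len;  /* truesize already queued in the backlog */
	unsigned int rmem_alloc;   /* truesize charged to the receive queue */
	unsigned int rcvbuf, sndbuf;
};

/* The queues are full once backlog + receive memory exceed the limit. */
static bool rcvqueues_full(const struct sock_model *sk, unsigned int limit)
{
	unsigned int qsize = sk->backlog_len + sk->rmem_alloc;
	return qsize > limit;
}

/* Charge 'truesize' to the backlog unless the caller's limit is exceeded. */
static int add_backlog(struct sock_model *sk, unsigned int truesize,
		       unsigned int limit)
{
	if (rcvqueues_full(sk, limit))
		return -ENOBUFS;
	sk->backlog_len += truesize;
	return 0;
}
```

Generic callers would keep passing rcvbuf, while TCP would pass rcvbuf + sndbuf, so a sender with a large send buffer no longer drops incoming ACKs at the default receive limit.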

Performance results on netperf, single flow, receiver with GRO/LRO
disabled: 7500 Mbit/s instead of 6050 Mbit/s, no more TCPBacklogDrop
increments at the sender.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:28:28 -04:00
Eric Dumazet f545a38f74 net: add a limit parameter to sk_add_backlog()
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.

No functional change is expected in this patch; all callers still use the
old sk_rcvbuf limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:28:28 -04:00
Bala Shanmugam 218d2e26dc cfg80211: Validate legacy rateset.
Legacy rates are not validated while configuring
the tx rateset using iw, so the command below is accepted by nl80211:
sudo iw wlan2 set bitrates legacy-2.4 1 2 3

Validate legacy rates and return an
error if any rate in the rateset is not valid.
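A sketch of such validation: each requested rate is mapped to an index in the band's supported-rate table, and the whole set is rejected (mask 0) if any rate is unknown. The unit conventions are assumptions: requested rates in 500 kbps units, as in the 802.11 Supported Rates element, and table entries in 100 kbps units. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2.4 GHz legacy rates in units of 100 kbps (55 = 5.5 Mbps). */
static const int band_rates[] = { 10, 20, 55, 110, 60, 90, 120, 180, 240, 360, 480, 540 };
#define N_RATES (sizeof(band_rates) / sizeof(band_rates[0]))

/* Map requested rates (500 kbps units) to a bitmask of table indices.
 * Returns 0 if any requested rate is not supported. */
static uint32_t rateset_to_mask_sketch(const uint8_t *rates, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		int rate = (rates[i] & 0x7f) * 5;   /* 500 kbps -> 100 kbps */
		size_t ridx;

		for (ridx = 0; ridx < N_RATES; ridx++) {
			if (band_rates[ridx] == rate) {
				mask |= UINT32_C(1) << ridx;
				break;
			}
		}
		if (ridx == N_RATES)
			return 0;   /* unknown rate: reject the whole set */
	}
	return mask;
}
```

With this check, "1 2 3" fails because 3 Mbps is not a legacy 802.11b/g rate, while "1 2" maps to the first two table entries.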

Signed-off-by: Bala Shanmugam <bkamatch@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Wey-Yi Guy 0d8a0a1728 mac80211: declare ieee80211_ave_rssi as EXPORT
ieee80211_ave_rssi needs to be exported so that drivers can use it.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Javier Cardona 6ac95b5765 mac80211: fixup for mesh TSF adjustment latency in Toffset setpoint
The original patch defined the correction margin but did not apply it.

Signed-off-by: Shinichi Hotori <hotorinn@gmail.com>
Signed-off-by: Yu Niiro <yu.niiro@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Thomas Pedersen aee286c2cf mac80211: fix STA channel width field
According to IEEE 802.11 8.4.2.59, set the "STA channel width" bit to 0
if the transmitting STA is using a 20 MHz channel.
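A sketch of how that bit could be derived from the channel type. The layout is assumed to follow the first HT Operation Information byte: bits 0-1 hold the secondary channel offset and bit 2 is the STA channel width. The constant and enum names are hypothetical stand-ins for the mac80211 ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the first HT Operation Information byte. */
#define HT_PARAM_SEC_NONE        0x00  /* no secondary channel */
#define HT_PARAM_SEC_ABOVE       0x01  /* secondary channel above primary */
#define HT_PARAM_SEC_BELOW       0x03  /* secondary channel below primary */
#define HT_PARAM_CHAN_WIDTH_ANY  0x04  /* STA channel width: 40 MHz allowed */

enum chan_type { CHAN_HT20, CHAN_HT40PLUS, CHAN_HT40MINUS };

static uint8_t ht_param_byte(enum chan_type t)
{
	switch (t) {
	case CHAN_HT40PLUS:
		return HT_PARAM_SEC_ABOVE | HT_PARAM_CHAN_WIDTH_ANY;
	case CHAN_HT40MINUS:
		return HT_PARAM_SEC_BELOW | HT_PARAM_CHAN_WIDTH_ANY;
	case CHAN_HT20:
	default:
		/* 20 MHz: no secondary channel, and the width bit stays 0. */
		return HT_PARAM_SEC_NONE;
	}
}
```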

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen e76781e48f mac80211: don't set mesh peer ht caps if ht disabled
Blindly setting ht caps on a mesh peer's station entry would result in
MCS rates being used by the rate control algorithm even if no ht had
been configured. Fix this by checking the channel type before assigning
ht capabilities.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen f743ff4907 mac80211: refactor mesh peer rate handling
To avoid passing supp_rates and basic_rates around all the time, just
derive these when needed in mesh_matches_local() and mesh_peer_init().

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen 54ab1ffb6c mac80211: refactor mesh peer initialization
This patch unifies the previous two paths toward mesh peer creation a
bit. It also fixes a bug where a peer's changing rates or HT mode
wouldn't register on leaving and then returning to the mesh with a sta
entry still present.

Also clean up locking and clear possibly stale ht cap.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Ben Greear 8a690674e0 mac80211: Support on-channel scan option.
This is based on an idea posted by Stanislaw Gruszka,
though I accept full blame for the implementation!

This has been tested with ath9k.

The idea is to let users scan on the current operating
channel without interrupting normal traffic more than
absolutely necessary (changing power level might reset
some hardware, for instance).

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:28:33 -04:00
David S. Miller ac807fa8e6 tcp: Fix build warning after tcp_{v4,v6}_init_sock consolidation.
net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock':
net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable]
net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock':
net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 03:21:58 -04:00
Neal Cardwell d135c522f1 tcp: fix TCP_MAXSEG for established IPv6 passive sockets
Commit f5fff5d forgot to fix TCP_MAXSEG behavior for IPv6 sockets, so IPv6
TCP server sockets that used TCP_MAXSEG would find that the advmss of
child sockets would be incorrect. This commit mirrors the advmss logic
from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this
logic should probably be shared between IPv4 and IPv6, but this at
least fixes this issue.
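A sketch of the advmss rule being mirrored: the child starts from the route's advertised MSS and is clamped to the listener's TCP_MAXSEG value, if one was set. The function and parameter names are hypothetical; treating user_mss == 0 as "not set" is an assumption about the kernel convention.

```c
#include <assert.h>

/* MSS a child socket should advertise, given the route's advmss and the
 * listener's TCP_MAXSEG setting (0 meaning "never set"). */
static unsigned int child_advmss(unsigned int route_advmss,
				 unsigned int listener_user_mss)
{
	unsigned int advmss = route_advmss;

	if (listener_user_mss && listener_user_mss < advmss)
		advmss = listener_user_mss;
	return advmss;
}
```

Before the fix, the IPv6 path effectively skipped the clamp, so children of a listener with TCP_MAXSEG set kept the route value.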

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-22 17:09:35 -04:00
Eric Dumazet c06fff6e17 af_packet: packet_getsockopt() cleanup
Factorize code, since most fetched values are int type.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Neal Cardwell 900f65d361 tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()
This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().

Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).

When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).
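The structure of the refactoring can be sketched as one shared initializer that both per-family functions call, so a setting such as the initial cwnd cannot drift between them. The socket model and function names are hypothetical, and INIT_CWND = 10 is an assumed value for TCP_INIT_CWND at the time.

```c
#include <assert.h>

#define INIT_CWND 10  /* assumed value of TCP_INIT_CWND */

/* Hypothetical socket model with one shared and one per-family field. */
struct tcp_model {
	unsigned int snd_cwnd;
	int af;               /* which family-specific setup ran */
};

/* Address-family independent part, shared by IPv4 and IPv6. */
static void tcp_init_sock_sketch(struct tcp_model *tp)
{
	tp->snd_cwnd = INIT_CWND;
}

static void tcp_v4_init_sock_sketch(struct tcp_model *tp)
{
	tcp_init_sock_sketch(tp);
	tp->af = 4;           /* only IPv4-specific setup stays here */
}

static void tcp_v6_init_sock_sketch(struct tcp_model *tp)
{
	tcp_init_sock_sketch(tp);
	tp->af = 6;           /* only IPv6-specific setup stays here */
}
```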

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Eric Dumazet e66e9a3147 net: allow better page reuse in splice(sock -> pipe)
splice() from socket to pipe needs the linear_to_page() helper to transfer
the skb header to part of a page.

We can reset the offset in the current sk->sk_sndmsg_page if we are the
last user of the page.
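A sketch of the reuse rule, assuming a hypothetical per-socket cached page with an explicit reference count in place of sk->sk_sndmsg_page and page_count(). If the socket holds the only reference, earlier pipe buffers have been released, so copying can restart at offset 0.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical per-socket page used to stash linear skb data. */
struct frag_page {
	unsigned char buf[PAGE_SZ];
	int refcount;        /* 1: only the socket still holds the page */
	unsigned int off;    /* next free byte in buf */
};

/* Copy 'len' bytes into the cached page and return the offset used, or
 * -1 if they do not fit (the caller would then allocate a new page). */
static int stash_linear(struct frag_page *pg, const void *data, unsigned int len)
{
	if (pg->refcount == 1)
		pg->off = 0;             /* last user: recycle the whole page */
	if (len > PAGE_SZ - pg->off)
		return -1;

	unsigned int start = pg->off;
	memcpy(pg->buf + start, data, len);
	pg->off += len;
	pg->refcount++;              /* the pipe buffer now holds a reference */
	return (int)start;
}
```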

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Eric Dumazet bbe362be53 drop_monitor: allow more events per second
It seems there is a logic error in trace_drop_common(), since we store
only 64 drops, even if they are from the same location.

This fix is a one-liner, but we probably need more work to avoid useless
atomic dec/inc.

Now I can watch 1 Mpps drops through dropwatch...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:28:38 -04:00
Eric Dumazet a74e910618 net: change big iov allocations
iovs with more than 8 entries are allocated in sendmsg()/recvmsg() through
sock_kmalloc().

As these allocations are temporary only and small enough, it makes sense
to use plain kmalloc() and avoid sk_omem_alloc atomic overhead.

Slightly changed fast path to be even faster.
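The allocation choice can be sketched in user space: arrays of up to 8 iovecs use caller-provided stack storage, and larger ones use a plain heap allocation rather than one charged to the socket. FAST_IOV mirrors UIO_FASTIOV, and iov_storage() is a hypothetical helper, not the kernel's verify_iovec() path.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

#define FAST_IOV 8   /* mirrors UIO_FASTIOV */

/* Choose storage for a copy of the caller's iovec array. Small arrays
 * stay on the stack; larger ones get a plain heap allocation. */
static struct iovec *iov_storage(struct iovec *stack, size_t n, int *on_heap)
{
	*on_heap = 0;
	if (n <= FAST_IOV)
		return stack;
	*on_heap = 1;
	return malloc(n * sizeof(struct iovec));
}
```

Because the array is freed before the syscall returns, skipping per-socket memory accounting avoids atomic updates without risking unbounded memory use.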

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:24:20 -04:00
Pavel Emelyanov b139ba4e90 tcp: Repair connection-time negotiated parameters
There are options that are set up on a socket while performing the
TCP handshake. We need to resurrect them on a socket while repairing.
A new sockoption accepts a buffer and parses it. The buffer should
be a CODE:VALUE sequence of bytes, where CODE is a standard option
code and VALUE is the respective value.

Only 4 options need to be handled on a repaired socket.

To read 3 out of 4 of these options the TCP_INFO sockoption can be
used. An ability to get the last one (the mss_clamp) was added by
the previous patch.

Now the restore. Three of these options -- timestamp_ok, mss_clamp
and snd_wscale -- are just restored on a socket.

The sack_ok flag raises two issues. First, whether or not to do SACKs
at all. This flag is just read and set back. No other SACK info is
saved or restored, since according to the standard and the code,
dropping all SACKed segments is OK; the sender will resubmit them,
so after the repair we will probably experience a pause in the
connection. Second, the fack bit. It's just set back on a socket if
the respective sysctl is set. No collected stats about the packet flow
are preserved. As far as I can see (please correct me if I'm wrong), the
FACK-based congestion algorithm survives dropping all of the stats
and repairs itself eventually, probably losing performance for
that period.
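A sketch of parsing such a CODE:VALUE buffer into the four restorable settings. The option codes are the standard TCP option kinds (MSS 2, window scale 3, SACK permitted 4, timestamps 8). The pair layout (struct repair_opt) and the restored-state struct are hypothetical; the kernel's actual buffer format may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TCP option kinds (RFC 793, RFC 1323, RFC 2018). */
enum { OPT_MSS = 2, OPT_WINDOW = 3, OPT_SACK_PERM = 4, OPT_TIMESTAMP = 8 };

/* Hypothetical CODE:VALUE pair. */
struct repair_opt { uint32_t code; uint32_t val; };

/* Hypothetical model of the restored connection-time parameters. */
struct restored {
	uint32_t mss_clamp;
	uint8_t snd_wscale;
	int sack_ok, tstamp_ok;
};

/* Apply each pair in order; returns 0, or -1 on an unknown code. */
static int apply_repair_opts(struct restored *r, const struct repair_opt *o, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		switch (o[i].code) {
		case OPT_MSS:       r->mss_clamp = o[i].val; break;
		case OPT_WINDOW:    r->snd_wscale = (uint8_t)o[i].val; break;
		case OPT_SACK_PERM: r->sack_ok = 1; break;  /* no SACK scoreboard restored */
		case OPT_TIMESTAMP: r->tstamp_ok = 1; break;
		default:            return -1;
		}
	}
	return 0;
}
```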

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 5e6a3ce657 tcp: Report mss_clamp with TCP_MAXSEG option in repair mode
The mss_clamp is the only connection-time negotiated option which
cannot be obtained from user space. Make the TCP_MAXSEG sockopt
report it in repair mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov c0e88ff0f2 tcp: Repair socket queues
Reading queues under repair mode is done with recvmsg call.
The queue-under-repair set by TCP_REPAIR_QUEUE option is used
to determine which queue should be read. Thus both send and
receive queue can be read with this.

Caller must pass the MSG_PEEK flag.

Writing to queues is done with sendmsg call and yet again --
the repair-queue option can be used to push data into the
receive queue.

When putting an skb into the receive queue, a zeroed tcp header is
pushed onto its head to satisfy the tcp_hdr(skb)->syn and ->fin
checks done by tcp_recvmsg (after repair). Both of these flags are
set to zero, for the following reasons.

A FIN cannot be present in the queue while reading the source
socket, since repair only works for closed/established sockets
and queueing a FIN packet always changes the socket state.

A SYN in the queue denotes that the respective skb's seq
is "off-by-one" compared to the actual payload length. Thus,
at the rcv queue refill we can just drop this flag and set the
skb's sequence numbers to precise values.

When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent,
waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the
protocol POV the send queue looks like it was sent, but the data
between the write_seq and snd_nxt is lost in the network.

This helps avoid another sockoption for setting the snd_nxt
sequence. Leaving the whole queue in a 'not yet sent' state (as
it would be after the sendmsg calls) would not allow any ACKs to be
received from the peer, since the ack_seq would be after snd_nxt. Thus
even the ACK for the window probe would be dropped and the
connection would be 'locked' with a zero peer window.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov ee9952831c tcp: Initial repair mode
This includes (according to the previous description):

* TCP_REPAIR sockoption

This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/established sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.

* TCP_REPAIR_QUEUE sockoption

This one sets the queue which we're about to repair. The
'no-queue' is set by default.

* TCP_QUEUE_SEQ sockoption

Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).

* Ability to forcibly bind a socket to a port

The sk->sk_reuse is set to SK_FORCE_REUSE.

* Immediate connect modification

The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.

* Silent close modification

The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.
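From user space, the first steps of restoring a connection might look like the sketch below. It puts a still-closed socket into repair mode, selects the send queue, and presets its sequence number. The numeric fallbacks and REPAIR_SEND_QUEUE = 2 are assumptions about the kernel's values, and repair_preset_send_seq() is a hypothetical helper. The calls need CAP_NET_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Fallbacks in case the libc headers predate these options (assumed values). */
#ifndef TCP_REPAIR
#define TCP_REPAIR 19
#endif
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
#ifndef TCP_QUEUE_SEQ
#define TCP_QUEUE_SEQ 21
#endif

#define REPAIR_SEND_QUEUE 2   /* assumed to match the kernel's send-queue id */

/* Enter repair mode on a closed socket and preset the send queue's
 * sequence number. Returns 0 on success, -1 with errno set otherwise. */
static int repair_preset_send_seq(int fd, unsigned int seq)
{
	int on = 1, q = REPAIR_SEND_QUEUE;

	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &q, sizeof(q)) < 0)
		return -1;
	return setsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &seq, sizeof(seq));
}
```

A restore tool would then repeat the queue selection for the receive queue, bind and connect (which completes immediately in repair mode), refill the queues with sendmsg(), and finally clear TCP_REPAIR.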

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 370816aef0 tcp: Move code around
This is just a preparation patch, which makes the code needed for
TCP repair ready for use.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.
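A sketch of how a forced-reuse value could short-circuit a bind conflict check. The constant names are assumed to mirror the ones this patch introduces, and bind_conflict() is a hypothetical simplification that ignores address matching.

```c
#include <assert.h>
#include <stdbool.h>

/* Named sk_reuse values; 0 and 1 keep their old meaning. */
#define SK_NO_REUSE    0
#define SK_CAN_REUSE   1
#define SK_FORCE_REUSE 2

/* Does binding a new socket conflict with an existing owner of the port? */
static bool bind_conflict(int new_reuse, int owner_reuse, bool owner_listening)
{
	if (new_reuse == SK_FORCE_REUSE)
		return false;   /* repair mode: take the port regardless */
	if (new_reuse && owner_reuse && !owner_listening)
		return false;   /* classic SO_REUSEADDR sharing */
	return true;
}
```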

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman 5f568e5afe net: Remove register_net_sysctl_table
All of the users have been converted to use register_net_sysctl so we
no longer need register_net_sysctl_table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman a5347fe36b net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman ec8f23ce0f net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boilerplate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman f99e8f715a net: Convert nf_conntrack_proto to use register_net_sysctl
There isn't much advantage here except that string paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 8607ddb867 net ipv4: Convert devinet to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive for the lifetime of
our sysctl registration; instead we can use a small temporary buffer on
the stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 6105e29320 net ipv6: Convert addrconf to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive for the lifetime of
our sysctl registration; instead we can use a small temporary buffer on
the stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 9bdcc88fa0 net decnet: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 8f40a1f982 net neighbour: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive for the lifetime of
our sysctl registration; instead we can use a small temporary buffer on
the stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 6dceb03687 net ipv6: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 64fb301040 net llc: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables with .child
entries.

Kill the intermediate tables and use register_net_sysctl directly to
remove the need for compatibility code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 0ca7a4c87d net ax25: Simplify and cleanup the ax25 sysctl handling.
Don't register/unregister every ax25 table in a batch.  Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.

This moves ax25 to be a completely modern sysctl user: it registers the
sysctls in just the initial network namespace, removes the use of
.child entries that are no longer natively supported by the sysctl core,
and takes advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:28 -04:00
Eric W. Biederman 4e5ca78541 net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.

Delete the ipv4_path and the ipv4_skeleton; these are no longer needed.

Directly register the ipv4_route_table.

And since I am an idiot remove the header definitions that I should
have removed in the previous patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman a5287acc6c net ipv6: Remove unneded registration of an empty net/ipv6/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
should cause no user visible changes.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman 45bad91498 net core: Remove unneded creation of an empty net/core sysctl directory
On the next line we register the net_core_table in net/core which
creates the directory and ensures it exists.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman 5dd3df105b net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman 4344475797 net: Kill register_sysctl_rotable
register_sysctl_rotable never caught on as an interesting way to
register sysctls.  My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace.  What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.

That is a very silly way to go.  Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.

The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf

I really don't expect anyone will miss them if they can't read them in a
child user namespace.

CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman 2ca794e5e8 net sysctl: Initialize the network sysctls sooner to avoid problems.
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrastructure is not set
up soon enough.  So to avoid races call net_sysctl_init from sock_init().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman bc8a36942a net sysctl: Register an empty /proc/sys/net
Implementation limitations of the sysctl core won't let /proc/sys/net
reside in a network namespace.  /proc/sys/net at least must be registered
as a normal sysctl.  So register /proc/sys/net early as an empty directory
to guarantee we don't violate this constraint and hit bugs in the sysctl
implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman ab41a2ca50 net: Implement register_net_sysctl.
Right now all of the networking sysctl registrations are running in a
compatibility mode.  The native sysctl registration api takes a cstring
for a path and a simple ctl_table.  Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatibility code in the sysctl core.

Switching from a ctl_path to a cstring results in less boilerplate
and denser code that is a little easier to read.

I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel, but gcc will allow a
ctl_path pointer to be passed as a char * pointer while only issuing a
warning, so completely incorrect code could be built.  Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel, which makes clean conversion patches a bit easier
to read and write.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:15 -04:00
David S. Miller 167de77fd4 Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-04-20 20:40:31 -04:00
Allan Stephens 9d52ce4bd3 tipc: Ensure network address change doesn't impact configuration service
Enhances command validation done by TIPC's configuration service so
that it works properly even if the node's network address is changed in
mid-operation. The default node address of <0.0.0> is now recognized as an
alias for "this node" even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:50 -04:00
Allan Stephens 630d920dca tipc: Ensure network address change doesn't impact rejected message
Revises handling of a rejected message to ensure that a locally
originated message is returned properly even if the node's network
address is changed in mid-operation. The routine now treats the
default node address of <0.0.0> as an alias for "this node" when
determining where to send a returned message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:49 -04:00
Allan Stephens 8a55fe74b1 tipc: handle <0.0.0> as an alias for this node on outgoing msgs
Revises handling of send routines for payload messages to ensure that
they are processed properly even if the node's network address is
changed in mid-operation. The routines now treat the default node
address of <0.0.0> as an alias for "this node" when determining where
to send an outgoing message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:48 -04:00
Allan Stephens b8f683d126 tipc: properly handle off-node send requests with invalid addr
There are two send routines that might conceivably be asked by an
application to send a message off-node when the node is still using
the default network address.  These now have an added check that
detects this and rejects the message gracefully.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:47 -04:00
Allan Stephens 974a5a864b tipc: take lock while updating node network address
The routine that changes the node's network address now takes TIPC's
network lock in write mode while the main address variable and associated
data structures are being changed; this is needed to ensure that the
link subsystem won't attempt to send a message off-node until the sending
port's message header template has been updated with the node's new
network address.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:46 -04:00
Allan Stephens f0712e86b7 tipc: Ensure network address change doesn't impact local connections
Revises routines that deal with connections between two ports on
the same node to ensure the connection is not impacted if the node's
network address is changed in mid-operation. The routines now treat
the default node address of <0.0.0> as an alias for "this node" in
the following situations:

1) Incoming messages destined to a connected port now handle the alias
properly when validating that the message was sent by the expected
peer port, ensuring that the message will be accepted regardless of
whether it specifies the node's old network address or its current one.

2) The code which completes connection establishment now handles the
alias properly when determining if the peer port is on the same node
as the connected port.

An added benefit of addressing issue 1) is that some peer port
validation code has been relocated to TIPC's socket subsystem, which
means that validation is no longer done twice when a message is
sent to a non-socket port (such as TIPC's configuration service or
network topology service).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:45 -04:00
Allan Stephens d0e17fedc2 tipc: delete duplicate peerport/peernode helper functions
Prior to commit 23dd4cce38

    "tipc: Combine port structure with tipc_port structure"

there was a need for the two sets of helper functions.  But
now they are just duplicates.  Remove the globally visible
ones, and mark the remaining ones as inline.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:43 -04:00
Allan Stephens f21536d1e7 tipc: Ensure network address change doesn't impact new port
Re-orders port creation logic so that the initialization of a new
port's message header template occurs while the port list lock is
held. This ensures that a change to the node's network address that
occurs at the same time as the port is being created does not result
in the template identifying the sender using the former network
address. The new approach guarantees that the new port's template is
using the current network address or that it will be updated when
the address changes.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:42 -04:00
Allan Stephens 5eb0a291fb tipc: Optimize re-initialization of port message header templates
Removes an unnecessary check in the logic that updates the message
header template for existing ports when a node's network address is
first assigned. There is no longer any need to check to see if the
node's network address has actually changed since the calling routine
has already verified that this is so.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:41 -04:00
Allan Stephens d4f5c12cdf tipc: Ensure network address change doesn't impact name table updates
Revises routines that add and remove an entry from a node's name table
so that the publication scope lists are updated properly even if the
node's network address is changed in mid-operation. The routines now
recognize the default node address of <0.0.0> as an alias for "this node"
even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:40 -04:00
Allan Stephens 336ebf5bf5 tipc: Add routines for safe checking of node's network address
Introduces routines that test whether a given network address is
equal to a node's own network address or if it lies within the node's
own network cluster, and which work properly regardless of whether
the node is using the default network address <0.0.0> or a non-zero
network address that is assigned later on. In essence, these routines
ensure that address <0.0.0> is treated as an alias for "this node",
regardless of which network address the node is actually using.

Existing users of the stricter pre-existing in_own_cluster() match
have been redirected accordingly to what is now called
in_own_cluster_exact(), which does not extend matching to <0.0.0>.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:39 -04:00
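The alias rule from 336ebf5bf5 can be illustrated with a small user-space model. It assumes TIPC's usual 8/12/12-bit <zone.cluster.node> packing; `own_addr`, `tipc_addr()` and `in_own_node()` are illustrative names, while `in_own_cluster()` and `in_own_cluster_exact()` echo the routine names in the commit message without claiming to be their exact kernel bodies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the node's current network address.
 * 0 (<0.0.0>) means "not yet assigned". */
static uint32_t own_addr;

/* Pack <zone.cluster.node> as 8/12/12 bits. */
static uint32_t tipc_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
	return (zone << 24) | (cluster << 12) | node;
}

/* Strict test: same zone and cluster as the current address.
 * Does NOT treat <0.0.0> as an alias. */
static bool in_own_cluster_exact(uint32_t addr)
{
	return ((addr ^ own_addr) >> 12) == 0;
}

/* Alias-aware tests: <0.0.0> always means "this node". */
static bool in_own_node(uint32_t addr)
{
	return addr == own_addr || addr == 0;
}

static bool in_own_cluster(uint32_t addr)
{
	return in_own_cluster_exact(addr) || addr == 0;
}
```

Because the alias check is an extra `|| addr == 0`, messages stamped with <0.0.0> before an address was assigned still match after the node switches to a real address.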
Allan Stephens fd6eced8a4 tipc: Don't record failed publication attempt as a success
No longer increments counter of number of publications by a node
if an attempt to add a new publication fails. This prevents TIPC from
incorrectly blocking future publications because the configured maximum
number of publications has been reached.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:37 -04:00
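The fix in fd6eced8a4 amounts to incrementing the counter only after the insertion succeeds. A minimal sketch, assuming a hypothetical limit `MAX_PUBLICATIONS` and a stand-in `name_table_insert()` that can fail:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit and counter for illustration only. */
#define MAX_PUBLICATIONS 3

static int publ_count;

/* Stand-in for the name-table insertion; may fail independently. */
static bool name_table_insert(int key)
{
	return key >= 0;   /* pretend negative keys are rejected */
}

/* Returns 0 on success, -1 if the limit is hit or insertion fails. */
static int publish(int key)
{
	if (publ_count >= MAX_PUBLICATIONS)
		return -1;
	if (!name_table_insert(key))
		return -1;          /* failure: counter left untouched */
	publ_count++;           /* count only publications that exist */
	return 0;
}
```

With the increment placed before the insertion instead, each failure would permanently consume one slot of the configured maximum.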
Allan Stephens 1110b8d33a tipc: Update node-scope publications when network address is assigned
Ensures that node-scope name publications that exist prior to the
configuration of a node's network address are properly re-initialized
with that address when it is assigned. TIPC's node-scope publications
are now tracked using a publications list like the lists used for
cluster-scope and zone-scope publications so they can be easily updated
when required.

The inclusion of node scope name publications in a conventional publication
list means that they must now also be withdrawn, just like cluster and zone
scope publications are currently withdrawn.  So some conditional tests on
scope ==/!= TIPC_NODE_SCOPE are inserted/removed accordingly.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:36 -04:00
Allan Stephens a909804f7c tipc: Separate cluster-scope and zone-scope names into distinct lists
Utilizes distinct lists to track zone-scope and cluster-scope names
published by a node. For now, TIPC continues to process the entries
in both lists in the same way; however, an upcoming patch will utilize
the existence of the lists to prevent the sending of cluster-scope names
to nodes that are not part of the local cluster.

To achieve this, an array of publication lists is introduced, so
that they can be iterated over and accessed via publ->scope as
an index where convenient.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:05 -04:00
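The "array of publication lists indexed by publ->scope" idea from a909804f7c (together with the count-plus-list struct from 3f8375fee3 and the node-scope list from 1110b8d33a) can be sketched as below. The scope values 1/2/3 match `TIPC_ZONE_SCOPE`, `TIPC_CLUSTER_SCOPE` and `TIPC_NODE_SCOPE` in `<linux/tipc.h>`; the struct layout and a singly linked list in place of the kernel's `list_head` are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Scope values as defined in <linux/tipc.h>. */
enum { ZONE_SCOPE = 1, CLUSTER_SCOPE = 2, NODE_SCOPE = 3 };

struct publication {
	unsigned int type, lower, upper;
	int scope;
	struct publication *next;
};

/* Binds the entry count to its list. */
struct publ_list {
	struct publication *head;
	unsigned int size;
};

static struct publ_list publ_zone, publ_cluster, publ_node;

/* Index 0 is unused so that publ->scope can be used directly. */
static struct publ_list *publ_lists[] = {
	NULL, &publ_zone, &publ_cluster, &publ_node
};

/* Push a publication onto the list selected by its scope. */
static void publ_add(struct publication *p)
{
	struct publ_list *l = publ_lists[p->scope];

	p->next = l->head;
	l->head = p;
	l->size++;
}
```

Code that must treat every scope alike can loop over indices 1 to 3 instead of repeating itself per list.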
Eric W. Biederman 3adadc08cc net ax25: Reorder ax25_exit to remove races.
While reviewing the sysctl code in ax25 I spotted races in ax25_exit
where it is possible to receive notifications and packets after already
freeing up some of the data structures needed to process those
notifications and updates.

Call unregister_netdevice_notifier early so that the rest of the cleanup
code does not need to deal with network devices.  This takes advantage
of my recent enhancement to unregister_netdevice_notifier to send
unregister notifications of all network devices that are currently
registered.

Move the unregistration for packet types, socket types and protocol
types before we clean up any of the ax25 data structures to remove the
possibilities of other races.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 15:37:48 -04:00
Eric Dumazet cbf8f7bb20 ipv4: dont drop packet in defrag but consume it
When defragmentation is finalized, we clone a packet and kfree_skb() it.

Call consume_skb() to avoid confusing dropwatch, since it is not a drop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:25:51 -04:00
Eric Dumazet daa8654828 net: gro: GRO_MERGED_FREE consumes packets
As part of GRO processing, merged skbs should be consumed, not freed, to
not confuse dropwatch/drop_monitor.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:56 -04:00
Eric Dumazet 85bb2a60fa net: dont drop packet but consume it
When we need to clone an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
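The distinction these "dont drop packet but consume it" commits rely on: dropwatch treats frees through kfree_skb() as drops, while consume_skb() marks normal completion. The user-space model below is a sketch only; `struct pkt`, `drops_reported` and the three functions are hypothetical, with a counter standing in for drop monitoring.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy packet buffer and a counter standing in for drop_monitor. */
struct pkt { char data[64]; };

static unsigned int drops_reported;

/* Free because the packet is being discarded: counted as a drop. */
static void pkt_free_dropped(struct pkt *p)
{
	drops_reported++;
	free(p);
}

/* Free because the packet was used up (e.g. replaced by a copy). */
static void pkt_free_consumed(struct pkt *p)
{
	free(p);
}

/* Clone-then-release pattern: the original is consumed, not dropped,
 * because its contents live on in the copy. */
static struct pkt *pkt_clone_and_release(struct pkt *orig)
{
	struct pkt *copy = malloc(sizeof(*copy));

	if (!copy) {
		pkt_free_dropped(orig);   /* genuine loss */
		return NULL;
	}
	*copy = *orig;
	pkt_free_consumed(orig);
	return copy;
}
```

Only the allocation-failure path reports a drop; the successful clone leaves the counter untouched, which is what keeps dropwatch from reporting false positives.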
Eric Dumazet 7604adc2ff ipv6: dccp: dont drop packet but consume it
When we need to clone an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet abc4e4fa29 packet: dont drop packet but consume it
When we need to clone an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet ab185d7b25 ipv6: tcp: dont drop packet but consume it
When we need to clone an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 8460c00f6e netlink: dont drop packet but consume it
When we need to clone an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 9ff264492f ip6_tunnel: dont drop packet but consume it
When we need to reallocate an skb, we don't drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Shan Wei 7426a5645f net: fix compile error of leaking kmemleak.h header
net/core/sysctl_net_core.c: In function ‘sysctl_core_init’:
net/core/sysctl_net_core.c:259: error: implicit declaration of function ‘kmemleak_not_leak’

with the same error in net/ipv4/route.c

Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 00:11:39 -04:00
Eric Dumazet 22b4a4f22d tcp: fix retransmit of partially acked frames
Alexander Beregalov reported skb_over_panic errors and provided stack
trace.

It appears commit a21d45726a (tcp: avoid order-1 allocations on wifi and
tx path) added a regression when a retransmit is done after a partial
ACK.

tcp_retransmit_skb() tries to aggregate several frames if the first one
has enough available room to hold the following ones' payload. This is
controlled by the /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default:
enabled).

The problem is that we must make sure __pskb_trim_head() doesn't fool
skb_availroom() when pulling some bytes from the skb (this pull is done
when the receiver ACKs part of the frame).

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 16:52:45 -04:00
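The failure mode in 22b4a4f22d can be shown with a toy buffer model. This is not `struct sk_buff`: the fields, `availroom()` and the remedy of clearing a cached room hint after a head trim are assumptions chosen to illustrate how pulling acked bytes from the front can make a room estimate overstate the real tail room, so a collapse would write past the end.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy linear buffer described by offsets into one allocation. */
struct buf {
	int data;       /* start of payload */
	int tail;       /* end of payload */
	int end;        /* end of allocation */
	int len;        /* payload length = tail - data */
	int avail_hint; /* hypothetical cached capacity used to estimate room */
};

/* Room estimate: cached capacity minus current payload length. */
static int availroom(const struct buf *b)
{
	return b->avail_hint - b->len;
}

/* Drop n acked bytes from the front: data moves, tail does not.
 * With 'fixed', the cached capacity is cleared so no room is reported. */
static void trim_head(struct buf *b, int n, bool fixed)
{
	b->data += n;
	b->len -= n;
	if (fixed)
		b->avail_hint = 0;
}

/* What the collapse logic would decide from the estimate. */
static bool can_collapse(const struct buf *b, int next_len)
{
	return availroom(b) >= next_len;
}

/* Whether the next payload really fits after 'tail'. */
static bool fits(const struct buf *b, int next_len)
{
	return b->tail + next_len <= b->end;
}
```

In the unfixed case the estimate grows by exactly the number of bytes pulled, even though the tail room has not changed.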
David S. Miller 56fa9b1630 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
From John:

Another batch of fixes intended for 3.4...

First up, we have a minor signedness fix for libertas from Amitkumar
Karwar.  Next, Arend gives us a brcm80211 fix for correctly enabling
Tx FIFOs on channels 12 and 13.  Bing Zhao gives us some register
address corrections for mwifiex.  Felix gives us a trio of fixes --
one for ath9k to wake the hardware properly from full sleep, one for
mac80211 to properly handle packets in cooked monitor mode, and one
for ensuring that the proper HT mode selection is honored.

Hauke gives us a bcma fix for handling the lack of an sprom.  Jonathon
Bither gives us an ath5k fix for a missing THIS_MODULE build issue,
and another ath5k fix for an io mapping leak.  Lukasz Kucharczyk
fixes a bitwise check in cfg80211, and Sujith gives us an ath9k fix
for assigning sequence numbers for fragmented frames.  Finally, we
have a MAINTAINERS change from Wey-Yi Guy -- congrats to Johannes
Berg for taking the lead on iwlwifi. :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 15:26:52 -04:00
John W. Linville 59ef43e681 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
	include/net/nfc/nfc.h
	net/nfc/netlink.c
	net/wireless/nl80211.c
2012-04-18 14:27:48 -04:00
John W. Linville dbd717e37b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-04-18 13:37:33 -04:00
David S. Miller 91fbe33034 Included changes:
* remove duplicated line in comment
* add htons() invocation for tt_crc as suggested by Al Viro
* OriGinator Message seqno initial value is now random
* some cleanups and fixes

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
2012-04-18 13:21:59 -04:00
Stanislav Kinsbursky adae0fe0ea SUNRPC: register PipeFS file system after pernet sybsystem
The PipeFS superblock creation routine relies on the presence of SUNRPC pernet
data, which is created by the register_pernet_subsys() call in the SUNRPC module
init function. Registering the PipeFS filesystem before registering the per-net
subsystem leads to races (a mount of PipeFS can dereference uninitialized data).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-18 11:05:48 -04:00
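The ordering rule in adae0fe0ea is general: register what a consumer depends on before exposing the consumer, and unwind in reverse on failure. A hypothetical user-space model (none of these functions are SUNRPC's), with `fail_fs_register` as a fault-injection switch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the filesystem's mount path reads per-net data,
 * so that data must exist before the filesystem is visible. */
static bool pernet_ready;
static bool fs_registered;
static bool fail_fs_register;   /* fault injection for the example */

static int register_pernet_model(void) { pernet_ready = true; return 0; }
static void unregister_pernet_model(void) { pernet_ready = false; }

static int register_fs_model(void)
{
	if (fail_fs_register)
		return -1;
	fs_registered = true;
	return 0;
}

/* A mount succeeds only if the per-net data is already set up;
 * -2 models the uninitialised-data race. */
static int mount_model(void)
{
	if (!fs_registered)
		return -1;
	return pernet_ready ? 0 : -2;
}

static int module_init_model(void)
{
	int err = register_pernet_model();   /* dependency first */

	if (err)
		return err;
	err = register_fs_model();           /* then expose the filesystem */
	if (err)
		unregister_pernet_model();       /* unwind in reverse order */
	return err;
}
```

With this order there is no window in which a mount can observe a registered filesystem without its per-net data.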
Allan Stephens e11aa05971 tipc: Factor out name publication code to a separate function
This is done so that it can be reused with differing publication
lists, instead of being hard-coded to the cluster publication list.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-18 09:36:02 -04:00
Allan Stephens 3f8375fee3 tipc: introduce publication lists struct
There is currently a single list containing both cluster-scope and
zone-scope publications, and the list count is a separate free-floating
variable.  Create a struct to bind the count to the list, and to pave
the way for factoring out the publications into zone/cluster/node scope.

The current "publ_root" most closely matches what will be the cluster scope
list, so it is named accordingly in this commit.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-18 09:36:02 -04:00
Antonio Quartulli 1e5cc266db batman-adv: skip the window protection test when the originator has no neighbours
When we receive an OGM from a node for the first time, the last_real_seqno
field of the orig_node structure has not been initialised yet. The value of this
field is used to compute the current ogm-seqno window, and therefore the
protection mechanism will probably drop the packet due to an out-of-window error.
To avoid this situation, this patch adds a check to skip the window protection
mechanism if no neighbour nodes have been added yet. When the first
neighbour node is added, the last_real_seqno field is initialised too.

Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:02 +02:00
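The skip described in 1e5cc266db can be sketched as a guard in front of a wrap-safe window test. The window bounds and the `orig_node` fields here are hypothetical simplifications, not batman-adv's constants or structure layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window bounds, for illustration only. */
#define WINDOW_BEHIND 64
#define WINDOW_AHEAD  65536

struct orig_node {
	uint32_t last_real_seqno;  /* only meaningful once a neighbour exists */
	unsigned int neigh_count;
};

/* Returns true if an OGM with this seqno should be accepted. */
static bool ogm_seqno_ok(const struct orig_node *orig, uint32_t seqno)
{
	int32_t diff;

	/* First OGM: last_real_seqno is uninitialised, so skip the check. */
	if (orig->neigh_count == 0)
		return true;

	/* Signed difference copes with 32-bit wrap-around. */
	diff = (int32_t)(seqno - orig->last_real_seqno);
	return diff > -WINDOW_BEHIND && diff < WINDOW_AHEAD;
}
```

The guard matters more once initial seqnos are random (d7d32ec0f1): a freshly seen originator's first seqno can be anywhere in the 32-bit space.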
Antonio Quartulli c97c72b493 batman-adv: print OGM seq numbers as unsigned int
OGM sequence numbers are declared as uint32_t, so they have to be printed
using %u instead of %d to avoid incorrect representations.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:02 +02:00
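A user-space illustration of c97c72b493: kernel code uses `%u` as the commit says; portable user-space C can use `PRIu32` from `<inttypes.h>` for a `uint32_t`. `format_seqno()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 32-bit sequence number over its full unsigned range.
 * Printing it with %d would typically show values above INT32_MAX
 * as negative numbers. */
static int format_seqno(char *buf, size_t n, uint32_t seqno)
{
	return snprintf(buf, n, "%" PRIu32, seqno);
}
```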
Antonio Quartulli 0d125074eb batman-adv: use ETH_HLEN instead of sizeof(struct ethhdr)
Instead of sizeof(struct ethhdr), it is strongly recommended to use the
kernel macro ETH_HLEN. This patch substitutes each occurrence of the former
expression with the latter.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:01 +02:00
Marek Lindner 1eeb479fda batman-adv: mark existing ogm variables as batman iv
The coming protocol changes also will have a part called "OGM". That
makes it necessary to introduce a distinction in the code base.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:01 +02:00
Marek Lindner 76e3d7fc1a batman-adv: rename BATMAN_OGM_LEN to BATMAN_OGM_HLEN
Using BATMAN_OGM_LEN leaves one with the impression that this is
the full packet size, which is not the case. Therefore the variable
is renamed.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:00 +02:00
Marek Lindner cd8b78e7e9 batman-adv: refactoring API: find generalized name for bat_ogm_init_primary callback
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:00 +02:00
Marek Lindner 77af7575c4 batman-adv: handle routing code initialization properly
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:59 +02:00
Marek Lindner 00a50076a3 batman-adv: add iface_disable() callback to routing API
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:59 +02:00
Marek Lindner d7d32ec0f1 batman-adv: randomize initial seqno to avoid collision
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:58 +02:00
Marek Lindner c2aca02235 batman-adv: refactoring API: find generalized name for bat_ogm_init callback
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:58 +02:00
Marek Lindner 8140625e30 batman-adv: move ogm initialization into the proper function
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:57 +02:00
Antonio Quartulli e88af9464f batman-adv: remove duplicated line in comment
Remove an accidentally added duplicated line in a function comment

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:57 +02:00
Antonio Quartulli 6d2003fc26 batman-adv: convert the tt_crc to network order
Before sending out a TT_Request packet, we must convert the tt_crc field value
to network byte order (since it is 16 bits long).

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:43:36 +02:00
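The conversion in 6d2003fc26 uses the standard `htons()`/`ntohs()` pair, which also exists in user space. The one-field `struct tt_request_wire` below is a hypothetical stand-in for the real packet layout.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: just the 16-bit CRC field of a request. */
struct tt_request_wire {
	uint16_t tt_crc;   /* network byte order on the wire */
};

static void fill_request(struct tt_request_wire *req, uint16_t crc_host)
{
	req->tt_crc = htons(crc_host);   /* host -> network order */
}

static uint16_t read_crc(const struct tt_request_wire *req)
{
	return ntohs(req->tt_crc);       /* network -> host order */
}
```

On a little-endian host, skipping `htons()` would put the low byte first on the wire, and a peer comparing CRCs would see a mismatch.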
majianpeng 798ec84d45 net/core:Remove memleak reports by kmemleak_not_leak.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:20:28 -04:00
majianpeng 7f59388108 net/ipv4:Remove two memleak reports by kmemleak_not_leak.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:20:28 -04:00
Julian Anastasov b922934d01 netns: do not leak net_generic data on failed init
ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:05:33 -04:00
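The first half of b922934d01 (free the per-net data when init fails) follows the usual allocate, call, undo-on-error shape. A minimal model, assuming hypothetical `struct net`/`struct pernet_ops` stand-ins and a live-allocation counter to make leaks visible; it does not cover the NET_NS part of the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a per-namespace operations table. */
struct net { void *gen_data; };

struct pernet_ops {
	size_t size;                   /* bytes of per-net private data */
	int (*init)(struct net *net);  /* may fail */
};

static int allocations_live;   /* tracks leaks in this model */

/* Allocate private data, run init, and free the data if init fails. */
static int ops_init_model(const struct pernet_ops *ops, struct net *net)
{
	int err = -ENOMEM;
	void *data = NULL;

	if (ops->size) {
		data = calloc(1, ops->size);
		if (!data)
			goto out;
		allocations_live++;
		net->gen_data = data;
	}

	err = ops->init ? ops->init(net) : 0;
	if (!err)
		return 0;

	/* Failure path: undo the allocation instead of leaking it. */
	net->gen_data = NULL;
	if (data) {
		free(data);
		allocations_live--;
	}
out:
	return err;
}

static int failing_init(struct net *net) { (void)net; return -EINVAL; }
static int ok_init(struct net *net) { (void)net; return 0; }
```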
Eric Dumazet 4d846f0239 tcp: fix tcp_grow_window() for large incoming frames
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.

tcp_grow_window() still assumes a TCP frame is under MSS, but that is no
longer true with LRO/GRO.

This patch fixes one of the performance issues we noticed with GRO on.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-17 22:32:00 -04:00
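The commit message for 4d846f0239 does not spell out the new arithmetic, so the rule below is an assumption for illustration only: grow `rcv_ssthresh` by at least twice the received frame length rather than an MSS-based amount, and never past `window_clamp`. The struct and helper names are hypothetical.

```c
#include <assert.h>

/* Toy receive-window state; names echo the commit text. */
struct rcv_state {
	unsigned int rcv_ssthresh;
	unsigned int window_clamp;
	unsigned int advmss;
};

static unsigned int max_u(unsigned int a, unsigned int b) { return a > b ? a : b; }
static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

/* Grow rcv_ssthresh toward window_clamp. The increment scales with the
 * received frame length, so a large GRO frame is not credited as if it
 * were a single MSS-sized segment. */
static void grow_window(struct rcv_state *s, unsigned int frame_len)
{
	unsigned int incr;

	if (s->rcv_ssthresh >= s->window_clamp)
		return;
	incr = max_u(2 * s->advmss, 2 * frame_len);
	s->rcv_ssthresh = min_u(s->rcv_ssthresh + incr, s->window_clamp);
}
```

With an MSS-only increment, a stream of 64 KB aggregated frames would open the window no faster than a stream of 1460-byte segments.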
Felix Fietkau 6741e7f048 mac80211: fix logic error in ibss channel type check
The broken check leads to rate control attempting to use HT40 while
the driver is configured for HT20. This leads to interesting hardware
issues.

HT40 can only be used if the channel type is either HT40- or HT40+
and if the channel type of the cell matches the local type.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-17 14:17:04 -04:00
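The corrected condition in 6741e7f048 reads as a two-part boolean test. The enum below is a local stand-in mirroring the channel types named in the commit, not nl80211's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the channel-type categories in the commit. */
enum chan_type { CHAN_NO_HT, CHAN_HT20, CHAN_HT40MINUS, CHAN_HT40PLUS };

static bool is_ht40(enum chan_type t)
{
	return t == CHAN_HT40MINUS || t == CHAN_HT40PLUS;
}

/* HT40 only if the local type is HT40- or HT40+ and the cell matches it. */
static bool ibss_may_use_ht40(enum chan_type local, enum chan_type cell)
{
	return is_ht40(local) && cell == local;
}
```

Requiring `cell == local` also rules out an HT40+ station joining an HT40- cell, where the secondary channels would differ.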
Felix Fietkau 973ef21a67 mac80211: fix truncated packets in cooked monitor rx
Cooked monitor rx was recently changed to use ieee80211_add_rx_radiotap_header
instead of generating only limited radiotap information.
ieee80211_add_rx_radiotap_header assumes that FCS info is still present if
the hardware supports receiving it; however, when cooked monitor rx packets
are processed, FCS info has already been stripped.
Fix this by adding an extra flag indicating FCS presence.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-17 14:17:04 -04:00
David Ward 244b65dbfe net_sched: gred: Fix oops in gred_dump() in WRED mode
A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).

However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:

tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
    DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

This fixes gred_dump() in WRED mode to use the values held in wred_set.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 23:51:07 -04:00
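The shape of the 244b65dbfe fix: in WRED mode, read the shared parameter set instead of the default DP's VQ, which may never have been created. A reduced, hypothetical model of the GRED state (not `sch_gred.c`'s structures), set up like the command sequence above with default DP 2 left unconfigured:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DPS 16

/* Hypothetical reduced GRED state: per-DP virtual queues plus one
 * shared parameter set used in WRED mode. */
struct vq { unsigned long qavg; unsigned long qidlestart; };
struct wred_set { unsigned long qavg; unsigned long qidlestart; };

struct gred_sched {
	struct vq *tab[MAX_DPS];   /* NULL when a DP was never configured */
	unsigned int def;          /* default DP */
	int wred_mode;
	struct wred_set wred_set;
};

/* Report qavg for DP i without dereferencing a missing VQ. */
static int dump_qavg(const struct gred_sched *t, unsigned int i, unsigned long *out)
{
	if (t->wred_mode) {
		*out = t->wred_set.qavg;    /* shared value, always present */
		return 0;
	}
	if (!t->tab[i])
		return -1;                  /* DP not configured: nothing to dump */
	*out = t->tab[i]->qavg;
	return 0;
}
```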
Daniel Baluta a75afd4770 can: fix sparse warning for cgw_list
Make cgw_list static to remove the following sparse warning:
net/can/gw.c:69:1: warning: symbol 'cgw_list' was not declared.
Should it be static?

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-04-16 21:08:18 +02:00
Wey-Yi Guy 1dae27f84b mac80211: add function retrieve average rssi
Add a utility function to provide the average RSSI per vif

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:38:49 -04:00
Rajkumar Manoharan f9616e0f88 cfg80211: increse bss expire time
Background scan completion takes more time when the station has
heavy uplink traffic. The scan state machine decides to fall
back to the home channel on every off-channel visit when there are
pending frames in the tx queue. bgscan completion took ~30 sec on a
dual-band card with US regulatory settings.

scan period = (20 active channels * probe timeout) +
              (12 passive channels * passive probe timeout) +
              (32 * timeout on home channel) +
              (32 * flush timeout)

Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:38:43 -04:00
Javier Cardona ec14bcd20f mac80211: Take into account TSF adjustment latency in Toffset setpoint
When testing mesh synchronization we observed a global TSF slowdown that
was dependent on the number of synchronized mesh stations.  This seems
to be caused by the TSF adjustment (read/write) latency.

Adding a small margin to the Toffset setpoint solved the problem.

Signed-off-by: Shinichi Hotori <hotorinn@gmail.com>
Signed-off-by: Yu Niiro <yu.niiro@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:19:29 -04:00
Javier Cardona a802a6eba1 mac80211: Choose a new toffset setpoint if a big tsf jump is detected.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:19:29 -04:00
Luciano Coelho bb3e10fb58 mac80211: check IEEE80211_HW_QUEUE_CONTROL in ieee80211_check_queues()
Commit 3a25a8c8 (mac80211: add improved HW queue control) introduced a
bug when running in AP mode without the IEEE80211_HW_QUEUE_CONTROL
flag set.  The ieee80211_check_queues() function always returns
-EINVAL, preventing AP mode from starting.  To fix this, check whether
this flag is set before checking if cab_queue is set properly.

Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:16:58 -04:00
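The fix in bb3e10fb58 gates the cab_queue validation on the driver advertising queue control. A reduced model, assuming hypothetical `hw_cfg`/`vif_cfg` structs and an `INVAL_HW_QUEUE` marker; the real function also validates other per-interface queues.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define INVAL_HW_QUEUE 0xff   /* hypothetical "unset" marker */

/* Reduced model of the AP-mode queue validation. */
struct hw_cfg { bool queue_control; };
struct vif_cfg { bool is_ap; unsigned char cab_queue; };

static int check_queues(const struct hw_cfg *hw, const struct vif_cfg *vif)
{
	/* Drivers without queue control never set cab_queue, so only
	 * validate it when the flag is advertised. */
	if (!hw->queue_control || !vif->is_ap)
		return 0;
	if (vif->cab_queue == INVAL_HW_QUEUE)
		return -EINVAL;
	return 0;
}
```

Without the first guard, every driver lacking the flag would hit the `-EINVAL` path, which is the "AP mode never starts" symptom described above.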
Johannes Berg 8e8b41f9d8 cfg80211: enforce lack of interface combinations
My grand plan to allow drivers to gradually move over
to advertising virtual interface combinations and only
enforce with drivers that do want it enforced doesn't
seem to be working out: only Christian ever added the
advertising (to carl9170); nobody else did.

Begin enforcing combinations in cfg80211 so that users
can rely on the information reported about a device.

Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Nick Kossifidis <mickflemm@gmail.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:16:58 -04:00
Vishal Agarwal 6ec5bcadc2 Bluetooth: Temporary keys should be retained during connection
If a key is non-persistent, it should not be used in future
connections, but it should be kept for the current connection and
removed when the connection is removed.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:45 +03:00
Vishal Agarwal 745c0ce35f Bluetooth: hci_persistent_key should return bool
This patch changes the return type of function hci_persistent_key
from int to bool because it makes more sense to return information
whether a key is persistent or not as a bool.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:40 +03:00
David S. Miller 56845d78ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/atheros/atlx/atl1.c
	drivers/net/ethernet/atheros/atlx/atl1.h

Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:19:04 -04:00
John Fastabend 3ff661c38c net: rtnetlink notify events for FDB NTF_SELF adds and deletes
It is useful to be able to monitor for FDB events in user space.
This patch adds support to generate netlink events when a change
is made to a device supporting the FDB ops.

This brings embedded switches in line with the software net/bridge, which
triggers events on FDB updates as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
John Fastabend d83b060360 net: add fdb generic dump routine
This adds a generic dump routine drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should cover most
SR-IOV-enabled NICs.

v2: return error on nlmsg_put and use -EMSGSIZE instead
    of -ENOMEM; this is in line with other usages

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
John Fastabend 12a9463445 net: addr_list: add exclusive dev_uc_add and dev_mc_add
This adds dev_uc_add_excl() and dev_mc_add_excl() calls,
similar to the original dev_{uc|mc}_add() except that they set
the global bit and return -EEXIST for duplicate entries.

This is useful for drivers that support SR-IOV, macvlan
devices and any other devices that need to manage the
unicast and multicast lists.

v2: fix typo UNICAST should be MULTICAST in dev_mc_add_excl()

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
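The difference between the shared and exclusive add calls in 12a9463445 can be modelled on a small fixed-size list. This is a loose sketch: the list, `addr_add()` and `addr_add_excl()` are hypothetical, the "global bit" is omitted, and `-ENOSPC` for a full list is an assumption of the model. A shared add of a duplicate takes another reference; an exclusive add reports `-EEXIST`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_ADDRS 8

/* Hypothetical fixed-size stand-in for a device's unicast list. */
struct addr_list {
	unsigned char addrs[MAX_ADDRS][ETH_ALEN];
	int refcnt[MAX_ADDRS];
	int count;
};

static int find(const struct addr_list *l, const unsigned char *a)
{
	for (int i = 0; i < l->count; i++)
		if (memcmp(l->addrs[i], a, ETH_ALEN) == 0)
			return i;
	return -1;
}

/* Shared add: a duplicate just takes another reference. */
static int addr_add(struct addr_list *l, const unsigned char *a)
{
	int i = find(l, a);

	if (i >= 0) {
		l->refcnt[i]++;
		return 0;
	}
	if (l->count == MAX_ADDRS)
		return -ENOSPC;
	memcpy(l->addrs[l->count], a, ETH_ALEN);
	l->refcnt[l->count++] = 1;
	return 0;
}

/* Exclusive add: a duplicate is an error the caller can act on. */
static int addr_add_excl(struct addr_list *l, const unsigned char *a)
{
	if (find(l, a) >= 0)
		return -EEXIST;
	return addr_add(l, a);
}
```

For an FDB-style user, `-EEXIST` lets the caller report a conflict instead of silently sharing an entry owned by someone else.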
John Fastabend 77162022ab net: add generic PF_BRIDGE:RTM_ FDB hooks
This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also, without any flags set, the command
will be handled by the master device as well, so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note that if both the NTF_SELF and NTF_MASTER bits are set, the
command is sent to both 'dev->master' and 'dev'. This allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication when both flags are set and an
error occurs. To resolve this, the rtnl handler clears the
corresponding NTF_ flag in the netlink ack to indicate which targets
completed successfully. The add/del handlers abort as soon as any
error occurs.
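
A userspace sketch of the dispatch and flag-clearing rule described
above. The NTF_SELF/NTF_MASTER values match <linux/neighbour.h>; the
handler type, the dispatcher and the example handlers are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define NTF_SELF   0x02   /* values as in <linux/neighbour.h> */
#define NTF_MASTER 0x04

/* Hypothetical per-device add handler: 0 or a negative errno. */
typedef int (*fdb_add_fn)(void *dev);

static int fdb_dispatch(uint8_t *flags, void *master, fdb_add_fn master_add,
			void *dev, fdb_add_fn dev_add)
{
	int err = 0;

	/* No flags at all keeps the old behaviour: the master handles it. */
	if ((*flags == 0 || (*flags & NTF_MASTER)) && master) {
		err = master_add(master);
		if (err)
			return err;                   /* abort: device not tried */
		*flags &= (uint8_t)~NTF_MASTER;       /* master done */
	}
	if ((*flags & NTF_SELF) && dev_add) {
		err = dev_add(dev);
		if (!err)
			*flags &= (uint8_t)~NTF_SELF; /* device done */
	}
	return err;
}

/* Example handlers for trying the dispatcher out. */
static int ok_add(void *dev) { (void)dev; return 0; }
static int busy_add(void *dev) { (void)dev; return -EBUSY; }
```

Whatever bits remain set in *flags after an error name the targets that
were not programmed, which is what the ack reports back.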

To support this, new net device ops were added to call into
the device, and the existing bridging code was refactored
to use them. No user space changes should be required
to keep the current bridge behavior.

A basic setup with an SR-IOV enabled NIC looks like this:

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present, either the drivers
managing the embedded bridge send frames onto the network, where they
are dropped by the switch, OR the embedded bridge floods these frames.
With this patch we have a mechanism to manage the embedded bridge
correctly from user space. This example is specific to SR-IOV, but
replacing the VF with another PF or dropping this into the DSA
framework generates similar management issues.

Example session using the 'br'[1] tool to add, dump and then
delete a MAC address with a new "embedded" option and an enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
# br fdb add 22:35:19:ac:60:59 embedded dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
# br fdb del 22:35:19:ac:60:59 embedded dev eth3

I only added a couple of lines to 'br' to set the flags correctly. In
my opinion, the merit of this patch is that embedded and SW bridges can
now both be modeled correctly in user space using very nearly the same
message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
Herbert Xu c5c2326059 bridge: Add multicast_querier toggle and disable queries by default
Sending general queries was implemented as an optimisation to speed
up convergence on start-up.  In order to prevent interference with
multicast routers a zero source address has to be used.

Unfortunately these packets appear to cause some multicast-aware
switches to misbehave, e.g., by disrupting multicast packets to us.

Since the multicast snooping feature still functions without sending
our own queries, this patch will change the default to not send
queries.

For those that need queries in order to speed up convergence on start-up,
a toggle is provided to restore the previous behaviour.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:35 -04:00
Herbert Xu c83b8fab06 bridge: Restart queries when last querier expires
As it stands, when we discover a real querier (one that queries
with a non-zero source address), we stop querying. However, even
after said querier has fallen off the edge of the earth, we will
never restart querying (unless the bridge itself is restarted).

This patch fixes this by kicking our own querier into gear when
the timer for other queriers expires.
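
A userspace model of the restart rule, assuming a simple integer clock
and ignoring wraparound. The struct and function names are hypothetical
and do not mirror the bridge's own data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified snooping-querier state. */
struct querier {
	bool other_present;      /* a real querier (non-zero source) was heard */
	uint32_t other_expires;  /* time at which we stop believing in it */
	bool own_active;         /* are we sending our own general queries? */
};

/* A query arrives; only a non-zero source counts as a real querier. */
static void rx_query(struct querier *q, uint32_t now, uint32_t src,
		     uint32_t interval)
{
	if (src == 0)
		return;
	q->other_present = true;
	q->other_expires = now + interval;
	q->own_active = false;   /* defer to the real querier */
}

/* Timer tick: once the other querier times out, restart our own. */
static void tick(struct querier *q, uint32_t now)
{
	if (q->other_present && now >= q->other_expires) {
		q->other_present = false;
		q->own_active = true;  /* the step that was previously missing */
	}
}
```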

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:34 -04:00
Herbert Xu 748572162a bridge: Add br_multicast_start_querier
This patch adds the helper br_multicast_start_querier so that
the code which starts the queriers in br_multicast_toggle can
be reused elsewhere.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:34 -04:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Daniel Baluta 5e73ea1a31 ipv4: fix checkpatch errors
Fix checkpatch errors of the following type:
	* ERROR: "foo * bar" should be "foo *bar"
	* ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:37:19 -04:00
David S. Miller cf22f9a2b8 ipv6: Remove unused argument to addrconf_dad_start().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 21:37:40 -04:00
Vijay Subramanian a8cb05b238 tcp: Remove redundant code entering quickack mode
tcp_enter_quickack_mode() already calls tcp_incr_quickack() and sets
icsk->icsk_ack.ato to TCP_ATO_MIN. This patch removes the duplication.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:29:02 -04:00
Alex Copot aacd9289af tcp: bind() use stronger condition for bind_conflict
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

We achieve this by adding a relaxation parameter to
inet_csk_bind_conflict. When 'relax' parameter is off
we return a conflict whenever the current searched
pair (addr, port) is not unique.
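
A simplified userspace model of a conflict check with a 'relax' switch,
assuming IPv4 addresses with 0 as the wildcard. The record layout and
function are hypothetical and leave out many details of the real
inet_csk_bind_conflict().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound-socket record (addr 0 = wildcard). */
struct bound_sk {
	uint32_t addr;
	uint16_t port;
	bool reuse;      /* SO_REUSEADDR */
	bool listening;
};

static bool addrs_overlap(uint32_t a, uint32_t b)
{
	return a == 0 || b == 0 || a == b;
}

/*
 * With relax set, two non-listening SO_REUSEADDR sockets may share
 * (addr, port). With relax clear (port autoselection), any overlapping
 * pair is a conflict, so the search moves on to a unique pair.
 */
static bool bind_conflict(const struct bound_sk *tbl, size_t n, uint32_t addr,
			  uint16_t port, bool reuse, bool relax)
{
	for (size_t i = 0; i < n; i++) {
		const struct bound_sk *s = &tbl[i];

		if (s->port != port || !addrs_overlap(s->addr, addr))
			continue;
		if (!reuse || !s->reuse || s->listening || !relax)
			return true;
	}
	return false;
}
```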

This tries to address the problems reported in patch:
	8d238b25b1
	Revert "tcp: bind() fix when many ports are bound"

Tests were run creating and binding(0) many sockets
on 100 IPs. The results, on average, are:

	* 60000 sockets, 600 ports / IP:
		* 0.210 s, 620 (IP, port) duplicates without patch
		* 0.219 s, no duplicates with patch
	* 100000 sockets, 1000 ports / IP:
		* 0.371 s, 1720 duplicates without patch
		* 0.373 s, no duplicates with patch
	* 200000 sockets, 2000 ports / IP:
		* 0.766 s, 6900 duplicates without patch
		* 0.768 s, no duplicates with patch
	* 500000 sockets, 5000 ports / IP:
		* 2.227 s, 41500 duplicates without patch
		* 2.284 s, no duplicates with patch

Signed-off-by: Alex Copot <alex.mihai.c@gmail.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:28:55 -04:00
Eric Dumazet c72e118334 inet: makes syn_ack_timeout mandatory
There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing whether syn_ack_timeout
is NULL if we make it mandatory, i.e. non-NULL for both providers.
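
A trimmed-down, hypothetical ops table showing why a mandatory hook
simplifies the caller; tcp_like_ops and dccp_like_ops merely stand in
for the two providers and are not the kernel's definitions.

```c
struct request_sock;

/* Hypothetical ops table with a single hook. */
struct reqsk_ops {
	void (*syn_ack_timeout)(struct request_sock *req);
};

struct request_sock {
	const struct reqsk_ops *ops;
	int timeouts;
};

static void count_timeout(struct request_sock *req)
{
	req->timeouts++;
}

/* Every provider now fills in the hook ... */
static const struct reqsk_ops tcp_like_ops = { .syn_ack_timeout = count_timeout };
static const struct reqsk_ops dccp_like_ops = { .syn_ack_timeout = count_timeout };

/* ... so the pruning path calls it without an "if (ops->syn_ack_timeout)". */
static void prune_one(struct request_sock *req)
{
	req->ops->syn_ack_timeout(req);
}
```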

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:24:26 -04:00
Eric Dumazet fd4f2cead6 tcp: RFC6298 supersedes RFC2988bis
Updates some comments to track RFC6298

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:24:26 -04:00
stephen hemminger 87b6d218f3 tunnel: implement 64 bits statistics
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 14:47:05 -04:00
Mohammed Shafi Shajakhan 0446b49c33 mac80211: remove ieee80211_rx_bss_get
It is not used anywhere, since we directly obtain the ieee80211_bss
pointer in ibss.c by calling cfg80211_get_bss().

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:51 -04:00
Michal Kazior 4ee73f338a mac80211: remove hw.conf.channel usage where possible
Removes hw.conf.channel usage from the following functions:
 * ieee80211_mandatory_rates
 * ieee80211_sta_get_rates
 * ieee80211_frame_duration
 * ieee80211_rts_duration
 * ieee80211_ctstoself_duration

This is in preparation for multi-channel operation.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:50 -04:00
Lorenzo Bianconi d01b31604c mac80211: fix an issue in ieee80211_tx_info count field management
I noticed a possible issue in the management of the status count field
of the ieee80211_tx_info data structure. In particular, when AGGR
processing is employed, status.rates[].count is set only for the first
frame and not for the others belonging to the same burst, leading to
wrong statistics in the mac80211 debug file system.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:49 -04:00
Pontus Fuchs d91df0e3a1 cfg80211: Add channel information to NL80211_CMD_GET_INTERFACE
If the current channel is known, add frequency and channel type to
NL80211_CMD_GET_INTERFACE.

Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:49 -04:00
Stanislaw Gruszka 5f5460706e mac80211: protect ->scanning by mutex in ieee80211_work_work()
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:31:50 -04:00
Stanislaw Gruszka 133d40f9a2 mac80211: do not scan and monitor connection in parallel
Before sending probes for connection monitoring, we check that no scan
is pending, but we do that check without locking. Fix that, and also do
not start a scan while connection monitoring is in progress.
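
A userspace sketch of checking and claiming both states under one lock,
with a pthread mutex standing in for the kernel mutex. The struct and
function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical slice of interface state guarded by one mutex. */
struct sta_state {
	pthread_mutex_t mtx;
	bool scanning;
	bool monitoring;
};

/* Connection monitor: send probes only if no scan is pending. */
static bool start_monitor_probe(struct sta_state *s)
{
	bool ok;

	pthread_mutex_lock(&s->mtx);
	ok = !s->scanning;
	if (ok)
		s->monitoring = true;
	pthread_mutex_unlock(&s->mtx);
	return ok;
}

static void finish_monitor(struct sta_state *s)
{
	pthread_mutex_lock(&s->mtx);
	s->monitoring = false;
	pthread_mutex_unlock(&s->mtx);
}

/* Scan request: refuse while connection monitoring is in progress. */
static bool start_scan(struct sta_state *s)
{
	bool ok;

	pthread_mutex_lock(&s->mtx);
	ok = !s->monitoring && !s->scanning;
	if (ok)
		s->scanning = true;
	pthread_mutex_unlock(&s->mtx);
	return ok;
}
```

Because the test and the state change happen under the same lock, a
scan and a probe cannot both pass their checks at the same time.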

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:31:49 -04:00
Lukasz Kucharczyk e55a4046da cfg80211: fix interface combinations check.
Signed-off-by: Lukasz Kucharczyk <lukasz.kucharczyk@tieto.com>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:05:35 -04:00
Rémi Denis-Courmont 8e052e5289 Phonet: missing headers (sparse)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:03:16 -04:00
Rémi Denis-Courmont 3c490a87f6 Phonet: phonet_net_id can be static (sparse)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:03:16 -04:00
Hiroaki SHIMODA dcd2ba92e8 neighbour: Make neigh_table_init_no_netlink() static.
neigh_table_init_no_netlink() is only used in net/core/neighbour.c file.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:00:44 -04:00
Eric Dumazet 447167bf56 udp: introduce udp_encap_needed static_key
Most machines don't use UDP encapsulation (L2TP).

Add a static_key so that udp_queue_rcv_skb() doesn't have to perform a
test if L2TP never set up encap_rcv on a socket.

The idea for this patch came after Simon Horman's proposal to add a hook
on TCP as well.

If the static_key is not yet enabled, the fast path does a single JMP.

When static_key is enabled, JMP destination is patched to reach the real
encap_type/encap_rcv logic, possibly adding cache misses.
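
A userspace sketch of the enable-once gating logic only. A plain global
flag stands in for the static key, so the runtime branch patching that
makes the disabled case cost a single JMP is not reproduced. The struct
and function names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the static key: a plain flag, no code patching. */
static bool encap_needed;

struct sock_model {
	int encap_type;                         /* non-zero once a tunnel claims it */
	int (*encap_rcv)(struct sock_model *sk);
};

/* Called when a tunnel (e.g. L2TP) configures encapsulation on a socket. */
static void encap_enable(void)
{
	encap_needed = true;   /* one-way switch in this sketch */
}

/* Returns 1 if the encap handler consumed the packet, 0 for plain UDP. */
static int queue_rcv(struct sock_model *sk)
{
	if (encap_needed && sk->encap_type && sk->encap_rcv)
		return sk->encap_rcv(sk);
	return 0;              /* fast path: ordinary UDP delivery */
}

static int consume_encap(struct sock_model *sk)
{
	(void)sk;
	return 1;
}
```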

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:39:37 -04:00
stephen hemminger efacb309b5 rtnetlink & bonding: change args for get_tx_queues
Change get_tx_queues: drop the unused arg/return value real_tx_queues,
and return the value (or an error) directly rather than by reference.

Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!
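
A hypothetical before/after of the calling-convention change. The
function names and the 'requested' parameter are invented for the
sketch and do not match the real rtnl_link_ops signatures.

```c
#include <errno.h>

/* Before: results come back through out-parameters, plus a status code. */
static int get_tx_queues_old(int requested, unsigned int *num,
			     unsigned int *real)
{
	if (requested < 0)
		return -EINVAL;
	*num = (unsigned int)requested;
	*real = (unsigned int)requested;   /* value no caller needed */
	return 0;
}

/* After: one return value carries either the count (>= 0) or -errno. */
static int get_tx_queues_new(int requested)
{
	if (requested < 0)
		return -EINVAL;
	return requested;
}
```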

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:31:00 -04:00
David Ward 6f66cdc3e5 net/garp: fix GID rbtree ordering
The comparison operators were backwards in both garp_attr_lookup and
garp_attr_create, so the entire GID rbtree was in reverse order.
(There was no practical side effect to this though, except that PDUs
were sent with attributes listed in reverse order, which is still
valid by the protocol. This change is only for clarity.)
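
A small binary search tree showing why the comparison direction decides
the traversal order. The attribute key is simplified and the names are
hypothetical; the kernel uses its rbtree API, which this plain BST does
not reproduce.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified attribute node keyed by (type, data). */
struct attr {
	unsigned char type;
	unsigned char data[2];
	struct attr *left, *right;
};

/* <0, 0, >0 like memcmp: order by type first, then by data bytes. */
static int attr_cmp(const struct attr *a, const struct attr *b)
{
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	return memcmp(a->data, b->data, sizeof(a->data));
}

/* Smaller keys go left, so an in-order walk is ascending. */
static void attr_insert(struct attr **root, struct attr *n)
{
	while (*root) {
		int d = attr_cmp(*root, n);

		root = d > 0 ? &(*root)->left : &(*root)->right;
	}
	n->left = n->right = NULL;
	*root = n;
}

/* In-order walk, recording each node's type in out[]. */
static size_t attr_walk(const struct attr *a, unsigned char *out, size_t i)
{
	if (!a)
		return i;
	i = attr_walk(a->left, out, i);
	out[i++] = a->type;
	return attr_walk(a->right, out, i);
}
```

Flipping `d > 0` to `d < 0` still builds a valid tree, only mirrored, so
the walk comes out descending: a consistent but reversed order, as with
the PDUs described above.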

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:10:33 -04:00
David Woodhouse 9d02daf754 pppoatm: Fix excessive queue bloat
We discovered that PPPoATM has an excessively deep transmit queue. A
queue the size of the default socket send buffer (wmem_default) is
maintained between the PPP generic core and the ATM device.

Fix it to queue a maximum of *two* packets: the one the ATM device is
currently working on, and one more for the ATM driver to process
immediately in its TX done interrupt handler. The PPP core is designed
to feed packets to the channel with minimal latency, so that really
ought to be enough to keep the ATM device busy.

While we're at it, fix the fact that we were triggering the wakeup
tasklet on *every* pppoatm_pop() call. The comment saying "this is
inefficient, but doing it right is too hard" turns out to be overly
pessimistic... I think :)

On machines like the Traverse Geos, with a slow Geode CPU and two
high-speed ADSL2+ interfaces, there were reports of extremely high CPU
usage which could partly be attributed to the extra wakeups.

(The wakeup handling could actually be made a whole lot easier if we
 stop checking sk->sk_sndbuf altogether. Given that we now only queue
 *two* packets ever, one wonders what the point is. As it is, you could
 already deadlock the thing by setting the sk_sndbuf to a value lower
 than the MTU of the device, and it'd just block forever.)
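
A single-threaded model of the two-packet limit and of waking the PPP
core only when it was actually blocked. It ignores the concurrency
between the TX-done handler and the PPP core that the real driver must
handle; the struct and function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_INFLIGHT 2   /* the packet on the wire plus one queued behind it */

/* Hypothetical, simplified channel state. */
struct chan {
	int inflight;    /* packets handed to the ATM device, not yet popped */
	bool blocked;    /* PPP core was told to stop sending */
	int wakeups;     /* times the wakeup would have been scheduled */
};

/* PPP core hands us a packet; false means "full, wait for a wakeup". */
static bool chan_send(struct chan *c)
{
	if (c->inflight >= MAX_INFLIGHT) {
		c->blocked = true;
		return false;
	}
	c->inflight++;
	return true;
}

/* TX-done ("pop") path: wake the PPP core only if it was blocked. */
static void chan_pop(struct chan *c)
{
	c->inflight--;
	if (c->blocked) {
		c->blocked = false;
		c->wakeups++;
	}
}
```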

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:03:45 -04:00
Gao feng 1716a96101 ipv6: fix problem with expired dst cache
If an IPv6 dst cache entry is copied from a dst generated by an ICMPv6 RA
packet, the copy is never checked for expiry because it has no RTF_EXPIRES
flag, so it will always be used until the dst GC runs.

Change struct dst_entry to add a union containing expires and a new
pointer, from. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,
dst.expires is unused, so we can use this field to point to the dst the
cache entry was copied from. dst.from is only used in IPv6.

rt6_check_expired() checks whether rt6_info.dst.from has expired.

ip6_rt_copy() only sets dst.from when ort has the RTF_ADDRCONF and
RTF_DEFAULT flags, and then takes a reference on ort.

ip6_dst_destroy() releases ort.

Add some functions that manipulate the RTF_EXPIRES flag and expires (from)
together, and change the code to use these new functions.
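
A userspace sketch of the shared expires/from storage and the helpers
that keep it consistent with the flag. The RTF_EXPIRES value here is
invented for the sketch, reference counting on the origin route is
omitted, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define RTF_EXPIRES 0x1u   /* hypothetical bit value for this sketch */

/* Hypothetical route entry: 'expires' and 'from' share storage. */
struct rt {
	unsigned int flags;
	union {
		unsigned long expires;   /* meaningful only with RTF_EXPIRES */
		struct rt *from;         /* meaningful only without it */
	};
};

/* The helpers keep the flag and the active union member in step. */
static void rt_set_expires(struct rt *r, unsigned long when)
{
	r->expires = when;
	r->flags |= RTF_EXPIRES;
}

static void rt_set_from(struct rt *r, struct rt *from)
{
	r->flags &= ~RTF_EXPIRES;
	r->from = from;
}

static void rt_clean_expires(struct rt *r)
{
	r->flags &= ~RTF_EXPIRES;
	r->from = NULL;
}

/* A copy without its own lifetime defers to the route it came from. */
static bool rt_check_expired(const struct rt *r, unsigned long now)
{
	if (r->flags & RTF_EXPIRES)
		return now > r->expires;   /* kernel code would use time_after() */
	if (r->from)
		return rt_check_expired(r->from, now);
	return false;
}
```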

Changes from v5:
modify ip6_route_add() and ndisc_router_discovery() to use the new functions.

Only set dst.from when ort has the RTF_ADDRCONF and RTF_DEFAULT flags,
and then take a reference on ort.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 12:58:29 -04:00
Dmitry Tarnyagin 447648128e caif: set traffic class for caif packets
Set traffic class for CAIF packets, based on socket
priority, CAIF protocol type, or type of message.

Traffic class mapping for different packet types:
 - control:       TC_PRIO_CONTROL;
 - flow control:  TC_PRIO_CONTROL;
 - at:            TC_PRIO_CONTROL;
 - rfm:           TC_PRIO_INTERACTIVE_BULK;
 - other sockets: equal to the socket's TC;
 - network data:  no change.
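
The mapping table above written as a switch, assuming a hypothetical
enum for the packet kinds. The TC_PRIO_* values match
<linux/pkt_sched.h>; everything else is invented for the sketch.

```c
/* Values as in <linux/pkt_sched.h>. */
#define TC_PRIO_INTERACTIVE_BULK 4
#define TC_PRIO_CONTROL          7

/* Hypothetical classification of a CAIF packet. */
enum caif_pkt_kind {
	CAIF_PKT_CONTROL,
	CAIF_PKT_FLOW_CONTROL,
	CAIF_PKT_AT,
	CAIF_PKT_RFM,
	CAIF_PKT_OTHER_SOCKET,
	CAIF_PKT_NET_DATA,
};

/*
 * 'sock_prio' is the sending socket's priority; 'cur' is the class the
 * packet already carries, which network data keeps unchanged.
 */
static unsigned int caif_tc(enum caif_pkt_kind kind, unsigned int sock_prio,
			    unsigned int cur)
{
	switch (kind) {
	case CAIF_PKT_CONTROL:
	case CAIF_PKT_FLOW_CONTROL:
	case CAIF_PKT_AT:
		return TC_PRIO_CONTROL;
	case CAIF_PKT_RFM:
		return TC_PRIO_INTERACTIVE_BULK;
	case CAIF_PKT_OTHER_SOCKET:
		return sock_prio;
	case CAIF_PKT_NET_DATA:
	default:
		return cur;
	}
}
```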

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:37:36 -04:00
David S. Miller e65ac4d545 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get CAIF bug fixes upon which
the following set of CAIF feature patches depend.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:36:48 -04:00
Tomasz Gregorek 5c699fb7d8 caif: Fix memory leakage in the chnl_net.c.
Added kfree_skb() calls on the error paths in the chnl_net.c
file.
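
A userspace sketch of goto-based error-path cleanup, with malloc()/free()
standing in for skb allocation and kfree_skb(). The function, its
parameters and the live-buffer counter are hypothetical and not taken
from chnl_net.c.

```c
#include <errno.h>
#include <stdlib.h>

static int live_bufs;   /* counts buffers not yet freed */

static void *buf_alloc(void)
{
	void *p = malloc(64);

	if (p)
		live_bufs++;
	return p;
}

static void buf_free(void *p)   /* plays the role of kfree_skb() */
{
	if (p) {
		live_bufs--;
		free(p);
	}
}

/* Owns 'skb' from entry: every exit path must release it. */
static int xmit(void *skb, int link_up, int tx_result)
{
	int err;

	if (!link_up) {
		err = -ENODEV;
		goto drop;
	}
	if (tx_result < 0) {
		err = tx_result;
		goto drop;
	}
	buf_free(skb);          /* success: the lower layer consumed it */
	return 0;
drop:
	buf_free(skb);          /* error path: free instead of leaking */
	return err;
}
```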

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00