These patches implement the final mechanism necessary to really allow
us to go without the route cache in ipv4.
We need a place to have long-term storage of PMTU/redirect information
which is independent of the routes themselves, yet does not get us
back into a situation where we have to write to metrics or anything
like that.
For this we use an "next-hop exception" table in the FIB nexthops.
The one thing I desperately want to avoid is having to create clone
routes in the FIB trie for this purpose, because that is very
expensive. However, I'm willing to entertain such an idea later
if this current scheme proves to have downsides that the FIB trie
variant would not have.
In order to accomodate this any such scheme, we need to be able to
produce a full flow key at PMTU/redirect time. That required an
adjustment of the interface call-sites used to propagate these events.
For a PMTU/redirect with a fully specified socket, we pass that socket
and use it to produce the flow key.
Otherwise we use a passed in SKB to formulate the key. There are two
cases that need to be distinguished, ICMP message processing (in which
case the IP header is at skb->data) and output packet processing
(mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)).
We also have to make the code able to handle the case where the dst
itself passed into the dst_ops->{update_pmtu,redirect} method is
invalidated. This matters for calls from sockets that have cached
that route. We provide a inet{,6} helper function for this purpose,
and edit SCTP specially since it caches routes at the transport rather
than socket level.
Signed-off-by: David S. Miller <davem@davemloft.net>
In a regime where we have subnetted route entries, we need a way to
store persistent storage about destination specific learned values
such as redirects and PMTU values.
This is implemented here via nexthop exceptions.
The initial implementation is a 2048 entry hash table with relaiming
starting at chain length 5. A more sophisticated scheme can be
devised if that proves necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.
Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.
Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)
Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used so that we can compose a full flow key.
Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.
In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
I know that we're in fairly late stage to request pulls, but the IPVS people
pinged me with little patches with oops fixes last week.
One of them was recently introduced (during the 3.4 development cycle) while
cleaning up the IPVS netns support. They are:
* Fix one regression introduced in 3.4 while cleaning up the
netns support for IPVS, from Julian Anastasov.
* Fix one oops triggered due to resetting the conntrack attached to the skb
instead of just putting it in the forward hook, from Lin Ming. This problem
seems to be there since 2.6.37 according to Simon Horman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
on disassoc, ieee80211_set_disassoc() goes out of PS
before indicating BSS_CHANGED_ASSOC (not sure why this
is needed, but some drivers might count on the current
behavior).
However, it does it after sending the disassoc
frame, which results in null-data frame being sent
(in order to go out of ps) after we were already sent
the disassoc, which is invalid.
Fix it by going out of ps before sending the disassoc.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
regulatory_update() just calls wiphy_update_regulatory().
wiphy_update_regulatory() assumes you already have
the reg_mutex held so just move the call within locking
context and kill the superfluous regulatory_update().
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that we have wiphy_regulatory_register() we can
tuck away the core's regulatory_update() call there
and make it static.
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes it clearer what we're doing. This now makes a bit
more sense given that regardless of the wiphy if the cell
base station hint feature is supported we will be modifying the
way the regulatory core behaves.
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cellular base stations can provide hints to cfg80211 about
where they think we are. This can be done for example on
a cell phone. To enable these hints we simply allow them
through as user regulatory hints but we allow userspace
to clasify the hint as either coming directly from the
user or coming from a cellular base station. This option
is only available when you enable
CONFIG_CFG80211_CERTIFICATION_ONUS.
The base station hints themselves will not be processed
by the core unless at least one device on the system
supports this feature.
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds CONFIG_CFG80211_CERTIFICATION_ONUS which is to
be used for features / code which require a bit of work on
the system integrator's part to ensure that the system will
still pass 802.11 regulatory certification. This option is
also usable for researchers and experimenters looking to add
code in the kernel without impacting compliant code.
We'd use CONFIG_EXPERT alone but it seems that most standard
Linux distributions are enabling CONFIG_EXPERT already. This
allows us to define 802.11 specific kernel features under a
flag that is intended by design to be disabled by standard
Linux distributions, and only enabled by system integrators
or distributions that have done work to ensure regulatory
certification on the system with the enabled features.
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After commit 39f618b4fd (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While adding regulatory support to ath6kl I noticed that I easily
got the regulatory code confused. The way to reproduce the bug was:
1. iw reg set FI (in userspace)
2. cfg80211 calls ath6kl_reg_notify(FI)
3. ath6kl sets regdomain in firmware
4. firmware sends regdomain event to notify about the new regdomain (FI)
5. ath6kl calls regulatory_hint(FI)
And this (from FI to FI transition) confuses cfg80211 and after that I
only get "Pending regulatory request, waiting for it to be
processed...." messages and regdomain changes won't work anymore.
The reason why ath6kl calls regulatory_hint() is that firmware can change
the regulatory domain by it's own, for example due to 11d IEs. I could
of course workaround this in ath6kl but I think it's better to handle
the case in cfg80211.
The fix is pretty simple, use a different error code if the regdomain is
same and then just set the request processed so that it doesn't block new
requests.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Let the user configure serveral TX error conection quality monitoring
parameters: % error rate, survey interval, and # of attempted packets.
On exceeding the TX failure rate over the given interval, the driver
will send a CQM notify event with the actual TX failure rate and
packets attempted.
Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In one of my previous patches I erroneously
used nla_put_u32 for the wdev_id, fix that
to use nla_put_u64.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit "mac80211: unify SW/offload remain-on-channel"
moved the cookie assignment from ieee80211_mgmt_tx()
to ieee80211_start_roc_work(). But the latter is only
called where offchannel is needed. If offchannel isn't
needed/used, a uninitialized cookie value would be returned
to userspace.
This patch sets the cookie value when offchannel isn't used.
Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.
Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)
If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.
Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.
Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check the passed in multicast address and return
appropriate errno(EINVAL) if it is not valid. And it's no need
to walk through the ipv6_mc_list in this situation.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following sparse warning:
* symbol 'sctp_init_cause_fixed' was not declared. Should it be
static?
Signed-off-by: Ioan Orghici <ioanorghici@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least there seems to be no reason to disallow ROSE sockets when
NETROM is loaded.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netem does an early orphaning of skbs. Doing so breaks TCP Small Queue
or any mechanism relying on socket sk_wmem_alloc feedback.
Ideally, we should perform this orphaning after the rate module and
before the delay module, to mimic what happens on a real link :
skb orphaning is indeed normally done at TX completion, before the
transit on the link.
+-------+ +--------+ +---------------+ +-----------------+
+ Qdisc +---> Device +--> TX completion +--> links / hops +->
+ + + xmit + + skb orphaning + + propagation +
+-------+ +--------+ +---------------+ +-----------------+
< rate limiting > < delay, drops, reorders >
If netem is used without delay feature (drops, reorders, rate
limiting), then we should avoid early skb orphaning, to keep pressure
on sockets as long as packets are still in qdisc queue.
Ideally, netem should be refactored to implement delay module
as the last stage. Current algorithm merges the two phases
(rate limiting + delay) so its not correct.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Mark Gordon <msg@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
unregister_netdevice_notifier() must be called before
unregister_pernet_subsys() to avoid accessing already freed
pernet memory. This fixes the following oops when doing rmmod:
Call Trace:
[<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
[<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
[<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
[<ffffffff810e7734>] sys_delete_module+0x1a4/0x300
[<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
[<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
[<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f
RIP
[<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
there are some out of bound accesses in netprio cgroup.
now before accessing the dev->priomap.priomap array,we only check
if the dev->priomap exist.and because we don't want to see
additional bound checkings in fast path, so we should make sure
that dev->priomap is null or array size of dev->priomap.priomap
is equal to max_prioidx + 1;
so in write_priomap logic,we should call extend_netdev_table when
dev->priomap is null and dev->priomap.priomap_len < max_len.
and in cgrp_create->update_netdev_tables logic,we should call
extend_netdev_table only when dev->priomap exist and
dev->priomap.priomap_len < max_len.
and it's not needed to call update_netdev_tables in write_priomap,
we can only allocate the net device's priomap which we change through
net_prio.ifpriomap.
this patch also add a return value for update_netdev_tables &
extend_netdev_table, so when new_priomap is allocated failed,
write_priomap will stop to access the priomap,and return -ENOMEM
back to the userspace to tell the user what happend.
Change From v3:
1. add rtnl protect when reading max_prioidx in write_priomap.
2. only call extend_netdev_table when map->priomap_len < max_len,
this will make sure array size of dev->map->priomap always
bigger than any prioidx.
3. add a function write_update_netdev_table to make codes clear.
Change From v2:
1. protect extend_netdev_table by RTNL.
2. when extend_netdev_table failed,call dev_put to reduce device's refcount.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hash size is doubled when it needs to grow and compared against
hash_max. The >= comparison will limit the hash table size to half
of what is expected i.e. the default 512 hash_max will not allow
the hash table to grow larger than 256.
Also print the hash table limit instead of the desirable size when
the limit is reached.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userspace implementations of network routing protocols sometimes need to
tell RA-originated IPv6 routes from other kernel routes to make proper
routing decisions. This makes most sense for RA routes with nexthops,
namely, default routes and Route Information routes.
The intended mean of preserving RA route origin in a netlink message is
through indicating RTPROT_RA as protocol code. Function rt6_fill_node()
tried to do that for default routes, but its test condition was taken
wrong. This change is modeled after the original mailing list posting
by Jeff Haran. It fixes the test condition for default route case and
sets the same behaviour for Route Information case (both types use
nexthops). Handling of the 3rd RA route type, Prefix Information, is
left unchanged, as it stands for interface connected routes (without
nexthops).
Signed-off-by: Denis Ovsienko <infrastation@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lenght field should be encoded using big endian byte order, such as intend in the specs.
As it is currently written, the len field would not be decoded properly on an implementation using the correct byte ordering. Hence, it could lead to interroperability issues.
Also, I rewrote the code so that iphc0 argument of lowpan_alloc_new_frame could be removed.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tag field should be stored and accessed using big endian byte order (as
intended in the specs). Or else, when displayed with a trafic analyser, such a
Wireshark, the field not properly displayed (e.g. 0x01 00 instead of 0x00 01,
and so on).
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a UDP packet gets fragmented, a crash will occur at reassembly time.
This is because skb->transport_header is not set during earlier period of fragment reassembly.
As a consequence, call to udp_hdr() return NULL and uh (which is NULL) gets
dereferenced without much test.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker says:
====================
This is the same eight commits as sent for review last week[1],
with just the incorporation of the pr_fmt change as suggested
by JoeP. There was no additional change requests, so unless you
can see something else you'd like me to change, please pull.
...
Erik Hugne (5):
tipc: use standard printk shortcut macros (pr_err etc.)
tipc: remove TIPC packet debugging functions and macros
tipc: simplify print buffer handling in tipc_printf
tipc: phase out most of the struct print_buf usage
tipc: remove print_buf and deprecated log buffer code
Paul Gortmaker (3):
tipc: factor stats struct out of the larger link struct
tipc: limit error messages relating to memory leak to one line
tipc: simplify link_print by divorcing it from using tipc_printf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A few days ago Dave Jones reported this oops:
[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140] ffff880147c03a10
[22766.387140] ffffffffa169f2b6
[22766.387140] ffff88013ed95728
[22766.387143] 0000000000000002
[22766.387143] 0000000000000000
[22766.387143] ffff880003fad062
[22766.387144] ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <ffff880147c039b0>
[22766.387142] ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.
Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.
I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.
I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.
I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch sock_diag works for init_net only and dumps
information about sockets from all namespaces.
This patch expands sock_diag for all name-spaces.
It creates a netlink kernel socket for each netns and filters
data during dumping.
v2: filter accoding with netns in all places
remove an unused variable.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add three SNMP TCP counters, to better track TCP behavior
at global stage (netstat -s), when packets are received
Out Of Order (OFO)
TCPOFOQueue : Number of packets queued in OFO queue
TCPOFODrop : Number of packets meant to be queued in OFO
but dropped because socket rcvbuf limit hit.
TCPOFOMerge : Number of packets in OFO that were merged with
other packets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors. This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.
Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register(). And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).
Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.
A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any. I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.
This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch is based on a user space (hciops) patch which never made it
upstream but does make sense to include in the mgmt part of the kernel.
(User space) commit message from Dmitriy Paliy:
"
Page scan interval in fast connectable mode is changed from 22.5 msec to
160 msec to perform less aggressive page scanning. This is done
accordingly to controller vendor recommendation.
Primary concern is that current parameters 22.5 interval, 11.25 window,
and interleaved scanning occupy whole radio bandwidth. Changing interval
to 160 msec should be sufficient for both speeding up connection
establishment and leaving space for other activities, like inquiry scan,
e.g.
"
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in DMA zone.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adjusts the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).
Signed-off-by: David S. Miller <davem@davemloft.net>
This abstracts away the call to dst_ops->update_pmtu() so that we can
transparently handle the fact that, in the future, the dst itself can
be invalidated by the PMTU update (when we have non-host routes cached
in sockets).
So we try to rebuild the socket cached route after the method
invocation if necessary.
This isn't used by SCTP because it needs to cache dsts per-transport,
and thus will need it's own local version of this helper.
Signed-off-by: David S. Miller <davem@davemloft.net>
This change addresses an L2CAP ERTM throughput problem when a remote
device does not fully utilize the available transmit window.
The L2CAP ERTM transmit window size determines the maximum number of
unacked frames that may be outstanding at any time. It is configured
separately for each direction of an ERTM connection. Each side sends a
configuration request with a tx_win field indicating how many unacked
frames it is capable of receiving before sending an ack. The
configuration response's tx_win field shows how many frames the
transmitter will actually send before waiting for an ack.
It's important to trace both the actual transmit window (to check for
validity of incoming frames) and the number of frames that the
transmitter will send before waiting (to send acks at the appropriate
time). Now there are separate tx_win and ack_win values. ack_win is
updated based on configuration responses, and is used to determine
when acks are sent.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
We start initializing the struct xfrm_dst at the first field
behind the struct dst_enty. This is error prone because it
might leave a new field uninitialized. So start initializing
the struct xfrm_dst right behind the dst_entry.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We start initializing the struct rt6_info at the first field
behind the struct dst_enty. This is error prone because it
might leave a new field uninitialized. So start initializing
the struct rt6_info right behind the dst_entry.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Linville says:
====================
Several drivers see updates: mwifiex, ath9k, iwlwifi, brcmsmac,
wlcore/wl12xx/wl18xx, and a handful of others. The bcma bus got a
lot of attention from Hauke Mehrtens. The cfg80211 component gets
a flurry of patches for multi-channel support, and the mac80211
component gets the first few VHT (11ac) and 60GHz (11ad) patches.
This also includes the removal of the iwmc3200 drivers, since the
hardware never became available to normal people.
Additionally, the NFC subsystem gets a series of updates. According to
Samuel, "Here are the interesting bits:
- A better error management for the HCI stack.
- An LLCP "late" binding implementation for a better NFC SAP usage. SAPs are
now reserved only when there's a client for it.
- Support for Sony RC-S360 (a.k.a. PaSoRi) pn533 based dongle. We can read and
write NFC tags and also establish a p2p link with this dongle now.
- A few LLCP fixes."
Finally, this includes another pull of the fixes from the wireless
tree in order to resolve some merge issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal log buffer handling functions can now safely be
removed since there is no code using it anymore. Requests to
interact with the internal tipc log buffer over netlink (in
config.c) will report 'obsolete command'.
This represents the final removal of any references to a
struct print_buf, and the removal of the struct itself.
We also get rid of a TIPC specific Kconfig in the process.
Finally, log.h is removed since it is not needed anymore.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does. It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer. All callers of
this function that used to pass a print_buf are updated.
Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.
Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate. This function is removed since it
is no longer referenced nor needed.
A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed. This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.
The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
tipc_printf was previously used both to construct debug traces
and to append data to buffers that should be sent over netlink
to the tipc-config application. A global print_buffer was
used to format the string before it was copied to the actual
output buffer. This could lead to concurrent access of the
global print_buffer, which then had to be lock protected.
This is simplified by changing tipc_printf to append data
directly to the output buffer using vscnprintf.
With the new implementation of tipc_printf, there is no longer
any risk of concurrent access to the internal log buffer, so
the lock (and the comments describing it) are no longer
strictly necessary. However, there are still a few functions
that do grab this lock before resizing/dumping the log
buffer. We leave the lock, and these functions untouched since
they will be removed with a subsequent commit that drops the
deprecated log buffer handling code
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
To pave the way for a pending cleanup of tipc_printf, and
removal of struct print_buf entirely, we make that task simpler
by converting link_print to issue its messages with standard
printk infrastructure. [Original idea separated from a larger
patch from Erik Hugne <erik.hugne@ericsson.com>]
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The link queue traces and packet level debug functions served
a purpose during early development, but are now redundant
since there are other, more capable tools available for
debugging at the packet level.
The TIPC_DEBUG Kconfig option is removed since it does not
provide any extra debugging features anymore.
This gets rid of a lot of tipc_printf usages, which will
make the pending cleanup work of that function easier.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
All messages should go directly to the kernel log. The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.
Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.
Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG). The latter will be
phased out completely
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
We only use it to fetch the rule's tclassid, so just store the
tclassid there instead.
This also decreases the size of fib_result by a full 8 bytes on
64-bit. On 32-bits it's a wash.
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert commit b78e8ceac2
("cfg80211: track monitor channel") and remove the
set_monitor_enabled() callback.
Due to the tracking happening in NETDEV_PRE_UP, it had
introduced bugs because the monitor interface callback
would be called before the device was started. It looks
like there's no way to fix this, and using NETDEV_PRE_UP
is broken anyway (since there's no NETDEV_UP_FAIL), so
remove all that code, track interfaces in NETDEV_UP and
also stop tracking the monitor channel in cfg80211.
This mostly reverts to before the tracking, except that
we keep the interface count tracking so that setting the
monitor channel can be rejected properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This essentially reverts commit 2e165b8184 but
introduces the get_channel operation with a new
wireless_dev argument so that you can retrieve
the channel per interface. This is necessary as
even though we can track all interface channels
(except monitor) we can't track the channel type
used.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 870d37fc22.
This code doesn't work as cfg80211 will call
set_monitor_enabled at the wrong time and it
doesn't seem to be possible to fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
roc is destroyed then roc->started is referenced. Keep a local cache.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Socket state LAST_ACK should allow TSQ to send additional frames,
or else we rely on incoming ACKS or timers to send them.
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to help improve performance by reducing the number of
locked operations required to allocate a frag on x86 and other platforms.
This is accomplished by using atomic_set operations on the page count
instead of calling get_page and put_page. It is based on work originally
provided by Eric Dumazet.
In addition it also helps to reduce memory overhead when using TCP. This
is done by recycling the page if the only holder of the frame is the
netdev_alloc_frag call itself. This can occur when skb heads are stolen by
either GRO or TCP and the driver providing the packets is using paged frags
to store all of the data for the packets.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be desirable to use WoWLAN without triggers to
keep the connection alive to the AP while suspended.
Allow this use by enabling WoWLAN without triggers if
no triggers were requested.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Without the discovered target nfcid1 and its length set properly, type 2
tags detection fails with the pn544 as it checks for them from
pn544_hci_complete_target_discovered().
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Reported-by: Mathias Jeppsson <mathias.jeppsson@sonymobile.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Entries that are in a sunrpc cache but are not valid should be reported
with a leading '#' so they look like a comment.
Commit d202cce896 (sunrpc: never return expired entries in sunrpc_cache_lookup)
broke this for expired entries.
This particularly applies to entries that have been replaced by newer entries.
sunrpc_cache_update sets the expiry of the replaced entry to '0', but it
remains in the cache until the next 'cache_clean'.
The result is that if you
echo 0 2000000000 1 0 > /proc/net/rpc/auth.unix.gid/channel
several times, then
cat /proc/net/rpc/auth.unix.gid/content
It will display multiple entries for the one uid, which is at least confusing:
#uid cnt: gids...
0 1: 0
0 1: 0
0 1: 0
With this patch, expired entries are marked as comments so you get
#uid cnt: gids...
0 1: 0
# 0 1: 0
# 0 1: 0
These expired entries will never be seen by cache_check() as they are always
*after* a non-expired entry with the same key - so the extra check is only
needed in c_show()
Signed-off-by: NeilBrown <neilb@suse.de>
--
It's not a big problem, but it had me confused for a while, so it could
well confuse others.
Thanks,
NeilBrown
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
All handler->err() routines expect that we've done a pskb_may_pull()
test to make sure that IP header length + 8 bytes can be safely
pulled.
Reported-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can be used to match packets against netfilter ip sets created via ipset(8).
skb->sk_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.
Since ipset is usually called from netfilter, the ematch
initializes a fake xt_action_param, pulls the ip header into the
linear area and also sets skb->data to the IP header (otherwise
matching Layer 4 set types doesn't work).
Tested-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6lowpan module starts collecting incomming frames and fragments
right after lowpan_module_init() therefor it will be better to
clean unfinished fragments in lowpan_cleanup_module() function
instead of doing it when link goes down.
Changed spinlocks type to prevent deadlock with expired timer event
and removed unused one.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function lowpan_alloc_new_frame() takes u8 tag as an argument. However,
its only caller, lowpan_process_data() passes down a u16. Hence,
the tag value can get corrupted. This prevent 6lowpan fragment reassembly of a
message when the fragment tag value is over 256.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make symbols static to avoid the following warning shown up
by sparse:
warning: symbol ... was not declared. Should it be static?
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netdev_alloc_skb_ip_align() instead of alloc_skb() to get some
extra headroom in case we need to forward this frame in a tunnel or
something else.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add method to get the device short 802.15.4 address. This call
needed by ieee802154 layer to satisfy 'iz list' request from
the user space.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert the commit 768f7c7c12 to initialize
spinlock in the more preferable way and make it static to avoid sparse
warning.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this change, running AP + station on the same wiphy
does not work since the commit "cfg80211: add channel checking
for iface combinations". The stopped AP prevents the client
from connecting to an AP on a different channel.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[line-break commit message to < 72 chars]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With the default name table size of 1024, it is possible that
the sanity check in tipc_nametbl_stop could spam out 1024
essentially identical error messages if memory was corrupted
or similar. Limit it to issuing no more than a single message.
The actual chain number (i.e. 0 --> 1023) wouldn't provide any
useful insight if/when such an instance happened, so don't
bother printing out that value.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This is done to improve readability, and so that we can give
the struct a name that will allow us to declare a local
pointer to it in code, instead of having to always redirect
through the link struct to get to it.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If the virtual monitor interface is requested
by the driver, it should also be iterated over
when the driver wants to iterate all active
interfaces.
To allow that protect it with the iflist_mtx.
Change-Id: I58ac5de2f4ce93d12c5a98ecd2859f60158d5d69
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To call cfg80211_get_chan_state() we need to lock
the wdev, so we need to lock the wdev_iter mutex
in cfg80211_can_use_iftype_chan(). This needs to
use nested locking for lockdep.
Also, cfg80211_get_chan_state() doesn't actually
use the rdev, so remove that completely including
the lock assertion that isn't needed.
Reported-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When sample_idx is set to a value other than -1 it activates
the IEEE80211_TX_CTL_RATE_CTRL_PROBE flag which disables
frame aggregation. To allow frame aggregation during fixed
rate it is necessary to set max_tp_rate, max_tp_rate2 and
max_prob_rate instead of sample_idx.
Signed-off-by: Sylvain Roger Rieunier <sylvain.roger.rieunier@gmail.com>
[reword commit message a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When drop_unencrypted is enabled and MFP is disabled,
non-robust management frames for not-yet associated STA are dropped.
This isn't visible as many management frames sent from the kernel
have TX_INTFL_DONT_ENCRYPT set and management frames injected
from a monitor vif have TX_CTL_INJECTED so aren't dropped.
But management frames sent from userspace via NL80211_CMD_FRAME
do not have this flag set, so are dropped.
This patch make it always accept non-robust management frames.
Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The "no key" case in key selection that decides
whether to drop the frame or not is impossible
to understand, restructure the code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[cavallar@lri.fr: removed blank line and restructured action frame clause]
Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
Some drivers (iwlegacy, iwlwifi and rt2x00) today use the
bss_conf.last_tsf value. By itself though that value is
completely worthless since it may be ancient. What really
is needed is synchronisation between some device time and
the TSF.
To clarify this, rename bss_conf.last_tsf to sync_tsf and
add sync_device_ts which is obtained from rx_status which
gets a new field device_timestamp for this purpose. This
is intentionally not using the mactime field since that
is used for other things and in IBSS is expected to sync
with the IBSS's TSF which isn't necessarily true for the
device timestamp.
Also, since we have the information and it's useful even
before the connection has been established, give all the
timing details to the driver before authenticating.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Scan receive is rather inefficient when there are
multiple virtual interfaces. We iterate all of the
virtual interfaces and then notify cfg80211 about
each beacon many times.
Redesign scan RX to happen before everything else.
Then we can also get rid of IEEE80211_RX_IN_SCAN
since we don't have to accept frames into the RX
handlers for scanning or scheduled scanning any
more. Overall, this simplifies the code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of tracking whether or not we're in a
scheduled scan, track the virtual interface
(sdata) in an RCU-protected pointer to make it
usable from RX to check the MAC address.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Making the scan_sdata pointer usable with RCU makes
it possible to dereference it in the RX path to see
if a received frame actually matches the interface
that is scanning. This is just preparations, making
the pointer __rcu.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function building probe-request IEs does not validate the band is
supported before dereferencing it. This can result in a panic when
all bands are traversed, as done during sched-scan start.
Warn when this happens and return an empty probe request. Also fix
sched-scan to not waste memory on unsupported bands.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The new P2P Device will have to be able to scan for
P2P search, so move scanning to use struct wireless_dev
instead of struct net_device.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After a new virtual interface is created, reply
to userspace with a message detailing it so it
knows the new wdev identifier.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to be able to create P2P Device wdevs, move
the virtual interface management over to wireless_dev
structures.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather then
icmp_redirect().
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduce TSQ (TCP Small Queues)
TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.
If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent patch "tcp: Maintain dynamic metrics in local cache." introduced
an out of bounds access due to what appears to be a typo. I believe this
change should resolve the issue by replacing the access to RTAX_CWND with
TCP_METRIC_CWND.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the problem of unreachable mesh STA from
Distribution System (DS) due to the introduction of previous
patch solving the mesh STA joining from one MBSS to another
MBSS.
Reported-by: Georgiewskiy Yuriy <bottleman@icf.org.ru>
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To help debugging printed hex object use standard bluetooth
specifiers in hci_event. The patch changes format from 0x%04x to 0x%4.4x;
print manufacturer id and handle in hex instead of int; print opcode
always in 0x%4.4x format; status in 0x%2.2x.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
mld->mld_maxdelay is net endian, so we should use ntohs, not htons
CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
In driver reload test there is a memory leak.
The structure vlan_info was not freed when the driver was removed.
It was not released since the nr_vids var is one after last vlan was removed.
The nr_vids is one, since vlan zero is added to the interface when the interface
is being set, but the vlan zero is not deleted at unregister.
Fix - delete vlan zero when we unregister the device.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAk/2EMQACgkQpGgxIkP9cweoqgCeNGrHU9HxBnKXSylNcqhQBzqr
9jMAni+gJX+lzmrA2j1w/rCaamuNpbJG
=mXZq
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix a bug generated by the wrong interaction between the GW feature and the
Bridge Loop Avoidance
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defining a function with no parameters as 'T foo()' is the deprecated
K&R style, and is not strictly equivalent to defining it as 'T foo(void)'.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing every writes to ipv4 metrics any longer.
PMTU is stored in rt->rt_pmtu.
Dynamic TCP metrics are stored in a special TCP metrics cache,
completely outside of the routes.
Therefore ->cow_metrics() can simply nothing more than a WARN_ON
trigger so we can catch anyone who tries to add new writes to
ipv4 route metrics.
Signed-off-by: David S. Miller <davem@davemloft.net>
Blackhole routes have a COW metrics operation that returns NULL
always, therefore this dst_copy_metrics() call did absolutely
nothing.
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer needed. TCP writes metrics, but now in it's own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't maintain it dynamically any longer, so reporting it would
be extremely misleading. Report zero instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain a local hash table of TCP dynamic metrics blobs.
Computed TCP metrics are no longer maintained in the route metrics.
The table uses RCU and an extremely simple hash so that it has low
latency and low overhead. A simple hash is legitimate because we only
make metrics blobs for fully established connections.
Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking. But
the basic design seems sound.
With help from Eric Dumazet and Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to cast return value of nla_data()
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
- set message type to RTM_NEWROUTE
- relate to original request by inheriting the sequence and port number.
- set NLM_F_MULTI because it's a dump and more messages will follow
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Also use nla_get_u32() instead of nla_memcpy() to access u32 attribtues.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
nlmsg_end() will take care of this when we finalize the message.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Correct places where CID and PSM were printed as int. For CID: 0x%4.4x
is used and for PSM: 0x%2.2x.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Avoid unneeded type conversion by correcting type specifiers in debug
statements for L2CAP.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
ieee802_1d_to_ac is defined as a const int[8],
but the tid parameter has a range from 0 to 15.
Cc: stable@vger.kernel.org
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The HCP message should be added to transmit queue, not the other way around.
Signed-off-by: Mathias Jeppsson <mathias.jeppsson@sonymobile.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
list_first_entry() will never return NULL. Instead use
list_for_each_entry_safe() to iterate through the list.
Signed-off-by: Mathias Jeppsson <mathias.jeppsson@sonymobile.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. This seems to be a copy-paste bug from a previous
debugging message, and so the meaningless value is just deleted.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.
But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio()
With help from Gao Feng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices (e.g. Sony's PaSoRi) can not do type B polling, so we have
to make a distinction between ISO14443 type A and B poll modes.
Cc: Eric Lapuyade <eric.lapuyade@intel.com>
Cc: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The socket local pointer can be NULL when a socket is created but never
bound or connected.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When receiving such frame, the sockets waiting for a connection to finish
should be woken up. Connecting to an unbound LLCP service will trigger a
DM as a response.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the LLCP 16 local SAPs we can potentially quickly run out of source
SAPs for non well known services.
With the so called late binding we will reserve an SAP only when we actually
get a client connection for a local service. The SAP will be released once
the last client is gone, leaving it available to other services.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With not Well Known Services there is no guarantees as to which
SSAP the server will be listening on, so there is no reason to
support binding to a specific source SAP.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This patch fixes a typo and return the correct error when trying to
bind 2 sockets to the same service name.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The LLCP SAP should only be freed when the socket owning it is released.
As long as the socket is alive, the SAP should be reserved in order to
e.g. send the right wks array when bringing the MAC up.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the MAC link goes down, we should only keep the bound sockets
alive. They will be closed by sock_release or when the underlying
NFC device is moving away.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Drivers will need them before starting a poll or when being activated
as targets. Mostly WKS can have changed between device registration and
then so we need to re-build the whole array.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some NFC chips will statically create and open pipes for both standard
and proprietary gates. The driver can now pass this information to HCI
such that HCI will not attempt to create and open them, but will instead
directly use the passed pipe ids.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If the device is polling we sent a 0 target found event.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The semantics for a zero target found event is that the polling operation
could not complete.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
There can ever be only one call to nfc_targets_found() after polling
has been engaged. This could be from a target discovered event from
the driver, or from an error handler to notify poll will never complete.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If there is an ongoing HCI command executing, it will be completed,
thereby pushing the error up to the core. Otherwise, HCI will directly
notify the core with the error.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
HCI cmd can be completed either from an HCI response or from an
internal driver or HCI error. This requires to factorize the
completion code outside of the device lock.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This API should be used by drivers, HCI, SHDLC or NCI stacks to report an
unrecoverable error.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
An HCI command can complete either from an HCI response
(with an HCI result) or as a consequence of any other system
error during processing. The completion therefore needs to take
a standard errno code. The HCI response will convert its result
to a standard errno before calling the completion.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now report an ENOMEM error up to the HCI layer.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
nfc_hci_recv_frame can not be called with a NULL skb.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
shdlc reset may leave HCI in an inconsistent state by loosing parts of
HCI frames. Handle this case by reporting an unrecoverable error to HCI.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The questions asked in the comments have been answered and addressed.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If association failed due to internal error (e.g. no
supported rates IE), we call ieee80211_destroy_assoc_data()
with assoc=true, while we actually reject the association.
This results in the BSSID not being zeroed out.
After passing assoc=false, we no longer have to call
sta_info_destroy_addr() explicitly. While on it, move
the "associated" message after the assoc_success check.
Cc: stable@vger.kernel.org [3.4+]
Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
msp has type struct minstrel_ht_sta_priv not struct minstrel_ht_sta.
(This incorporates the fixup originally posted as "mac80211: fix kzalloc
memory corruption introduced in minstrel_ht". -- JWL)
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The management frame and remain-on-channel APIs will be
needed in the P2P device abstraction, so move them over
to the new wdev-based APIs. Userspace can still use both
the interface index and wdev identifier for them so it's
backward compatible, but for the P2P Device wdev it will
be able to use the wdev identifier only.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are a few places that iterate the wdev
list and assume wdev->netdev exists, check
there. The rfkill one has to be extended for
each non-netdev type later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since soon there will be virtual interfaces that
don't have a netdev, use the wdev identifier for
the API to retrieve interface data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some nl80211 callbacks will soon need the wdev instead
of the netdev, so add NL80211_FLAG_NEED_WDEV to allow
them to request that. Add NL80211_FLAG_NEED_WDEV_UP as
well which checks the netdev is UP if one exists.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to support a P2P device abstraction and
Bluetooth high-speed AMPs, we need to have a way
to identify virtual interfaces that don't have a
netdev associated.
Do this by adding a NL80211_ATTR_WDEV attribute
to identify a wdev which may or may not also be
a netdev.
To simplify things, use a 64-bit value with the
high 32 bits being the wiphy index for this new
wdev identifier in the nl80211 API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This API call was intended to be used by drivers
if they want to optimize key handling by removing
one key when another is added. Remove it since no
driver is using it. If needed, it can always be
added back.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ieee80211_mlme_notify_scan_completed() iterates all
interfaces and doesn't need to assign anything to
the sdata variable before the loop.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the AC parameters change, drivers might rely
on getting a bss_info_changed notification with
BSS_CHANGED_QOS in addition to the conf_tx call.
Always call the function when userspace updates
are made (in AP/GO modes) and also set the change
flag when updates were made by the AP (in managed
mode.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
* One to get the timeout special parameter for the SET target back working
(this was introduced while trying to fix another bug in 3.4) from
Jozsef Kadlecsik.
* One crash fix if containers and nf_conntrack are used reported by Hans
Schillstrom by myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "127f559 netfilter: ipset: fix timeout value overflow bug"
broke the SET target when no timeout was specified.
Reported-by: Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
we set max_prioidx to the first zero bit index of prioidx_map in
function get_prioidx.
So when we delete the low index netprio cgroup and adding a new
netprio cgroup again,the max_prioidx will be set to the low index.
when we set the high index cgroup's net_prio.ifpriomap,the function
write_priomap will call update_netdev_tables to alloc memory which
size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the size of array that map->priomap point to is max_prioidx +1,
which is low than what we actually need.
fix this by adding check in get_prioidx,only set max_prioidx when
max_prioidx low than the new prioidx.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comments were wrong here because "AX25_MAX_DIGIS" is 8 but the
comments say 6. Also I've changed the "7" to "AX25_ADDR_LEN".
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix two netem bugs :
1) When a frame was dropped by tfifo_enqueue(), drop counter
was incremented twice.
2) When reordering is triggered, we enqueue a packet without
checking queue limit. This can OOM pretty fast when this
is repeated enough, since skbs are orphaned, no socket limit
can help in this situation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mark Gordon <msg@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While doing some recent work on sctp sack bundling I noted that
sctp_packet_append_chunk was pretty inefficient. Specifially, it was called
recursively while trying to bundle auth and sack chunks. Because of that we
call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4 times for
every call to sctp_packet_append_chunk, knowing that at least 3 of those calls
will do nothing.
So lets refactor sctp_packet_bundle_auth to have an outer part that does the
attempted bundling, and an inner part that just does the chunk appends. This
saves us several calls per iteration that we just don't need.
Also, noticed that the auth and sack bundling fail to free the chunks they
allocate if the append fails, so make sure we add that in
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when sending data over datagram, the send function will attempt to
allocate any size passed on from the userspace.
We should make sure that this size is checked and limited. We'll limit it
to the MTU of the device, which is checked later anyway.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quick fix for method being invoked without checking its existence.
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Because ieee80211_tx_status in status.c checks if
outgoing BlockAck requests have been acked, it is
necessary to tell the driver that tx feedback for
this sort of frame is important.
Otherwise, the stack will continue to send the same
BlockAck request over and over, which can cause
the receiver to flush or clean its reorder buffer
over and over.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Larry (and some others I think) reported that with
single-queue drivers mac80211 crashes when waking
the queues. This happens because we allocate just
a single queue for each virtual interface in case
the driver doesn't have at least 4 queues, but the
code stopping/waking the virtual interface queues
wasn't taking this into account.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"
Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove redundant declarations, they belong in include/net/tcp.h
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the interfaces were removed just before a restart
work was started, open_count will be 0, and most of
the reconfig work will be skipped, including the
resetting of local->in_reconfig to false.
Leaving local->inconfig = true will result in
dropping any incoming packet.
Fix it by always setting local->in_reconfig = false
(even if there are no active interfaces).
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Due to the way the default key links are created,
it happens that a link is left dangling:
* both unicast/multicast links are created
* unicast link is destroyed, and the links
are updated
* during this update, adding the multicast
link again fails because it is present,
destroying the debugfs pointer
* removing the multicast link won't work as
the pointer has been destroyed
Fix this by always removing the links and then
re-creating them if needed.
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Reported-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.
It's just pure overhead.
Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.
Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to close a socket that is in the OPENING state. For
example, it can happen if ceph_con_close() is called on the con before
the TCP connection is established. con_work() will come around and shut
down the socket.
Signed-off-by: Sage Weil <sage@inktank.com>
Do not re-initialize the con on every connection attempt. When we
ceph_con_close, there may still be work queued on the socket (e.g., to
close it), and re-initializing will clobber the work_struct state.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This patch gathers a few small changes in "net/ceph/messenger.c":
out_msg_pos_next()
- small logic change that mostly affects indentation
write_partial_msg_pages().
- use a local variable trail_off to represent the offset into
a message of the trail portion of the data (if present)
- once we are in the trail portion we will always be there, so we
don't always need to check against our data position
- avoid computing len twice after we've reached the trail
- get rid of the variable tmpcrc, which is not needed
- trail_off and trail_len never change so mark them const
- update some comments
read_partial_message_bio()
- bio_iovec_idx() will never return an error, so don't bother
checking for it
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Currently a ceph connection enters a "CONNECTING" state when it
begins the process of (re-)connecting with its peer. Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information exhange has begun. The
CONNECTING bit/state continues to be set during this phase.
Rather than have the CONNECTING state continue while the NEGOTIATING
bit is set, interpret these two phases as distinct states. In other
words, when NEGOTIATING is set, clear CONNECTING. That way only
one of them will be active at a time.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There are two phases in the process of linking together the two ends
of a ceph connection. The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.
When initiating a connection, the client side now queues to send
its information for both phases of this process at the same time.
This is probably a bit more efficient, but it is slightly messier
from a layering perspective in the code.
So rearrange things so that the client doesn't send the connection
information until it has received and processed the response in the
initial banner phase (in process_banner()).
Move the code (in the (con->sock == NULL) case in try_write()) that
prepares for writing the connection information, delaying doing that
until the banner exchange has completed. Move the code that begins
the transition to this second "NEGOTIATING" phase out of
process_banner() and into its caller, so preparing to write the
connection information and preparing to read the response are
adjacent to each other.
Finally, preparing to write the connection information now requires
the output kvec to be reset in all cases, so move that into the
prepare_write_connect() and delete it from all callers.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There is no state explicitly defined when a ceph connection is fully
operational. So define one.
It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.
Be a little more careful when examining the old state when a socket
disconnect event is reported.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server). But that bit
is not cleared again--at least not until another connection attempt
is initiated.
Instead, clear it as soon as the connection is fully established.
Also, clear it when a socket connection gets prematurely closed
in the midst of establishing a ceph connection (in case we had
reached the point where it was set).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
A connection that is closed will no longer be connecting. So
clear the CONNECTING state bit in ceph_con_close(). Similarly,
if the socket has been closed we no longer are in connecting
state (a new connect sequence will need to be initiated).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.
Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing being done on a socket being shut down. We no longer set
SOCK_CLOSED in the socket event routine conditionally, so setting
that bit here no longer provides whatever benefit it might have
provided before.
A race condition could still leave the SOCK_CLOSED bit set even
after we've issued the call to con_close_socket(), so we still clear
that bit after shutting the socket down. Add a comment explaining
the reason for this.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
When a TCP_CLOSE or TCP_CLOSE_WAIT event occurs, the SOCK_CLOSED
connection flag bit is set, and if it had not been previously set
queue_con() is called to ensure con_work() will get a chance to
handle the changed state.
con_work() atomically checks--and if set, clears--the SOCK_CLOSED
bit if it was set. This means that even if the bit were set
repeatedly, the related processing in con_work() only gets called
once per transition of the bit from 0 to 1.
What's important then is that we ensure con_work() gets called *at
least* once when a socket close event occurs, not that it gets
called *exactly* once.
The work queue mechanism already takes care of queueing work
only if it is not already queued, so there's no need for us
to call queue_con() conditionally.
So this patch just makes it so the SOCK_CLOSED flag gets set
unconditionally in ceph_sock_state_change().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.
Changing connection information during handling of a socket event is
not very clean, so instead move this assignment inside con_work(),
where it can be done during normal connection-level processing (and
under protection of the connection mutex as well).
Move the handling of a socket closed event up to the top of the
processing loop in con_work(); there's no point in handling backoff
etc. if we have a newly-closed socket to take care of.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The following commit changed it so SOCK_CLOSED bit was stored in
a connection's new "flags" field rather than its "state" field.
libceph: start separating connection flags from state
commit 928443cd
That bit is used in con_close_socket() to protect against setting an
error message more than once in the socket event handler function.
Unfortunately, the field being operated on in that function was not
updated to be "flags" as it should have been. This fixes that
error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Recently a bug was fixed in which the bio_iter field in a ceph
message was not being properly re-initialized when a message got
re-transmitted:
commit 43643528cc
Author: Yan, Zheng <zheng.z.yan@intel.com>
rbd: Clear ceph_msg->bio_iter for retransmitted message
We are now only initializing the bio_iter field when we are about to
start to write message data (in prepare_write_message_data()),
rather than every time we are attempting to write any portion of the
message data (in write_partial_msg_pages()). This means we no
longer need to use the msg->bio_iter field as a flag.
So just don't do that any more. Trust prepare_write_message_data()
to ensure msg->bio_iter is properly initialized, every time we are
about to begin writing (or re-writing) a message's bio data.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
If a message has a non-null bio pointer, its bio_iter field is
initialized in write_partial_msg_pages() if this has not been done
already. This is really a one-time setup operation for sending a
message's (bio) data, so move that initialization code into
prepare_write_message_data() which serves that purpose.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Move init_bio_iter() and iter_bio_next() up in their source file so
the'll be defined before they're needed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
This is a nit, but prepare_write_message() sets the FOOTER_COMPLETE
flag before the CRC for the data portion (recorded in the footer)
has been completely computed. Hold off setting the complete flag
until we've decided it's ready to send.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In write_partial_msg_pages(), once all the data from a page has been
sent we advance to the next one. Put the code that takes care of
this into its own function.
While modifying write_partial_msg_pages(), make its local variable
"in_trail" be Boolean, and use the local variable "msg" (which is
just the connection's current out_msg pointer) consistently.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Move the code that prepares to write the data portion of a message
into its own function.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
If the gateway functionality is used, some broadcast packets (DHCP
requests) may be transmitted as unicast packets. As the bridge loop
avoidance code now only considers the payload Ethernet destination,
it may drop the DHCP request for clients which are claimed by other
backbone gateways, because it falsely infers from the broadcast address
that the right backbone gateway should havehandled the broadcast.
Fix this by checking and delegating the batman-adv packet type used
for transmission.
Reported-by: Guido Iribarren <guidoiribarren@buenosaireslibre.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
added a neighbour pointer to rt6_info. Currently we don't initialize
this pointer at allocation time. We assume this pointer to be valid
if it is not a null pointer, so initialize it on allocation.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
60g band uses different from .11n MCS scheme, so bitrate
should be calculated differently
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Until now, a u16 value was used to represent bitrate value.
With VHT bitrates this becomes too small.
Introduce a new 32-bit bitrate attribute. nl80211 will report
both the new and the old attribute, unless the bitrate doesn't
fit into the old u16 attribute in which case only the new one
will be reported.
User space tools encouraged to prefer the 32-bit attribute, if
available (since it won't be available on older kernels.)
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[reword commit message and comments a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts the commit cdf49c283e which
replaces lowpan '.ndo_set_mac_address' method by ethernet's one.
Accorind to the IEEE 802.15.4 standard, device has 8-byte length address,
so this hook loses the last 2 bytes which may rise a compatibility problems
with other IEEE 802.15.4 standard implementations.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most multi-queue networking driver consider the number of online cpus when
configuring RSS queues.
This patch adds a wrapper to the number of cpus, setting an upper limit on the
number of cpus a driver should consider (by default) when allocating resources
for his queues.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_compile() can avoid calling fib_compute_spec_dst()
by default, and perform the call only if needed.
David suggested to add a helper to make the call only once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes for a simplified conversion away from dst_get_neighbour*().
All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().
Signed-off-by: David S. Miller <davem@davemloft.net>