The size of the snapshot has to be the same as the size of the region,
therefore no need to pass it again during snapshot creation. Remove the
arg and use region->size instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting from commit d41a69f1d3 ("tcp: make tcp_sendmsg() aware of socket backlog")
loopback flows got hurt, because for each skb sent, the socket receives an
immediate ACK and sk_flush_backlog() causes extra work.
Intent was to not let the backlog grow too much, but we went a bit too far.
We can check the backlog every 16 skbs (about 1MB chunks)
to increase TCP over loopback performance by about 15 %
Note that the call to sk_flush_backlog() handles a single ACK,
thanks to coalescing done on backlog, but cleans the 16 skbs
found in rtx rb-tree.
Reported-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kmap instead of page_address as it's not always in low memory.
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.
This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!
Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.
Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.
The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.
Fixes: 4f95dd78a7 ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Flows that are in teardown state (due to RST / FIN TCP packet) still
have their offload flag set on. Hence, the conntrack garbage collector
may race to undo the timeout adjustment that the fixup routine performs,
leaving the conntrack entry in place with the internal offload timeout
(one day).
Update teardown flow state to ESTABLISHED and set tracking to liberal,
then once the offload bit is cleared, adjust timeout if it is more than
the default fixup timeout (conntrack might already have set a lower
timeout from the packet path).
Fixes: da5984e510 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update conntrack entry to pick up expired flows, otherwise the conntrack
entry gets stuck with the internal offload timeout (one day). The TCP
state also needs to be adjusted to ESTABLISHED state and tracking is set
to liberal mode in order to give conntrack a chance to pick up the
expired flow.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.
[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[ 24.657858] Mem abort info:
[ 24.660686] ESR = 0x96000004
[ 24.663769] Exception class = DABT (current EL), IL = 32 bits
[ 24.669725] SET = 0, FnV = 0
[ 24.672804] EA = 0, S1PTW = 0
[ 24.675975] Data abort info:
[ 24.678880] ISV = 0, ISS = 0x00000004
[ 24.682743] CM = 0, WnR = 0
[ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[ 24.692207] [0000000000040434] pgd=0000000000000000
[ 24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 24.889414] Call trace:
[ 24.891870] __nf_tables_abort+0x3f0/0x7a0
[ 24.895984] nf_tables_abort+0x20/0x40
[ 24.899750] nfnetlink_rcv_batch+0x17c/0x588
[ 24.904037] nfnetlink_rcv+0x13c/0x190
[ 24.907803] netlink_unicast+0x18c/0x208
[ 24.911742] netlink_sendmsg+0x1b0/0x350
[ 24.915682] sock_sendmsg+0x4c/0x68
[ 24.919185] ___sys_sendmsg+0x288/0x2c8
[ 24.923037] __sys_sendmsg+0x7c/0xd0
[ 24.926628] __arm64_sys_sendmsg+0x2c/0x38
[ 24.930744] el0_svc_common.constprop.0+0x94/0x158
[ 24.935556] el0_svc_handler+0x34/0x90
[ 24.939322] el0_svc+0x8/0xc
[ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---
Fixes: f6ac858589 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.
Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.
Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.
Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.
Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).
See commit 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.
v2:
- remove superfluous decrypted mark copy (Willem);
- remove the stale doc entry (Boris);
- rely entirely on EOR marking to prevent coalescing (Boris);
- use an internal sendpages flag instead of marking the socket
(Boris).
v3 (Willem):
- reorganize the can_skb_orphan_partial() condition;
- fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]
They are not used since commit 77ddaff218 ("fq_codel: Kill
useless per-flow dropped statistic")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.
We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface. The clones are dropped at reception
and have no functional role except making the traffic visible.
The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.
v3 - Function dev_nit_active used to check for network taps.
- Procedure netif_rx_ni used to send cloned messages to loopback device.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In error case, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It provide a callback list to find the blocks of tc
and nft subsystems
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
of passing it to the stack immediately, place it on a list in the napi
struct. Then, at flush time (napi_complete_done(), napi_poll(), or
napi_busy_loop()), call netif_receive_skb_list_internal() on the list.
We'd like to do that in napi_gro_flush(), but it's not called if
!napi->gro_bitmask, so we have to do it in the callers instead. (There are
a handful of drivers that call napi_gro_flush() themselves, but it's not
clear why, or whether this will affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
list whenever it exceeds gro_normal_batch, a new net/core sysctl that
defaults to 8.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit d4289fcc9b ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).
By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e334 ("Revert
"ipv4: use skb coalescing in defragmentation"")).
Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.
When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.
Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.
This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.
[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1
Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- Replace usage of strlcpy with strscpy, by Sven Eckelmann
- Add OGMv2 per-interface queue and aggregations, by Linus Luessing
(2 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHekWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoW8TEACox7vhtW4MS8QCzPySCU3f8V6m
f+2PPlUbM4CqXcOPGw/jQng6PAtcb4gNPssm52GaxdeB9jFqkI/ELdSn5mCh+EcG
QRrhf5DVruYyqBU2gNhovEe7SlJl8IJno6kFdAggaMngXnvlBzIr7n4FMIUKNFYn
6kFbA8pugBXXvhiRcuzs+l5iUxdKUxTsUNPyppyqnqb8lrb0/30/681dfq87PmcV
zehEf8Ry23W7CVQv6YougVJvK0GUwysULsvm8Wc8FsOke7CeeQIPLEF2Pcrl/CFM
mfynXVXngE41MPBC59eUcWBGlRYwkuwm4Q+YQ8OUjr5+X5YP06jR5Dh8u6KVAMuy
QWGSwyrlXSsCE6BTxoijdJqsLzHDXCmYY0GQI2tEMCyDnL95CU3tuTk4vckusuf+
NlhHv7m+Bo0w9ztDUBifzNyURW9VgUCoOZfW9rdYRWjjN8Oe6wnMWCFttGnkg0qu
zrCJn5mGvz3Vp434K0uGY1wOZincSdM6grBSgmZv1UNMaBdlfNroJsUqvz6IE5Fe
iI5kqRXoUG+ftfwacgyFEK08HpcZHvJkNDiHHRlOPPZm75yE7mwreVUrGRaibywQ
pzTEwM+H2MX9xF+osjiPVc197fxpnkX9fI2LhDOEiblaJbdiIxd+cgWRMFS1YPvd
ANaFfAh58+gDWcRqhg==
=Cbzf
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Replace usage of strlcpy with strscpy, by Sven Eckelmann
- Add OGMv2 per-interface queue and aggregations, by Linus Luessing
(2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann
- Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHO0WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobHvD/9RQLU/2zrbq22QIhLwEt2DeGg3
Z//qHgK/LhDnLJRHA0F14NlA/4vPtzbfIX8m0Ai/mrPrk0naDKEeM6TSBw8IHQEz
bsibnHkwB+cdgqGtVKFEHqL5seaD5Nnlu89xIvgTyeImq2Zd7heOphN8lLmJPdem
3TugLIkbEqO4P+KU8fdjuA+N4GcRzaArsendH+3z4LrlXJnh4YWvook4/gSjQwWe
ydF+e2ktZvNEj8Xl5vsDpykBEFll+k4BThW8toIMx4FCLiw1Hqedp/agwIbs5Tjc
ccdYa/wuKYUAwz5rsrqgx3FpH6z0w6h0b3AB++cLaahmas9qx/K7lT68PsJcPz0E
qhZmbw0snRO/N+TZQPVd1B9i0H3efAaQzzUu/xfL2RRA91ztB5O2+3HZF3PBzw/J
u1Ewt56vv4LV0QmeQLzczKMBS47DPoARk8mLOx0dpGl6vvkQYWBFbn+gk/+uPalk
K15Yozd3h/nSEX3SWk50X3+AU/p8l7rAYcrOrpWsRmQGl06XBbRDr9mRijMoHnTE
LM+AKiZLvOS1Q/UT/C0YwL2i665aG3NqzqMXq+0Mtciu6Lsefu8+JOZFxcK4GXGU
XJ+Kkygf/3/5dhu3Nxm4GchQQSLVJRn/i0nCKJeWS/Qxdvcalk8lPQPrX2a2l6GY
kW2KxW/KjwayzLpwSQ==
=g2Gm
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-for-davem-20190808' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix netlink dumping of all mcast_flags buckets, by Sven Eckelmann
- Fix deletion of RTR(4|6) mcast list entries, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.
The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).
However, when dropping the initial link-local frame, its backing memory
was being leaked.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.
Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed. That is the job of kfree_skb(), which
also calls skb_unref().
But it turns out that freeing the cached stampable_skb is in fact not
necessary. It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp. So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.
Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC mirred actions (redirect and mirred) can send to egress or ingress of a
device. Currently only egress is used for hw offload rules.
Modify the intermediate representation for hw offload to include mirred
actions that go to ingress. This gives drivers access to such rules and
can decide whether or not to offload them.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC rules can impliment skbedit actions. Currently actions that modify the
skb mark are passed to offloading drivers via the hardware intermediate
representation in the flow_offload API.
Extend this to include skbedit actions that modify the packet type of the
skb. Such actions may be used to set the ptype to HOST when redirecting a
packet to ingress.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is almost impossible to get anything other than a 0 out of
flow->dropped statistic with a tc class dump, as it resets to 0
on every round.
It also conflates ecn marks with drops.
It would have been useful had it kept a cumulative drop count, but
it doesn't. This patch doesn't change the API, it just stops
tracking a stat and state that is impossible to measure and nobody
uses.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the field fq_codel is often used with a smaller memory or
packet limit than the default, and when the bulk dropper is hit,
the drop pattern bifircates into one that more slowly increases
the codel drop rate and hits the bulk dropper more than it should.
The scan through the 1024 queues happens more often than it needs to.
This patch increases the codel count in the bulk dropper, but
does not change the drop rate there, relying on the next codel round
to deliver the next packet at the original drop rate
(after that burst of loss), then escalate to a higher signaling rate.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the CPU port registers dump into the master interface registers
dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
structures of the CPU port into the dump.
drvinfo->regdump_len will contain the full data length, while regs->len
will contain only the master interface registers dump length.
This allows for example to dump the CPU port registers on a ZII Dev
C board like this:
# ethtool -d eth1
0x004: 0x00000000
0x008: 0x0a8000aa
0x010: 0x01000000
0x014: 0x00000000
0x024: 0xf0000102
0x040: 0x6d82c800
0x044: 0x00000020
0x064: 0x40000000
0x084: RCR (Receive Control Register) 0x47c00104
MAX_FL (Maximum frame length) 1984
FCE (Flow control enable) 0
BC_REJ (Broadcast frame reject) 0
PROM (Promiscuous mode) 0
DRT (Disable receive on transmit) 0
LOOP (Internal loopback) 0
0x0c4: TCR (Transmit Control Register) 0x00000004
RFC_PAUSE (Receive frame control pause) 0
TFC_PAUSE (Transmit frame control pause) 0
FDEN (Full duplex enable) 1
HBC (Heartbeat control) 0
GTS (Graceful transmit stop) 0
0x0e4: 0x76735d6d
0x0e8: 0x7e9e8808
0x0ec: 0x00010000
.
.
.
88E6352 Switch Port Registers
------------------------------
00: Port Status 0x4d04
Pause Enabled 0
My Pause 1
802.3 PHY Detected 0
Link Status Up
Duplex Full
Speed 100 or 200 Mbps
EEE Enabled 0
Transmitter Paused 0
Flow Control 0
Config Mode 0x4
01: Physical Control 0x003d
RGMII Receive Timing Control Default
RGMII Transmit Timing Control Default
200 BASE Mode 100
Flow Control's Forced value 0
Force Flow Control 0
Link's Forced value Up
Force Link 1
Duplex's Forced value Full
Force Duplex 1
Force Speed 100 or 200 Mbps
.
.
.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add various extack messages to make drop_monitor more user friendly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove multiple blank lines which are visually annoying and useless.
This suppresses the "Please don't use multiple blank lines" checkpatch
messages.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and
'send_timer' fields can be accessed concurrently by the CPU sending the
netlink notification to user space from the workqueue and the CPU
tracing kfree_skb(). This spinlock is meant to protect against that.
Document its scope and suppress the checkpatch message "spinlock_t
definition without comment".
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'trace_state_mutex' does not only protect the global 'trace_state'
variable, but also the global 'hw_stats_list'.
Subsequent patches are going add more operations from user space to
drop_monitor and these all need to be mutually exclusive.
Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and
document its scope.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error code 'ENOTSUPP' is reserved for use with NFS. Use 'EOPNOTSUPP'
instead.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This Kconfig option is unused, drop it.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).
For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
the br_vlan_bridge_event stub when bridge vlans are disabled
[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389
Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.
Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).
Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cd ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the unlock path in __ip6_rt_update_pmtu by using a
single point where rcu_read_unlock is called.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nexthop path in rt6_update_exception_stamp_rt needs to call
rcu_read_unlock if it fails to find a fib6_nh match rather than
just returning.
Fixes: e659ba31d8 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like we were slightly overzealous with the shutdown()
cleanup. Even though the sock->sk_state can reach CLOSED again,
socket->state will not got back to SS_UNCONNECTED once
connections is ESTABLISHED. Meaning we will see EISCONN if
we try to reconnect, and EINVAL if we try to listen.
Only listen sockets can be shutdown() and reused, but since
ESTABLISHED sockets can never be re-connected() or used for
listen() we don't need to try to clean up the ULP state early.
Fixes: 32857cf57f ("net/tls: fix transition through disconnect with close")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When generic-XDP was moved to a later processing step by commit
458bf2f224 ("net: core: support XDP generic on stacked devices.")
a regression was introduced when using bpf_xdp_adjust_head.
The issue is that after this commit the skb->network_header is now
changed prior to calling generic XDP and not after. Thus, if the header
is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
also need to be updated again. Fix by calling skb_reset_network_header().
Fixes: 458bf2f224 ("net: core: support XDP generic on stacked devices.")
Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently init call of all actions (except ipt) init their 'parm'
structure as a direct pointer to nla data in skb. This leads to race
condition when some of the filter actions were initialized successfully
(and were assigned with idr action index that was written directly
into nla data), but then were deleted and retried (due to following
action module missing or classifier-initiated retry), in which case
action init code tries to insert action to idr with index that was
assigned on previous iteration. During retry the index can be reused
by another action that was inserted concurrently, which causes
unintended action sharing between filters.
To fix described race condition, save action idr index to temporary
stack-allocated variable instead on nla data.
Fixes: 0190c1d452 ("net: sched: atomically check-allocate action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes the previously added 'encap test' pass.
Because its possible that the xfrm dst entry becomes stale while such
a flow is offloaded, we need to call dst_check() -- the notifier that
handles this for non-tunneled traffic isn't sufficient, because SA or
or policies might have changed.
If dst becomes stale the flow offload entry will be tagged for teardown
and packets will be passed to 'classic' forwarding path.
Removing the entry right away is problematic, as this would
introduce a race condition with the gc worker.
In case flow is long-lived, it could eventually be offloaded again
once the gc worker removes the entry from the flow table.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of transmitting individual OGMv2 packets from the aggregation
queue merge those OGMv2 packets into a single one and transmit this
aggregate instead.
This reduces overhead as it saves an ethernet header and a transmission
per aggregated OGMv2 packet.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
In preparation for the OGMv2 packet aggregation, hold OGMv2 packets for
up to BATADV_MAX_AGGREGATION_MS milliseconds (100ms) on per
hard-interface queues, before transmitting.
This allows us to later squash multiple OGMs into a single frame
and transmission for reduced overhead.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The ipvs module parse the user buffer and save it to sysctl,
then check if the value is valid. invalid value occurs
over a period of time.
Here, I add a variable, struct ctl_table tmp, used to read
the value from the user buffer, and save only when it is valid.
I delete proc_do_sync_mode and use extra1/2 in table for the
proc_dointvec_minmax call.
Fixes: f73181c828 ("ipvs: add support for sync threads")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use shared sysctl variables for zero and one constants, as in commit
eec4844fae ("proc/sysctl: add shared variables for range check")
Fixes: 8f14c99c7e ("netfilter: conntrack: limit sysctl setting for boolean options")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After introduce "mss_encode" field in the synproxy_options struct the field
"mss" is a little confusing. It has been renamed to "mss_option".
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):
1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.
After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.
Concurrently, hvs_close_connection() runs in another thread:
calls vsock_remove_sock() to decrease the refcnt by 2;
call sock_put() to decrease the refcnt to 0, and free the sk;
next, the "release_sock(sk)" may hang due to use-after-free.
In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.
The issue can be resolved if an extra reference is taken when the
connection is established.
Fixes: a9eeb998c2 ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 365ad353c2 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.
Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.
We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.
Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
broke older tipc tools that use compat interface (e.g. tipc-config from
tipcutils package):
% tipc-config -p
operation not supported
The commit started to reject TIPC netlink compat messages that do not
have attributes. It is too restrictive because some of such messages are
valid (they don't need any arguments):
% grep 'tx none' include/uapi/linux/tipc_config.h
#define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */
#define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */
#define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */
#define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */
#define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */
#define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */
#define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */
#define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */
This patch relaxes the original fix and rejects messages without
arguments only if such arguments are expected by a command (reg_type is
non zero).
Fixes: 2753ca5d90 ("tipc: fix uninit-value in tipc_nl_compat_doit")
Cc: stable@vger.kernel.org
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In user-space there's no way to distinguish why an mdb entry was deleted
and that is a problem for daemons which would like to keep the mdb in
sync with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age-out the remote entry for performance
and convergence reasons except when fast-leave is enabled. In that case we
want explicit immediate remote delete, thus add mdb flag which is set only
when the entry is being deleted due to fast-leave.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When permanent entries were introduced by the commit below, they were
exempt from timing out and thus igmp leave wouldn't affect them unless
fast leave was enabled on the port which was added before permanent
entries existed. It shouldn't matter if fast leave is enabled or not
if the user added a permanent entry it shouldn't be deleted on igmp
leave.
Before:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
< join and leave 229.1.1.1 on eth4 >
$ bridge mdb show
$
After:
$ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
$ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
< join and leave 229.1.1.1 on eth4 >
$ bridge mdb show
dev br0 port eth4 grp 229.1.1.1 permanent
Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* lots more HE (802.11ax) support, particularly things
relevant for the the AP side, but also mesh support
* debugfs cleanups from Greg
* some more work on extended key ID
* start using genl parallel_ops, as preparation for
weaning ourselves off RTNL and getting parallelism
* various other changes all over
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BtsAACgkQB8qZga/f
l8QTdQ//bD3NLitiIW4rBmC/w9str2TRpUbRN8Hf9xTnRh1smLGQop8jI8xpYdAE
hXFlWn4glCkTkZrqtT8IDMcERb2YPHI94L7lMR1L6dXH7e1tiBZ7Qum5+rojLUCQ
/XXZJnVa9AbdzBDtkhzCjkN8dCldMnkId0m6vkGdFre/b70FLDg2GlolqIchbDCz
GxeKaw/uTLwu9nxEJhspDmiXDQtQd1ZsGNZRrovQ2nUuUfvX9sU54OmaoDHhqrUM
iSwFZJAhrCV7nidOLs2X1rVs8hRymbrYnGu6NdUlfTQMqaOVgWhqniM1RMnJnnVi
Ec/1OQg4KIj2rjJIXW/3QXAzAO6TDwV8xNk9al3m+yAkkSXR7tXP01plNUoMU4UW
YxcaPD341H919GIqa71lieeEMyi7XIKKFfaZvGFLiDzidKKi0JkDuC+1w5UOJrbj
PM/dzvKqvwOMEa5XahjrUQLsiGMgxBkDo36+F4rfaelzDKPuJBOokDS23vg/lYf5
2v74WbeJVc45cAhSwp5/0RTsG0xhmFImfrE66VNB1vcJUVLvHTuATfXsBbRahOz/
SglGBAGqigxJwIeYfgec3lPNfdV+0LwYnPAEjBIjCO8qlGvst4jLroHg7ayApr2E
JfUbaWtpEb9cgAz7X6tEjv/BAnsRWPzpxb8Nm6JoZttTtA2VakE=
=eUnh
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
We have a reasonably large number of changes:
* lots more HE (802.11ax) support, particularly things
relevant for the the AP side, but also mesh support
* debugfs cleanups from Greg
* some more work on extended key ID
* start using genl parallel_ops, as preparation for
weaning ourselves off RTNL and getting parallelism
* various other changes all over
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* revert NETIF_F_LLTX usage as it caused problems
* avoid warning on WMM parameters from AP that are too short
* fix possible null-ptr dereference in hwsim
* fix interface combinations with 4-addr and crypto control
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BfGcACgkQB8qZga/f
l8TE/A/9G98DIi7UpuseyrW1y+3Bab5G+N4JhNDYMF0fC+f+6VZPsQRQi0aH2GVE
fBDnFQogiqIY8WwgQ5RoyUjOux43INpNp3uPjOCzcdjj9+D+lBX/27yhNVBVYK5H
Uxq08W2Mv+GWcswa5NMGUF9FlARbIMSu1BsDML031LzSckNvSLwMuANSx/1/d/gr
rUa1waaGeC2yT/baefNaHDiqYBj/6Xm6U0Adl1PphGJZ587UjlylCwozNNIrUIwW
PeysRXIUakePEqNs0Kl6U6lUJoH3g1KB+fLl5ea94L2aBHHRRaBAG1yROidg1/JG
fmz8jostoasG+wuycNSdETVgY3TT37Hv0R7rRB0Wj5QvrdXqgyrAZSR97dMWR9wR
fZLZCXtYnxOVZwKUlDusSjDFm9CKdhM+KqINHXBa6RRWpQjFYn4AvbQlJEZrbyuJ
lBOvXAvmF1fsX5x9hmDLRM7Irtb9Awb7eHbT+T5c+BtWKvnR2GsujMy2f6TY+swN
WPmER6MtQ80VPk9xpLVxU02vLs0EIdoPxE7I348ozDmW6sln0kRI1Pp7uOqDO1RC
sX6uA8bSTp9u2tKsgevkHqiz3YiJOqmy475tcOg9F/CefsULZzVfBE0bsU4v0jnJ
LTXJS2GtmPhG4AJFn+qiqvvOzJGhAJAEsvmPvUgH0AmB4m0+Cqg=
=rE1e
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a few fixes:
* revert NETIF_F_LLTX usage as it caused problems
* avoid warning on WMM parameters from AP that are too short
* fix possible null-ptr dereference in hwsim
* fix interface combinations with 4-addr and crypto control
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree:
1) memleak in ebtables from the error path for the 32/64 compat layer,
from Florian Westphal.
2) Fix inverted meta ifname/ifidx matching when no interface is set
on either from the input/output path, from Phil Sutter.
3) Remove goto label in nft_meta_bridge, also from Phil.
4) Missing include guard in xt_connlabel, from Masahiro Yamada.
5) Two patch to fix ipset destination MAC matching coming from
Stephano Brivio, via Jozsef Kadlecsik.
6) Fix set rename and listing concurrency problem, from Shijie Luo.
Patch also coming via Jozsef Kadlecsik.
7) ebtables 32/64 compat missing base chain policy in rule count,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of HE AP-STA link, ieee80211_send_nullfunc() will not
send the QOS NULL packet to check if AP is still associated.
In this case, probe_send_count will be non-zero and
ieee80211_sta_work() will later disassociate the AP, even
though no packet was ever sent.
Fix this by decrementing probe_send_count and not calling
ieee80211_send_nullfunc() in case of HE link, so that we
still wait for some time for the AP beacon to reappear and
don't disconnect right away.
Signed-off-by: Shay Bar <shay.bar@celeno.com>
Link: https://lore.kernel.org/r/20190703131848.22879-1-shay.bar@celeno.com
[clarify commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Store the OBSS PD parameters inside bss_conf when bringing up an AP and/or
when a station connects to an AP. This allows the driver to configure the
HW accordingly.
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-3-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a strict start type so all new attributes starting from
NL80211_ATTR_HE_OBSS_PD are validated strictly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the data structure, policy and parsing code allowing userland to send
the OBSS PD information into the kernel.
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190730163701.18836-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This came up in fuzz testing, and really we don't consider
all-zeroes to be a valid MAC address in most places, so
also reject it here to avoid confusion later on.
Signed-off-by: Karthikeyan Periyasamy <periyasa@codeaurora.org>
Link: https://lore.kernel.org/r/1563959770-21570-1-git-send-email-periyasa@codeaurora.org
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.
Reviewed-By: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190729143109.18683-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The return from ieee80211_get_sband can potentially be a null pointer, so
it seems prudent to add a null check to avoid a null pointer dereference
on sband.
Addresses-Coverity: ("Dereference null return")
Fixes: 2ab4587675 ("mac80211: add support for the ADDBA extension element")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20190730143205.14261-1-colin.king@canonical.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This allows us to call this function before an SKB has been
allocated.
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add DSA tag code for Microchip KSZ8795 switch. The switch is simpler
and the tag is only 1 byte, instead of 2 as is the case with KSZ9477.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now we are able to split packets, we can avoid limiting
their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
packet size.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the packets to sent to the guest are bigger than the buffer
available, we can split them, using multiple buffers and fixing
the length in the packet header.
This is safe since virtio-vsock supports only stream sockets.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
the same spinlock also if we are in the TX path.
Move also buf_alloc under the same lock.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to reduce the number of credit update messages,
we send them only when the space available seen by the
transmitter is less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).
The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.
This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
sincen only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
sockaddr_pppox)), which is different on 64-bit architectures.
Guillaume Nault adds:
And the implementation was broken until 2016 (see 29e73269aa ("pppoe:
fix reference counting in PPPoE proxy")), and nobody ever noticed. I
should probably have removed this ioctl entirely instead of fixing it.
Clearly, it has never been used.
Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.
All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.
This should apply to all stable kernels.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it adds a peer with the other addr into the
asoc after this asoc is created with the 1st addr.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this function factored out from sctp_sendmsg_new_asoc() and
__sctp_connect(), it creates the asoc and adds a peer with the
1st addr.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc.
To factor out common functions, this patch is to clean up their code
to make them look more similar:
1. create the asoc and add a peer with the 1st addr.
2. add peers with the other addrs into this asoc one by one.
while at it, also remove the unused 'addrcnt'.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
sctp_inet_connect(), the latter has done addr_size check with size
of sa_family_t.
In the next patch to clean up __sctp_connect(), we will remove
addr_size check with size of sa_family_t from __sctp_connect()
for the 1st address.
So before doing that, __sctp_setsockopt_connectx() should do
this check first, as sctp_inet_connect() does.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'addr' passed to sctp_transport_init is not always a whole size
of union sctp_addr, like the path:
sctp_sendmsg() ->
sctp_sendmsg_new_asoc() ->
sctp_assoc_add_peer() ->
sctp_transport_new() -> sctp_transport_init()
In the next patches, we will also pass the address length of data
only to sctp_assoc_add_peer().
So sctp_transport_init() should copy the only available data from
addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may
cause slab-out-of-bounds.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rxkad sometimes triggers a warning about oversized stack frames when
building with clang for a 32-bit architecture:
net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=]
net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=]
The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in
rxkad_verify_packet()/rxkad_secure_packet() with the relatively large
scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt().
The warning does not show up when using gcc, which does not inline the
functions as aggressively, but the problem is still the same.
Allocate the cipher buffers from the slab instead, caching the allocated
packet crypto request memory used for DATA packet crypto in the rxrpc_call
struct.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the fact that a notification isn't sent to the recvmsg side to indicate
a call failed when sendmsg() fails to transmit a DATA packet with the error
ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.
Without this notification, the afs client just sits there waiting for the
call to complete in some manner (which it's not now going to do), which
also pins the rxrpc call in place.
This can be seen if the client has a scope-level IPv6 address, but not a
global-level IPv6 address, and we try and transmit an operation to a
server's IPv6 address.
Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.
Fixes: c54e43d752 ("rxrpc: Fix missing start of call timeout")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby
rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces
the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which
the tries to take the already held lock.
Fix this by providing a version of rxrpc_put_peer() that can be called in
situations where the lock is already held.
The bug may produce the following lockdep report:
============================================
WARNING: possible recursive locking detected
5.2.0-next-20190718 #41 Not tainted
--------------------------------------------
kworker/0:3/21678 is trying to acquire lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
__rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435
but task is already holding lock:
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh
/./include/linux/spinlock.h:343 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline]
00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at:
rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430
Fixes: 330bdcfadc ("rxrpc: Fix the keepalive generator [ver #2]")
Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Revert this for now, it has been reported multiple times that it
completely breaks connectivity on various devices.
Cc: stable@vger.kernel.org
Fixes: 8dbb000ee7 ("mac80211: set NETIF_F_LLTX when using intermediate tx queues")
Reported-by: Jean Delvare <jdelvare@suse.de>
Reported-by: Peter Lebbing <peter@digitalbrains.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jozsef Kadlecsik says:
====================
ipset patches for the nf tree
- When the support of destination MAC addresses for hash:mac sets was
introduced, it was forgotten to add the same functionality to hash:ip,mac
types of sets. The patch from Stefano Brivio adds the missing part.
- When the support of destination MAC addresses for hash:mac sets was
introduced, a copy&paste error was made in the code of the hash:ip,mac
and bitmap:ip,mac types: the MAC address in these set types is in
the second position and not in the first one. Stefano Brivio's patch
fixes the issue.
- There was still a not properly handled concurrency handling issue
between renaming and listing sets at the same time, reported by
Shijie Luo.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebtables doesn't include the base chain policies in the rule count,
so we need to add them manually when we call into the x_tables core
to allocate space for the comapt offset table.
This lead syzbot to trigger:
WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649
Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
Fixes: 2035f3ff8e ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 60649d4e0a ("can: remove obsolete empty ioctl() handler") replaced the
almost empty can_ioctl() function with sock_no_ioctl() which always returns
-EOPNOTSUPP.
Even though we don't have any ioctl() functions on socket/network layer we need
to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX
to the network driver layer.
This patch fixes the wrong return codes in the CAN network layer protocols.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 60649d4e0a ("can: remove obsolete empty ioctl() handler")
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IS_ERR() already calls unlikely(), so this extra unlikely() call
around IS_ERR() is not needed.
Signed-off-by: Enrico Weigelt <info@metux.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A common pattern when using xdp_redirect_map() is to create a device map
where the lookup key is simply ifindex. Because device maps are arrays,
this leaves holes in the map, and the map has to be sized to fit the
largest ifindex, regardless of how many devices actually are actually
needed in the map.
This patch adds a second type of device map where the key is looked up
using a hashmap, instead of being used as an array index. This allows maps
to be densely packed, so they can be smaller.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.
This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.
Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
ip netns exec A ipset create test_hash hash:ip,mac
ip netns exec A ipset add test_hash 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset correctly matches a test packet:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Mark switch cases where we are expecting to fall through.
This patch fixes the following warnings:
net/iucv/af_iucv.c: warning: this statement may fall
through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6
Notice that, in this particular case, the code comment is
modified in accordance with what GCC is expecting to find.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On initialization failure we have to delete the local fdb which was
inserted due to the default pvid creation. This problem has been present
since the inception of default_pvid. Note that currently there are 2 cases:
1) in br_dev_init() when br_multicast_init() fails
2) if register_netdevice() fails after calling ndo_init()
This patch takes care of both since br_vlan_flush() is called on both
occasions. Also the new fdb delete would be a no-op on normal bridge
device destruction since the local fdb would've been already flushed by
br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
called last when adding a port thus nothing can fail after it.
Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
Fixes: 5be5a2df40 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dequeue_func(), there is an if statement on line 74 to check whether
skb is NULL:
if (skb)
When skb is NULL, it is used on line 77:
prefetch(&skb->end);
Thus, a possible null-pointer dereference may occur.
To fix this bug, skb->end is used when skb is not NULL.
This bug is found by a static analysis tool STCheck written by us.
Fixes: 76e3cc126b ("codel: Controlled Delay AQM")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
attribute being present, but doesn't check whether it is actually
provided by the user. Same goes for nfc_genl_fw_download() and
NFC_ATTR_FIRMWARE_NAME.
This patch adds appropriate checks.
Found with syzkaller.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HE allows peers to negotiate the aggregation fragmentation level to be used
during transmission. The level can be 1-3. The Ext element is added behind
the ADDBA request inside the action frame. The responder will then reply
with the same level or a lower one if the requested one is not supported.
This patch only handles the negotiation part as the ADDBA frames get passed
to the ATH11k firmware, which does the rest of the magic for us aswell as
generating the requests.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190729104512.27615-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Upon a successful assoc a station shall store the content of the HE
operation element inside bss_conf so that the driver can setup the
hardware accordingly.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20190729102342.8659-2-john@phrozen.org
[use struct copy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Depending on the regulatory domain, leaving a DFS channel requires
a new CAC to be performed when returning back to that channel.
If needed, update dfs states after a driver channel switch.
Signed-off-by: Michael Vassernis <michael.vassernis@tandemg.com>
Link: https://lore.kernel.org/r/20190729060024.5660-1-michael.vassernis@tandemg.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When associated BSS completes channel switch procedure, its channel
record needs to be updated. The existing mac80211 solution was
extended to cfg80211 in commit 5dc8cdce1d ("mac80211/cfg80211:
update bss channel on channel switch").
However that solution still appears to be incomplete as it may lead
to duplicated scan entries for associated BSS after channel switch.
The root cause of the problem is as follows. Each BSS entry is
included into the following data structures:
- bss list rdev->bss_list
- bss search tree rdev->bss_tree
Updating BSS channel record without rebuilding bss_tree may break
tree search since cmp_bss considers all of the following: channel,
bssid, ssid. When BSS channel is updated, but its location in bss_tree
is not updated, then subsequent search operations may fail to locate
this BSS since they will be traversing bss_tree in wrong direction.
As a result, for scan performed after associated BSS channel switch,
cfg80211_bss_update may add the second entry for the same BSS to both
bss_list and bss_tree, rather then update the existing one.
To summarize, if BSS channel needs to be updated, then bss_tree should
be rebuilt in order to put updated BSS entry into a proper location.
This commit suggests the following straightforward solution:
- if new entry has been already created for BSS after channel switch,
then use its IEs to update known BSS entry and then remove new
entry completely
- use rb_erase/rb_insert_bss reinstall updated BSS in bss_tree
- for nontransmit BSS entry, the whole transmit BSS hierarchy
is updated
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20190726163922.27509-3-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch implements minor refactoring for cfg80211_bss_update function.
Code path for updating known BSS is extracted into dedicated
cfg80211_update_known_bss function.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20190726163922.27509-2-sergey.matyukevich.os@quantenna.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In a very similar spirit to commit c470bdc1aa ("mac80211: don't WARN
on bad WMM parameters from buggy APs"), an AP may not transmit a
fully-formed WMM IE. For example, it may miss or repeat an Access
Category. The above loop won't catch that and will instead leave one of
the four ACs zeroed out. This triggers the following warning in
drv_conf_tx()
wlan0: invalid CW_min/CW_max: 0/0
and it may leave one of the hardware queues unconfigured. If we detect
such a case, let's just print a warning and fall back to the defaults.
Tested with a hacked version of hostapd, intentionally corrupting the
IEs in hostapd_eid_wmm().
Cc: stable@vger.kernel.org
Signed-off-by: Brian Norris <briannorris@chromium.org>
Link: https://lore.kernel.org/r/20190726224758.210953-1-briannorris@chromium.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ipv6_find_hdr() prints a non-rate limited error message
when it cannot find an ipv6 header at a specific offset.
This could be used as a DoS, so just remove it.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_rdma_cm_event_handler_cmn(), there are some if statements to
check whether conn is NULL, such as on lines 65, 96 and 112.
But conn is not checked before being used on line 108:
trans->cm_connect_complete(conn, event);
and on lines 140-143:
rdsdebug("DISCONNECT event - dropping connection "
"%pI6c->%pI6c\n", &conn->c_laddr,
&conn->c_faddr);
rds_conn_drop(conn);
Thus, possible null-pointer dereferences may occur.
To fix these bugs, conn is checked before being used.
These bugs are found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable bucket is being initialized with a value that is never
read and it is being updated later with a new value in a following
for-loop. The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
can cause a possible use-after-free accessing iph/ipv6h pointer
since the packet will be 'uncloned' running pskb_expand_head if
it is a cloned gso skb.
Fixes: 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include wiphy address setup in wiphy dumps and new wiphy events. The
wiphy permanent address is exposed as ATTR_MAC. If addr_mask is setup,
then it is included as ATTR_MAC_MASK attribute. If multiple addresses
are available, then their are exposed in a nested ATTR_MAC_ADDRS array.
This information is already exposed via sysfs, but it makes sense to
include it in the wiphy dump as well.
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190722113312.14031-3-denkenz@gmail.com
[use just nla_nest_start(), this is new functionality]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement the basics required for supporting high efficiency with mesh:
include HE information elements in beacons, probe responses, and peering
action frames, and check for compatible HE configurations when peering.
Signed-off-by: Sven Eckelmann <seckelmann@datto.com>
Forwarded: https://patchwork.kernel.org/patch/11029299/
Link: https://lore.kernel.org/r/20190724163359.3507-2-sven@narfation.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add IEEE80211_KEY_FLAG_GENERATE_MMIE flag to ieee80211_key_flags in order
to allow the driver to notify mac80211 to generate MMIE and that it
requires sequence number generation only.
This is a preliminary patch to add BIP_CMAC_128 hw support to mt7615
driver
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/dfe275f9aa0f1cc6b33085f9efd5d8447f68ad13.1563228405.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support to mac80211 for parsing SPR elements as per
P802.11ax_D4.0 section 9.4.2.241.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190618061915.7102-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support to ieee80211_add_tx_radiotap_header() for handling rates
reported via ieee80211_tx_status. This allows us to also report HE rates.
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190714154419.11854-4-john@phrozen.org
[remove text about 60 GHz, mac80211 doesn't support it, fix endianness issue]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 33d915d9e8 ("{nl,mac}80211: allow 4addr AP operation on
crypto controlled devices") has introduced a change which allows
4addr operation on crypto controlled devices (ex: ath10k). This
change has inadvertently impacted the interface combinations logic
on such devices.
General rule is that software interfaces like AP/VLAN should not be
listed under supported interface combinations and should not be
considered during validation of these combinations; because of the
aforementioned change, AP/VLAN interfaces(if present) will be checked
against interfaces supported by the device and blocks valid interface
combinations.
Consider a case where an AP and AP/VLAN are up and running; when a
second AP device is brought up on the same physical device, this AP
will be checked against the AP/VLAN interface (which will not be
part of supported interface combinations of the device) and blocks
second AP to come up.
Add a new API cfg80211_iftype_allowed() to fix the problem, this
API works for all devices with/without SW crypto control.
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Fixes: 33d915d9e8 ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices")
Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we just read jiffies over and over again, a non-transmitting
entry may have a newer timestamp than the transmitting one,
leading to possible confusion on expiry. Give them all the same
timestamp when creating them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20190703133823.10530-3-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Don't parse the multi-BSSID structures if we couldn't even create
their transmitting BSS, this would confuse all of our tracking.
This also means that non_tx_data->tx_bss will never be NULL, so
we can clean up a little bit.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20190703133823.10530-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211_inform_single_bss_frame_data() doesn't need the
non_tx_data data argument since it's always NULL.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20190703133823.10530-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190703070142.GA29993@kroah.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Extended Key ID allows A-MPDU sessions while rekeying as long as each
A-MPDU aggregates only MPDUs with one keyid together.
Drivers able to segregate MPDUs accordingly can tell mac80211 to not
stop A-MPDU sessions when rekeying by setting the new flag
IEEE80211_HW_AMPDU_KEYBORDER_SUPPORT.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20190629195015.19680-3-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
1) Drop IEEE80211_HW_EXT_KEY_ID_NATIVE and let drivers directly set
the NL80211_EXT_FEATURE_EXT_KEY_ID flag.
2) Drop IEEE80211_HW_NO_AMPDU_KEYBORDER_SUPPORT and simply assume all
drivers are unable to handle A-MPDU key borders.
The new Extended Key ID API now requires all mac80211 drivers to set
NL80211_EXT_FEATURE_EXT_KEY_ID when they implement set_key() and can
handle Extended Key ID. For drivers not providing set_key() mac80211
itself enables Extended Key ID support, using the internal SW crypto
services.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20190629195015.19680-2-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since ieee80211_tx_dequeue() must not be called with softirqs enabled
(i.e. from process context without proper disable of bottom halves),
we add a wrapper that disables bottom halves before calling
ieee80211_tx_dequeue()
The new function is named ieee80211_tx_dequeue_ni() just as all other
from-process-context versions found in mac80211.
The documentation of ieee80211_tx_dequeue() is also updated so it
mentions that the function should not be called from process context.
Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
Link: https://lore.kernel.org/r/20190617200140.6189-1-erik.stromdahl@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of open coding the CCM aead mode in the driver, and invoking
the AES block cipher block by block, use a ccm(aes) aead transform
which already encapsulates this functionality. This is a cleaner use
of the crypto API, and permits optimized implementations to be used,
which are typically much faster and deal more efficiently with the
SIMD register file, which usually needs to be preserved/restored in
order to use special AES instructions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Link: https://lore.kernel.org/r/20190617091901.7063-1-ard.biesheuvel@linaro.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The remove_sta_debugfs callback in struct rate_control_ops is no longer
used by any driver, as there is no need for it (the debugfs directory is
already removed recursivly by the mac80211 core.) Because no one needs
it, just remove it to keep anyone else from accidentally using it in the
future.
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190612142658.12792-5-gregkh@linuxfoundation.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This low level driver can find it useful to get the vif
when a remain on channel session is cancelled.
iwlwifi will need this soon.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://lore.kernel.org/r/20190723180001.5828-1-emmanuel.grumbach@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for exporting ipv6 flow label via bpf_flow_keys.
Export flow label from bpf_flow.c and also return early when
BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed.
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This will allow us to write tests for those flags.
v2:
* Swap kfree(data) and kfree(user_ctx) (Song Liu)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible. Pass
those flags to the BPF flow dissector so it can make the same
decisions. In the next commits I'll add support for those flags to
our reference bpf_flow.c
v3:
* Export copy of flow dissector flags instead of moving (Alexei Starovoitov)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-07-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix segfault in libbpf, from Andrii.
2) fix gso_segs access, from Eric.
3) tls/sockmap fixes, from Jakub and John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need the same checks introduced by commit cb9f1b7838
("ip: validate header length on virtual device xmit") for
ipip tunnel.
Fixes: cb9f1b7838 ("ip: validate header length on virtual device xmit")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In conjunction with changing the interfaces' MTU (e.g. especially in
the case of a bonding) where the TIPC links are brought up and down
in a short time, a couple of issues were detected with the current link
changeover mechanism:
1) When one link is up but immediately forced down again, the failover
procedure will be carried out in order to failover all the messages in
the link's transmq queue onto the other working link. The link and node
state is also set to FAILINGOVER as part of the process. The message
will be transmited in form of a FAILOVER_MSG, so its size is plus of 40
bytes (= the message header size). There is no problem if the original
message size is not larger than the link's MTU - 40, and indeed this is
the max size of a normal payload messages. However, in the situation
above, because the link has just been up, the messages in the link's
transmq are almost SYNCH_MSGs which had been generated by the link
synching procedure, then their size might reach the max value already!
When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size
will exceed the link's MTU. As a result, the messages are dropped
silently and the failover procedure will never end up, the link will
not be able to exit the FAILINGOVER state, so cannot be re-established.
2) The same scenario above can happen more easily in case the MTU of
the links is set differently or when changing. In that case, as long as
a large message in the failure link's transmq queue was built and
fragmented with its link's MTU > the other link's one, the issue will
happen (there is no need of a link synching in advance).
3) The link synching procedure also faces with the same issue but since
the link synching is only started upon receipt of a SYNCH_MSG, dropping
the message will not result in a state deadlock, but it is not expected
as design.
The 1) & 3) issues are resolved by the last commit that only a dummy
SYNCH_MSG (i.e. without data) is generated at the link synching, so the
size of a FAILOVER_MSG if any then will never exceed the link's MTU.
For the 2) issue, the only solution is trying to fragment the messages
in the failure link's transmq queue according to the working link's MTU
so they can be failovered then. A new function is made to accomplish
this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the
original message size is too large, it will be fragmented & reassembled
at the receiving side.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit along with the next one are to resolve the issues with the
link changeover mechanism. See that commit for details.
Basically, for the link synching, from now on, we will send only one
single ("dummy") SYNCH message to peer. The SYNCH message does not
contain any data, just a header conveying the synch point to the peer.
A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for
backward compatible!
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Suggested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
act_ife at least requires TCA_IFE_PARMS, so we have to bail out
when there is no attribute passed in.
Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
Fixes: ef6980b6be ("introduce IFE action")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The label is used just once and the code it points at is not reused, no
point in keeping it.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in
situations where required data is missing leads to unexpected behaviour
with inverted checks like so:
| meta iifname != eth0 accept
This rule will never match if there is no input interface (or it is not
known) which is not intuitive and, what's worse, breaks consistency of
iptables-nft with iptables-legacy.
Fix this by falling back to placing a value in dreg which never matches
(avoiding accidental matches), i.e. zero for interface index and an
empty string for interface name.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sock_efree() releases the sock refcnt, if we don't hold this refcnt
when setting skb->destructor to it, the refcnt would not be balanced.
This leads to several bug reports from syzbot.
I have checked other users of sock_efree(), all of them hold the
sock refcnt.
Fixes: c8c8218ec5 ("netrom: fix a memory leak in nr_rx_frame()")
Reported-and-tested-by: <syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com>
Reported-and-tested-by: <syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com>
Reported-and-tested-by: <syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com>
Reported-and-tested-by: <syzbot+a34e5f3d0300163f0c87@syzkaller.appspotmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some functions in the datapath code are factored out so that each
one has a stack frame smaller than 1024 bytes with gcc. However,
when compiling with clang, the functions are inlined more aggressively
and combined again so we get
net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
Marking both get_flow_actions() and ovs_nla_init_match_and_action()
as 'noinline_for_stack' gives us the same behavior that we see with
gcc, and no warning. Note that this does not mean we actually use
less stack, as the functions call each other, and we still get
three copies of the large 'struct sw_flow_key' type on the stack.
The comment tells us that this was previously considered safe,
presumably since the netlink parsing functions are called with
a known backchain that does not also use a lot of stack space.
Fixes: 9cc9a5cb17 ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl04JHsTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdIB52B/9S17IKTEpoNwM5KvrR+ZMqWZTx6zng
iKNef14uuAemHzThMoSAJoJBGGzcgmZeu/nd0fyGg0f51kXVkapU4TsvYa2azkt9
WcTZQG3N4uUuW3D8hEbY2+au8e8ubNR+NnWQEqeoa5yuDGg7nSjD3wYd7rcY6jF5
veSGoZFQ9+MpjuYBHqtwiAauPAUqj8pHPAawGsaT/eNvsTHa7y43Iy38VbS2XTrF
Ds6mwp8unE0cuBUFD49YLs3JrYsbuQRf5q2kX3/qvIO9QtJ/GQ6Dt/Sy3iC7Eo9Y
PAVLK90sNb+LlRBKhfCAKUdgZsxdJ/kv6RfVDOiCoQDMVl/Ukv9UWB9Y
=1E76
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.3-20190724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-07-24
this is a pull reqeust of 7 patches for net/master.
The first patch is by Rasmus Villemoes add a missing netif_carrier_off() to
register_candev() so that generic netdev trigger based LEDs are initially off.
Nikita Yushchenko's patch for the rcar_canfd driver fixes a possible IRQ storm
on high load.
The patch by Weitao Hou for the mcp251x driver add missing error checking to
the work queue allocation.
Both Wen Yang's and Joakim Zhang's patch for the flexcan driver fix a problem
with the stop-mode.
Stephane Grosjean contributes a patch for the peak_usb driver to fix a
potential double kfree_skb().
The last patch is by YueHaibing and fixes the error path in can-gw's
cgw_module_init() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
which may change skb->data, so we need to re-load ipv6h at
the right place.
Fixes: 898b29798e ("ip6_gre: Refactor ip6gre xmit codes")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add error path for cgw_module_init to avoid possible crash if
some error occurs.
Fixes: c1aabdf379 ("can-gw: add netlink based CAN routing")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add missing SPDX identifiers for the CAN network layer and correct the SPDX
license for two of its include files to make sure the BSD-3-Clause applies
for the entire subsystem.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>