Commit Graph

856749 Commits

Author SHA1 Message Date
Vlad Buslov 73edca736e net/mlx5e: Protect hairpin entry flows list with spinlock
To remove dependency on rtnl lock, extend hairpin entry with spinlock and
use it to protect list of flows attached to hairpin entry from concurrent
modifications.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-09 14:54:08 -07:00
Vlad Buslov e4f9abbd38 net/mlx5e: Extend hairpin entry with reference counter
List of flows attached to hairpin entry is used as implicit reference
counter (hairpin entry is deallocated when list becomes free) and as a
mechanism to obtain hairpin entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to hairpin entry is possible. Proper atomic reference counter is
required to support concurrent access.

As a preparation for extending hairpin with reference counting, extract
code that deletes hairpin entry into standalone function. In order to
remove this dependency on external locking, extend hairpin entry with
reference counter to manage its lifetime and extend flow structure with
direct pointer to hairpin entry that flow is attached to.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-09 14:54:08 -07:00
YueHaibing ca497fb6aa taprio: remove unused variable 'entry_list_policy'
net/sched/sch_taprio.c:680:32: warning:
 entry_list_policy defined but not used [-Wunused-const-variable=]

One of the points of commit a3d43c0d56 ("taprio: Add support adding
an admin schedule") is that it removes support (it now returns "not
supported") for schedules using the TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY
attribute (which were never used), the parsing of those types of schedules
was the only user of this policy. So removing this policy should be fine.

Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:41:24 -07:00
Holger Hoffstätte a7eb6a4f25 r8169: fix performance issue on RTL8168evl
Disabling TSO but leaving SG active results is a significant
performance drop. Therefore disable also SG on RTL8168evl.
This restores the original performance.

Fixes: 93681cd7d9 ("r8169: enable HW csum and TSO")
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:37:39 -07:00
Josh Hunt 1555e6fdf0 tcp: Update TCP_BASE_MSS comment
TCP_BASE_MSS is used as the default initial MSS value when MTU probing is
enabled. Update the comment to reflect this.

Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:03:30 -07:00
Josh Hunt c04b79b6cf tcp: add new tcp_mtu_probe_floor sysctl
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.

Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonble floor set we've had it
enabled for the past 6 months.

The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 13:03:30 -07:00
Jiri Pirko 3a5e523479 devlink: remove pointless data_len arg from region snapshot create
The size of the snapshot has to be the same as the size of the region,
therefore no need to pass it again during snapshot creation. Remove the
arg and use region->size instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 11:21:46 -07:00
Eric Dumazet 1a9914884d tcp: batch calls to sk_flush_backlog()
Starting from commit d41a69f1d3 ("tcp: make tcp_sendmsg() aware of socket backlog")
loopback flows got hurt, because for each skb sent, the socket receives an
immediate ACK and sk_flush_backlog() causes extra work.

Intent was to not let the backlog grow too much, but we went a bit too far.

We can check the backlog every 16 skbs (about 1MB chunks)
to increase TCP over loopback performance by about 15 %

Note that the call to sk_flush_backlog() handles a single ACK,
thanks to coalescing done on backlog, but cleans the 16 skbs
found in rtx rb-tree.

Reported-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09 11:03:27 -07:00
Denis Efremov fcc32a2165 liquidio: Use pcie_flr() instead of reimplementing it
octeon_mbox_process_cmd() directly writes the PCI_EXP_DEVCTL_BCR_FLR
bit, which bypasses timing requirements imposed by the PCIe spec.
This patch fixes the function to use the pcie_flr() interface instead.

Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:41:02 -07:00
Heiner Kallweit 32879f0001 r8169: allocate rx buffers using alloc_pages_node
We allocate 16kb per rx buffer, so we can avoid some overhead by using
alloc_pages_node directly instead of bothering kmalloc_node. Due to
this change buffers are page-aligned now, therefore the alignment check
can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:34:55 -07:00
YueHaibing 018e5b4587 fq_codel: remove set but not used variables 'prev_ecn_mark' and 'prev_drop_count'
Fixes gcc '-Wunused-but-set-variable' warning:

net/sched/sch_fq_codel.c: In function fq_codel_dequeue:
net/sched/sch_fq_codel.c:288:23: warning: variable prev_ecn_mark set but not used [-Wunused-but-set-variable]
net/sched/sch_fq_codel.c:288:6: warning: variable prev_drop_count set but not used [-Wunused-but-set-variable]

They are not used since commit 77ddaff218 ("fq_codel: Kill
useless per-flow dropped statistic")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:32:19 -07:00
Jiri Pirko da382875c6 mlxsw: spectrum: Extend to support Spectrum-3 ASIC
Extend existing driver for Spectrum and Spectrum-2 ASICs
to support Spectrum-3 ASIC as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:27:09 -07:00
David S. Miller eb716a649f Merge branch 'stmmac-next'
Jose Abreu says:

====================
net: stmmac: Improvements for -next

[ This is just a rebase of v2 into latest -next in order to avoid a merge
conflict ]

Couple of improvements for -next tree. More info in commit logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu ccfc639a94 net: stmmac: selftests: Add a selftest for Flexible RX Parser
Add a selftest for the Flexible RX Parser feature.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu d6e1c12cf9 net: stmmac: Add Flexible RX Parser support in XGMAC
XGMAC cores also support the Flexible RX Parser feature. Add the support
for it in the XGMAC core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 56e58d6c8a net: stmmac: Implement Safety Features in XGMAC core
XGMAC also supports Safety Features. This patch implements the
configuration and handling of this feature in XGMAC core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 74043f6b22 net: stmmac: selftests: Add test for VLAN and Double VLAN Filtering
Add a selftest for VLAN and Double VLAN Filtering in stmmac.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 3cd1cfcba2 net: stmmac: Implement VLAN Hash Filtering in XGMAC
Implement the VLAN Hash Filtering feature in XGMAC core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 1fbdad0005 net: stmmac: selftests: Add RSS test
Add a test for RSS in the stmmac selftests.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 76067459c6 net: stmmac: Implement RSS and enable it in XGMAC core
Implement the RSS functionality and add the corresponding callbacks in
XGMAC core.

Changes from v1:
	- Do not use magic constants (Jakub)
	- Use ethtool_rxfh_indir_default() (Jakub)

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 7035aad875 net: stmmac: xgmac: Implement tx_queue_prio()
Implement the TX Queue Priority callback in XGMAC core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu 5656ac5542 net: stmmac: xgmac: Implement set_mtl_tx_queue_weight()
Implement the TX Queue Weight callback. In order for this to be active
we also need to set ETS algorithm when configuring Queue.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:19 -07:00
Jose Abreu b6cdf09f51 net: stmmac: xgmac: Implement MMC counters
Implement the MMC counters feature in XGMAC core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:20:18 -07:00
John Rutherford 6c9081a391 tipc: add loopback device tracking
Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.

We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface.  The clones are dropped at reception
and have no functional role except making the traffic visible.

The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.

v3 - Function dev_nit_active used to check for network taps.
   - Procedure netif_rx_ni used to send cloned messages to loopback device.

Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 22:11:39 -07:00
David S. Miller 2339ef1cf3 Merge branch 'flow_offload-add-indr-block-in-nf_table_offload'
wenxu says:

====================
flow_offload: add indr-block in nf_table_offload

This series patch make nftables offload support the vlan and
tunnel device offload through indr-block architecture.

The first four patches mv tc indr block to flow offload and
rename to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly. The fifth patch provide a callback list to get
flow_block of each subsystem immediately when the device
register and contain a block.
The last patch make nf_tables_offload support flow-indr-block.

This version add a mutex lock for add/del flow_indr_block_ing_cb
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:31 -07:00
wenxu 9a32669fec netfilter: nf_tables_offload: support indr block call
nftable support indr-block call. It makes nftable an offload vlan
and tunnel device.

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu 1150ab0f1b flow_offload: support get multi-subsystem block
It provide a callback list to find the blocks of tc
and nft subsystems

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu 4e481908c5 flow_offload: move tc indirect block to flow offload
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu e4da910211 cls_api: add flow_indr_block_call function
This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu f843698857 cls_api: remove the tcf_block cache
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu 242453c227 cls_api: modify the tc_indr_block_ing_cmd parameters.
This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
David S. Miller 61552d2ce8 Merge branch 'net-batched-receive-in-GRO-path'
Edward Cree says:

====================
net: batched receive in GRO path

This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.
Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
 onto a list in the napi struct, which is received at the end of the napi
 poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.

Performance figures with this series, collected on a back-to-back pair of
 Solarflare sfn8522-r2 NICs with 120-second NetPerf tests.  In the stats,
 sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
 uncoalesceable packets (e.g. due to IP or TCP options).  The receive side
 (which was the device under test) had the NetPerf process pinned to one CPU,
 and the device interrupts pinned to a second CPU.  CPU utilisation figures
 (used in cases of line-rate performance) are summed across all CPUs.
net.core.gro_normal_batch was left at its default value of 8.

TCP 4 streams, GRO on: all results line rate (9.415Gbps)
net-next: 210.3% cpu
after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
after #3: 196.7% cpu (- 8.4%, p=0.136 vs net-next)
TCP 4 streams, GRO off:
net-next: 8.017 Gbps
after #1: 7.785 Gbps (- 2.9%, p=0.385 vs net-next)
after #3: 7.604 Gbps (- 5.1%, p=0.282 vs net-next.  But note *)
TCP 1 stream, GRO off:
net-next: 6.553 Gbps
after #1: 6.444 Gbps (- 1.7%, p=0.302 vs net-next)
after #3: 6.790 Gbps (+ 3.6%, p=0.169 vs net-next)
TCP 1 stream, GRO on, busy_read = 50: all results line rate
net-next: 156.0% cpu
after #1: 174.5% cpu (+11.9%, p=0.015 vs net-next)
after #3: 165.0% cpu (+ 5.8%, p=0.147 vs net-next)
TCP 1 stream, GRO off, busy_read = 50:
net-next: 6.488 Gbps
after #1: 6.625 Gbps (+ 2.1%, p=0.059 vs net-next)
after #3: 7.351 Gbps (+13.3%, p=0.026 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload
net-next: 995.083 us
after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload, busy_read = 50:
net-next:   2.851 ms
after #1:   2.871 ms (+ 0.7%, p=0.134 vs net-next)
after #3:   2.937 ms (+ 3.0%, p<0.001 vs net-next)
TCP_RR 100 streams, GRO off, 1 byte payload, busy_read = 50:
net-next: 867.317 us
after #1: 865.717 us (- 0.2%, p=0.334 vs net-next)
after #3: 868.517 us (+ 0.1%, p=0.414 vs net-next)

(*) These tests produced a mixture of line-rate and below-line-rate results,
 meaning that statistically speaking the results were 'censored' by the
 upper bound, and were thus not normally distributed, making a Welch t-test
 mathematically invalid.  I therefore also calculated estimators according
 to [1], which gave the following:
net-next: 8.133 Gbps
after #1: 8.130 Gbps (- 0.0%, p=0.499 vs net-next)
after #3: 7.680 Gbps (- 5.6%, p=0.285 vs net-next)
(though my procedure for determining ν wasn't mathematically well-founded
 either, so take that p-value with a grain of salt).
A further check came from dividing the bandwidth figure by the CPU usage for
 each test run, giving:
net-next: 3.461
after #1: 3.198 (- 7.6%, p=0.145 vs net-next)
after #3: 3.641 (+ 5.2%, p=0.280 vs net-next)

The above results are fairly mixed, and in most cases not statistically
 significant.  But I think we can roughly conclude that the series
 marginally improves non-GROable throughput, without hurting latency
 (except in the large-payload busy-polling case, which in any case yields
 horrid performance even on net-next (almost triple the latency without
 busy-poll).  Also, drivers which, unlike sfc, pass UDP traffic to GRO
 would expect to see a benefit from gaining access to batching.

Changed in v3:
 * gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one
 * removed RFC tags (no comments after a week means no-one objects, right?)

Changed in v2:
 * During busy poll, call gro_normal_list() to receive batched packets
   after each cycle of the napi busy loop.  See comments in Patch #3 for
   complications of doing the same in busy_poll_stop().

[1]: Cohen 1959, doi: 10.1080/00401706.1959.10489859
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree 323ebb61e3 net: use listified RX for handling GRO_NORMAL skbs
When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
 of passing it to the stack immediately, place it on a list in the napi
 struct.  Then, at flush time (napi_complete_done(), napi_poll(), or
 napi_busy_loop()), call netif_receive_skb_list_internal() on the list.
We'd like to do that in napi_gro_flush(), but it's not called if
 !napi->gro_bitmask, so we have to do it in the callers instead.  (There are
 a handful of drivers that call napi_gro_flush() themselves, but it's not
 clear why, or whether this will affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
 list whenever it exceeds gro_normal_batch, a new net/core sysctl that
 defaults to 8.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree 6727013694 sfc: falcon: don't score irq moderation points for GRO
Same rationale as for sfc, except that this wasn't performance-tested.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree 5e040d4b1a sfc: don't score irq moderation points for GRO
We already scored points when handling the RX event, no-one else does this,
 and looking at the history it appears this was originally meant to only
 score on merges, not on GRO_NORMAL.  Moreover, it gets in the way of
 changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
 throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
 in RX CPU usage (n=6, p=0.03).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Rahul Verma 5e6d9fc761 qed: Add new ethtool supported port types based on media.
Supported ports in ethtool <eth1> are displayed based on media type.
For media type fibre and twinaxial, port type is "FIBRE". Media type
Base-T is "TP" and media KR is "Backplane".

V1->V2:
Corrected the subject.

Signed-off-by: Rahul Verma <rahulv@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:14:07 -07:00
Chuhong Yuan ad2dcba008 cxgb4: smt: Use normal int for refcount
All refcount operations are protected by spinlocks now.
Then the atomic counter can be replaced by a normal int.

This patch depends on PATCH 1/2.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:12:17 -07:00
Chuhong Yuan 4a8937b838 cxgb4: smt: Add lock for atomic_dec_and_test
The atomic_dec_and_test() is not safe because it is
outside of locks.
Move the locks of t4_smte_free() to its caller,
cxgb4_smt_release() to protect the atomic decrement.

Fixes: 3bdb376e69 ("cxgb4: introduce SMT ops to prepare for SMAC rewrite support")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:12:17 -07:00
David Ahern e858ef1cd4 selftests: Add l2tp tests
Add IPv4 and IPv6 l2tp tests. Current set is over IP and with
IPsec.

v2
- add l2tp.sh to TEST_PROGS in Makefile

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:08:09 -07:00
Alexey Dobriyan 9d2f112383 net: delete "register" keyword
Delete long obsoleted "register" keyword.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:03:42 -07:00
Chuhong Yuan 4b4de39850 mkiss: Use refcount_t for refcount
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 17:58:40 -07:00
Chuhong Yuan 31168a6d12 dpaa_eth: Use refcount_t for refcount
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 17:57:54 -07:00
David S. Miller b3a598eb0d This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
 
  - Replace usage of strlcpy with strscpy, by Sven Eckelmann
 
  - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
    (2 patches)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1MHekWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoW8TEACox7vhtW4MS8QCzPySCU3f8V6m
 f+2PPlUbM4CqXcOPGw/jQng6PAtcb4gNPssm52GaxdeB9jFqkI/ELdSn5mCh+EcG
 QRrhf5DVruYyqBU2gNhovEe7SlJl8IJno6kFdAggaMngXnvlBzIr7n4FMIUKNFYn
 6kFbA8pugBXXvhiRcuzs+l5iUxdKUxTsUNPyppyqnqb8lrb0/30/681dfq87PmcV
 zehEf8Ry23W7CVQv6YougVJvK0GUwysULsvm8Wc8FsOke7CeeQIPLEF2Pcrl/CFM
 mfynXVXngE41MPBC59eUcWBGlRYwkuwm4Q+YQ8OUjr5+X5YP06jR5Dh8u6KVAMuy
 QWGSwyrlXSsCE6BTxoijdJqsLzHDXCmYY0GQI2tEMCyDnL95CU3tuTk4vckusuf+
 NlhHv7m+Bo0w9ztDUBifzNyURW9VgUCoOZfW9rdYRWjjN8Oe6wnMWCFttGnkg0qu
 zrCJn5mGvz3Vp434K0uGY1wOZincSdM6grBSgmZv1UNMaBdlfNroJsUqvz6IE5Fe
 iI5kqRXoUG+ftfwacgyFEK08HpcZHvJkNDiHHRlOPPZm75yE7mwreVUrGRaibywQ
 pzTEwM+H2MX9xF+osjiPVc197fxpnkX9fI2LhDOEiblaJbdiIxd+cgWRMFS1YPvd
 ANaFfAh58+gDWcRqhg==
 =Cbzf
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Replace usage of strlcpy with strscpy, by Sven Eckelmann

 - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
   (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 11:28:40 -07:00
David S. Miller 13dfb3fa49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Just minor overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 18:44:57 -07:00
Linus Torvalds 33920f1ec5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Yeah I should have sent a pull request last week, so there is a lot
  more here than usual:

   1) Fix memory leak in ebtables compat code, from Wenwen Wang.

   2) Several kTLS bug fixes from Jakub Kicinski (circular close on
      disconnect etc.)

   3) Force slave speed check on link state recovery in bonding 802.3ad
      mode, from Thomas Falcon.

   4) Clear RX descriptor bits before assigning buffers to them in
      stmmac, from Jose Abreu.

   5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
      loops, from Nishka Dasgupta.

   6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.

   7) Need to hold sock across skb->destructor invocation, from Cong
      Wang.

   8) IP header length needs to be validated in ipip tunnel xmit, from
      Haishuang Yan.

   9) Use after free in ip6 tunnel driver, also from Haishuang Yan.

  10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
      Heiner Kallweit.

  11) Upon bridge device init failure, we need to delete the local fdb.
      From Nikolay Aleksandrov.

  12) Handle erros from of_get_mac_address() properly in stmmac, from
      Martin Blumenstingl.

  13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
      Kadlecsik.

  14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
      some devices, so revert. From Johannes Berg.

  15) Fix deadlock in rxrpc, from David Howells.

  16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
      Haibing.

  17) Fix mvpp2 crash on module removal, from Matteo Croce.

  18) Fix race in genphy_update_link, from Heiner Kallweit.

  19) bpf_xdp_adjust_head() stopped working with generic XDP when we
      fixes generic XDP to support stacked devices properly, fix from
      Jesper Dangaard Brouer.

  20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
      David Ahern.

  21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
  net: dsa: sja1105: Fix memory leak on meta state machine error path
  net: dsa: sja1105: Fix memory leak on meta state machine normal path
  net: dsa: sja1105: Really fix panic on unregistering PTP clock
  net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
  net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
  net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
  net: sched: sample: allow accessing psample_group with rtnl
  net: sched: police: allow accessing police->params with rtnl
  net: hisilicon: Fix dma_map_single failed on arm64
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  tc-testing: updated vlan action tests with batch create/delete
  net sched: update vlan action for batched events operations
  net: stmmac: tc: Do not return a fragment entry
  net: stmmac: Fix issues when number of Queues >= 4
  net: stmmac: xgmac: Fix XGMAC selftests
  be2net: disable bh with spin_lock in be_process_mcc
  net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
  net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
  net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
  ...
2019-08-06 17:11:59 -07:00
David S. Miller 05bb520376 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-08-05

This series contains updates to i40e driver only.

Dmitrii adds missing statistic counters for VEB and VEB TC's.

Slawomir adds support for logging the "Disable Firmware LLDP" flag
option and its current status.

Jake fixes an issue where VF's being notified of their link status
before their queues are enabled which was causing issues.  So always
report link status down when the VF queues are not enabled.  Also adds
future proofing when statistics are added or removed by adding checks to
ensure the data pointer for the strings lines up with the expected
statistics count.

Czeslaw fixes the advertised mode reported in ethtool for FEC, where the
"None BaseR RS" was always being displayed no matter what the mode it
was in.  Also added logging information when the PF is entering or
leaving "allmulti" (or promiscuous) mode.  Fixed up the logging logic
for VF's when leaving multicast mode to not include unicast as well.

v2: drop Aleksandr's patch (previously patch #2 in the series) to
    display the VF MAC address that is set by the VF while community
    feedback is addressed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:41:45 -07:00
Yifeng Sun aa733660db openvswitch: Print error when ovs_execute_actions() fails
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:38:12 -07:00
David S. Miller feac1d6802 Merge branch 'sja1105-fixes'
Vladimir Oltean says:

====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP

This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).

Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.

Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.

Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean 93fa8587b2 net: dsa: sja1105: Fix memory leak on meta state machine error path
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.

The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).

However, when dropping the initial link-local frame, its backing memory
was being leaked.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean f163fed276 net: dsa: sja1105: Fix memory leak on meta state machine normal path
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.

Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.

The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed.  That is the job of kfree_skb(), which
also calls skb_unref().

But it turns out that freeing the cached stampable_skb is in fact not
necessary.  It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp.  So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00