Commit Graph

740763 Commits

Author SHA1 Message Date
Wei Yongjun 004c3cf1a1 cxgb4: fix error return code in adap_init0()
Fix to return a negative error code from the hash filter init error
handling case instead of 0, as done elsewhere in this function.

Fixes: 5c31254e35 ("cxgb4: initialize hash-filter configuration")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30 09:44:29 -04:00
Joe Perches 26c97c5d8d netfilter: ipset: Use is_zero_ether_addr instead of static and memcmp
To make the test a bit clearer and to reduce object size a little.

Miscellanea:

o remove now unnecessary static const array

$ size ip_set_hash_mac.o*
   text	   data	    bss	    dec	    hex	filename
  22822	   4619	     64	  27505	   6b71	ip_set_hash_mac.o.allyesconfig.new
  22932	   4683	     64	  27679	   6c1f	ip_set_hash_mac.o.allyesconfig.old
  10443	   1040	      0	  11483	   2cdb	ip_set_hash_mac.o.defconfig.new
  10507	   1040	      0	  11547	   2d1b	ip_set_hash_mac.o.defconfig.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 12:20:44 +02:00
Florian Westphal e3b5e1ec75 Revert "netfilter: x_tables: ensure last rule in base chain matches underflow/policy"
This reverts commit 0d7df906a0.

Valdis Kletnieks reported that xtables is broken in linux-next since
0d7df906a0  ("netfilter: x_tables: ensure last rule in base chain
matches underflow/policy"), as kernel rejects the (well-formed) ruleset:

[   64.402790] ip6_tables: last base chain position 1136 doesn't match underflow 1344 (hook 1)

mark_source_chains is not the correct place for such a check, as it
terminates evaluation of a chain once it sees an unconditional verdict
(following rules are known to be unreachable). It seems preferrable to
fix libiptc instead, so remove this check again.

Fixes: 0d7df906a0 ("netfilter: x_tables: ensure last rule in base chain matches underflow/policy")
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 12:20:32 +02:00
Ben Hutchings 9ba5c404bf netfilter: x_tables: Add note about how to free percpu counters
Due to the way percpu counters are allocated and freed in blocks,
it is not safe to free counters individually.  Currently all callers
do the right thing, but let's note this restriction.

Fixes: ae0ac0ed6f ("netfilter: x_tables: pack percpu counter allocations")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:44:27 +02:00
Arushi Singhal c47d36b385 netfilter: Merge assignment with return
Merge assignment with return statement to directly return the value.

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:44:27 +02:00
Pablo Neira Ayuso a3073c17dd netfilter: nf_tables: use nft_set_lookup_global from nf_tables_newsetelem()
Replace opencoded implementation of nft_set_lookup_global() by call to
this function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:20 +02:00
Pablo Neira Ayuso 10659cbab7 netfilter: nf_tables: rename to nft_set_lookup_global()
To prepare shorter introduction of shorter function prefix.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:20 +02:00
Pablo Neira Ayuso 43a605f2f7 netfilter: nf_tables: enable conntrack if NAT chain is registered
Register conntrack hooks if the user adds NAT chains. Users get confused
with the existing behaviour since they will see no packets hitting this
chain until they add the first rule that refers to conntrack.

This patch adds new ->init() and ->free() indirections to chain types
that can be used by NAT chains to invoke the conntrack dependency.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:19 +02:00
Pablo Neira Ayuso 02c7b25e5f netfilter: nf_tables: build-in filter chain type
One module per supported filter chain family type takes too much memory
for very little code - too much modularization - place all chain filter
definitions in one single file.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:19 +02:00
Pablo Neira Ayuso cc07eeb0e5 netfilter: nf_tables: nft_register_chain_type() returns void
Use WARN_ON() instead since it should not happen that neither family
goes over NFPROTO_NUMPROTO nor there is already a chain of this type
already registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:18 +02:00
Pablo Neira Ayuso 32537e9184 netfilter: nf_tables: rename struct nf_chain_type
Use nft_ prefix. By when I added chain types, I forgot to use the
nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too,
otherwise there is an overlap.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:29:17 +02:00
Joe Perches 9124a20d87 netfilter: ebt_stp: Use generic functions for comparisons
Instead of unnecessary const declarations, use the generic functions to
save a little object space.

$ size net/bridge/netfilter/ebt_stp.o*
   text	   data	    bss	    dec	    hex	filename
   1250	    144	      0	   1394	    572	net/bridge/netfilter/ebt_stp.o.new
   1344	    144	      0	   1488	    5d0	net/bridge/netfilter/ebt_stp.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:24:29 +02:00
Pablo Neira Ayuso 19b351f16f netfilter: add flowtable documentation
This patch adds initial documentation for the Netfilter flowtable
infrastructure.

Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:04:41 +02:00
Bernie Harris 1be3ac9844 netfilter: ebtables: Add string filter
This patch is part of a proposal to add a string filter to
ebtables, which would be similar to the string filter in
iptables. Like iptables, the ebtables filter uses the xt_string
module.

Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:04:12 +02:00
Bernie Harris 39c202d228 netfilter: ebtables: Add support for specifying match revision
Currently ebtables assumes that the revision number of all match
modules is 0, which is an issue when trying to use existing
xtables matches with ebtables. The solution is to modify ebtables
to allow extensions to specify a revision number, similar to
iptables. This gets passed down to the kernel, which is then able
to find the match module correctly.

To main binary backwards compatibility, the size of the ebt_entry
structures is not changed, only the size of the name field is
decreased by 1 byte to make room for the revision field.

Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30 11:03:39 +02:00
David S. Miller 18845557fd wireless-drivers-next patches for 4.17
Smaller new features to various drivers but nothing really out of
 ordinary.
 
 Major changes:
 
 ath10k
 
 * enable chip temperature measurement for QCA6174/QCA9377
 
 * add firmware memory dump for QCA9984
 
 * enable buffer STA on TDLS link for QCA6174
 
 * support different beacon internals in multiple interface scenario
   for QCA988X/QCA99X0/QCA9984/QCA4019
 
 iwlwifi
 
 * support for new PCI IDs for the 9000 family
 
 * support for a new firmware API version
 
 * support for advanced dwell and Optimized Connectivity Experience
   (OCE) in scanning
 
 btrsi
 
 * fix kconfig dependencies
 
 wil6210
 
 * support multiple virtual interfaces
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJavOUVAAoJEG4XJFUm622bf2cH/i/we50/PR64rxq5RV8HWQPt
 9P/bn4I1pEDRDJZw00f/bqe20w5ps1UuHs1hgOiJzOsZNfxVv+B17MMY/dN/+5Ob
 hBYeZQFZ0MJ1cEF6J6gzebH1BKVEApVPI49MmAUzdtATO/6YgCnxovb61bl51RYg
 vtMNcrXBQDplPTh5W7YHXrUz4sV1d3daCk/6Coea+SoF6eL1tn+qndo6GhNIMboE
 HA0vV27Wf2RCzlxcQ4WGD4cre6n9sRaX4lcFzMEtbM7ziM0E0SES7eEThB5ST+A6
 3K0+zSuyEfOu40QRJF1g9Wl8FXvp+9SRVC7s1rXjDYUv9sGl+a5fhb7VPgrYn/g=
 =AgFb
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.17

Smaller new features to various drivers but nothing really out of
ordinary.

Major changes:

ath10k

* enable chip temperature measurement for QCA6174/QCA9377

* add firmware memory dump for QCA9984

* enable buffer STA on TDLS link for QCA6174

* support different beacon internals in multiple interface scenario
  for QCA988X/QCA99X0/QCA9984/QCA4019

iwlwifi

* support for new PCI IDs for the 9000 family

* support for a new firmware API version

* support for advanced dwell and Optimized Connectivity Experience
  (OCE) in scanning

btrsi

* fix kconfig dependencies

wil6210

* support multiple virtual interfaces
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 16:24:06 -04:00
David S. Miller e15f20ea33 We have a fair number of patches, but many of them are from the
first bullet here:
  * EAPoL-over-nl80211 from Denis - this will let us fix
    some long-standing issues with bridging, races with
    encryption and more
  * DFS offload support from the qtnfmac folks
  * regulatory database changes for the new ETSI adaptivity
    requirements
  * various other fixes and small enhancements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlq85QQACgkQB8qZga/f
 l8Sdbg//bc8C/4F1TUdJqZWGK1j9Lwd6nZpP0iFqH5Ees3MMnti5XdV2dCL31ivz
 c8DOuRAU8ZG/RLPgSTHVZHwh+f7S5/TxSKg8WOBvrYk7a0C1uvVVhe5XZQEmqE7g
 eqM0+UQ5DyzUYnu0kSUrFKPV7BqDa2YzVDdK8e8iozqZmAnvGN9k8H7EDEeUxtxk
 LEl+bEcmhDSfIssU2Iaksl+9qoZP6BkoVGAOmDzIL654WV4XVKorxRX6vndqSQGu
 0cCz2Occ+/0hfvszONBRR4M/gtI/Yyn3u+D1Q0YD3X40Q9gJE11fcodmMT61l5C7
 rGcu94RIGilvRvjZScK3giiU2L7DD+VETUa+YGnjd8gLpmrYd6cURxlm4yuHbw8C
 UScLCApAUuY+skmPLeuyHW9mnzHaC336vzVjk8OhdNRhX23/rB8nk1yIywgqETVW
 g/iub8/Xp6TRfdyh76I18wlfqCp1It2JAeICgKH5NPlwUA6U0xFR0/ddSR8FuAcK
 ZLY8mgsc2kIH6r4x5sjeH+Yb6tGi/Z3HMZM2hna+t4vSpn6Q5+GPsA6yuHuBUhJb
 8QswMiLDSux8I4guKgQyROiHaCzE3zOigJ4o1z9XITKsgluZVxnKr+ETKdr88WFp
 II8U0qH/kejXIfxUjbv5Wla70J9wi/hjxR6vOfSkEtYNvIApdfc=
 =hI0F
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a fair number of patches, but many of them are from the
first bullet here:
 * EAPoL-over-nl80211 from Denis - this will let us fix
   some long-standing issues with bridging, races with
   encryption and more
 * DFS offload support from the qtnfmac folks
 * regulatory database changes for the new ETSI adaptivity
   requirements
 * various other fixes and small enhancements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 16:23:26 -04:00
David S. Miller e3e67a4f07 Merge branch 'dsa-Add-ATU-VTU-statistics'
Andrew Lunn says:

====================
Add ATU/VTU statistics

Previous patches have added basic support for Address Translation Unit
and VLAN translation Unit violation interrupts. Add statistics
counters for when these occur, which can be accessed using
ethtool. Downgrade one of the particularly spammy warnings from VTU
violations to debug only, now that we have a counter for it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Andrew Lunn 7f20d834ea net: dsa: mv88e6xxx: Make VTU miss violations less spammy
VTU miss violations can happen under normal conditions. Don't spam the
kernel log, downgrade the output to debug level only. The statistics
counter will indicate it is happening, if anybody not debugging is
interested.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Andrew Lunn 65f60e4582 net: dsa: mv88e6xxx: Keep ATU/VTU violation statistics
Count the numbers of various ATU and VTU violation statistics and
return them as part of the ethtool -S statistics.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 15:04:22 -04:00
Arnd Bergmann 7ae665f132 sctp: fix unused lable warning
The proc file cleanup left a label possibly unused:

net/sctp/protocol.c: In function 'sctp_defaults_init':
net/sctp/protocol.c:1304:1: error: label 'err_init_proc' defined but not used [-Werror=unused-label]

This adds an #ifdef around it to match the respective 'goto'.

Fixes: d47d08c8ca ("sctp: use proc_remove_subtree()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:38:27 -04:00
Wei Yongjun 75498aa1af net: cavium: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:36:33 -04:00
Wei Yongjun 335ab8ba61 net: bcmgenet: return NULL instead of plain integer
Fixes the following sparse warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c:1351:16: warning:
 Using plain integer as NULL pointer

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:35:51 -04:00
Dan Carpenter 99fe29d3a2 test_bpf: Fix NULL vs IS_ERR() check in test_skb_segment()
The skb_segment() function returns error pointers on error.  It never
returns NULL.

Fixes: 76db8087c4 ("net: bpf: add a test for skb_segment in test_bpf module")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:33:29 -04:00
Russell King 981f1f8035 sfp: allow cotsworks modules
Cotsworks modules fail the checksums - it appears that Cotsworks
reprograms the EEPROM at the end of production with the final product
information (serial, date code, and exact part number for module
options) and fails to update the checksum.

Work around this by detecting the Cotsworks name in the manufacturer
field, and reducing the checksum failures to warnings rather than a
hard error.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:30:41 -04:00
David S. Miller 5593e3340f Merge branch 'qed-flash-upgrade-support'
Sudarsana Reddy Kalluru says:

====================
qed*: Flash upgrade support.

The patch series adds adapter flash upgrade support for qed/qede drivers.

Please consider applying it to net-next branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:56 -04:00
Sudarsana Reddy Kalluru ccfa110cfe qede: Ethtool flash update support.
The patch adds ethtool callback implementation for flash update.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:56 -04:00
Sudarsana Reddy Kalluru 3a69cae80c qed: Adapter flash update support.
This patch adds the required driver support for updating the flash or
non volatile memory of the adapter. At highlevel, flash upgrade comprises
of reading the flash images from the input file, validating the images and
writing them to the respective paritions.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru 62e4d4386a qed: Add APIs for flash access.
This patch adds APIs for flash access.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru d8bf47af24 qed: Fix PTT entry leak in the selftest error flow.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Sudarsana Reddy Kalluru 43645ce03e qed: Populate nvm image attribute shadow.
This patch adds support for populating the flash image attributes.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:29:55 -04:00
Michal Kalderon 50bc60cb15 qed*: Utilize FW 8.33.11.0
This FW contains several fixes and features

RDMA Features
- SRQ support
- XRC support
- Memory window support
- RDMA low latency queue support
- RDMA bonding support

RDMA bug fixes
- RDMA remote invalidate during retransmit fix
- iWARP MPA connect interop issue with RTR fix
- iWARP Legacy DPM support
- Fix MPA reject flow
- iWARP error handling
- RQ WQE validation checks

MISC
- Fix some HSI types endianity
- New Restriction: vlan insertion in core_tx_bd_data can't be set
  for LB packets

ETH
- HW QoS offload support
- Fix vlan, dcb and sriov flow of VF sending a packet with
  inband VLAN tag instead of default VLAN
- Allow GRE version 1 offloads in RX flow
- Allow VXLAN steering

iSCSI / FcoE
- Fix bd availability checking flow
- Support 256th sge proerly in iscsi/fcoe retransmit
- Performance improvement
- Fix handle iSCSI command arrival with AHS and with immediate
- Fix ipv6 traffic class configuration

DEBUG
- Update debug utilities

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:18:02 -04:00
Eric Dumazet 18dcbe12fe ipv6: export ip6 fragments sysctl to unprivileged users
IPv4 was changed in commit 52a773d645 ("net: Export ip fragment
sysctl to unprivileged users")

The only sysctl that is not per-netns is not used :
ip6frag_secret_interval

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:14:03 -04:00
Intiyaz Basha 697fefc7c1 liquidio: Prioritize control messages
During heavy tx traffic, control messages (sent by liquidio driver to NIC
firmware) sometimes do not get processed in a timely manner.  Reason is:
the low-level metadata of control messages and that of egress network
packets indicate that they have the same priority.

Fix it by setting a higher priority for control messages through the new
ctrl_qpg field in the oct_txpciq struct.  It is the NIC firmware that does
the actual setting of priority by writing to the new ctrl_qpg field; the
host driver treats that value as opaque and just assigns it to pki_ih3->qpg

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:13:49 -04:00
David S. Miller b349e0b5ec Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'
David Ahern says:

====================
net: Allow FIB notifiers to fail add and replace

I wanted to revisit how resource overload is handled for hardware offload
of FIB entries and rules. At the moment, the in-kernel fib notifier can
tell a driver about a route or rule add, replace, and delete, but the
notifier can not affect the action. Specifically, in the case of mlxsw
if a route or rule add is going to overflow the ASIC resources the only
recourse is to abort hardware offload. Aborting offload is akin to taking
down the switch as the path from data plane to the control plane simply
can not support the traffic bandwidth of the front panel ports. Further,
the current state of FIB notifiers is inconsistent with other resources
where a driver can affect a user request - e.g., enslavement of a port
into a bridge or a VRF.

As a result of the work done over the past 3+ years, I believe we are
at a point where we can bring consistency to the stack and offloads,
and reliably allow the FIB notifiers to fail a request, pushing an error
along with a suitable error message back to the user. Rather than
aborting offload when the switch is out of resources, userspace is simply
prevented from adding more routes and has a clear indication of why.

This set does not resolve the corner case where rules or routes not
supported by the device are installed prior to the driver getting loaded
and registering for FIB notifications. In that case, hardware offload has
not been established and it can refuse to offload anything, sending
errors back to userspace via extack. Since conceptually the driver owns
the netdevices associated with its asic, this corner case mainly applies
to unsupported rules and any races during the bringup phase.

Patch 1 fixes call_fib_notifiers to extract the errno from the encoded
response from handlers.

Patches 2-5 allow the call to call_fib_notifiers to fail the add or
replace of a route or rule.

Patch 6 adds a simple resource controller to netdevsim to illustrate
how a FIB resource controller can limit the number of route entries.

Changes since RFC
- correct return code for call_fib_notifier
- dropped patch 6 exporting devlink symbols
- limited example resource controller to init_net only
- updated Kconfig for netdevsim to use MAY_USE_DEVLINK
- updated cover letter regarding startup case noted by Ido
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern 37923ed6b8 netdevsim: Add simple FIB resource controller via devlink
Add devlink support to netdevsim and use it to implement a simple,
profile based resource controller. Only one controller is needed
per namespace, so the first netdevsim netdevice in a namespace
registers with devlink. If that device is deleted, the resource
settings are deleted.

The resource controller allows a user to limit the number of IPv4 and
IPv6 FIB entries and FIB rules. The resource paths are:
    /IPv4
    /IPv4/fib
    /IPv4/fib-rules
    /IPv6
    /IPv6/fib
    /IPv6/fib-rules

The IPv4 and IPv6 top level resources are unlimited in size and can not
be changed. From there, the number of FIB entries and FIB rule entries
are unlimited by default. A user can specify a limit for the fib and
fib-rules resources:

    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
    $ devlink dev reload netdevsim/netdevsim0

such that the number of rules or routes is limited (96 ipv4 routes in the
example above):
    $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done
    Error: netdevsim: Exceeded number of supported fib entries.

    $ devlink resource show netdevsim/netdevsim0
    netdevsim/netdevsim0:
      name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non
        resources:
          name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables
    ...

With this template in place for resource management, it is fairly trivial
to extend and shows one way to implement a simple counter based resource
controller typical of network profiles.

Currently, devlink only supports initial namespace. Code is in place to
adapt netdevsim to a per namespace controller once the network namespace
issues are resolved.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern 2233000cba net/ipv6: Move call_fib6_entry_notifiers up for route adds
Move call to call_fib6_entry_notifiers for new IPv6 routes to right
before the insertion into the FIB. At this point notifier handlers can
decide the fate of the new route with a clean path to delete the
potential new entry if the notifier returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:31 -04:00
David Ahern c1d7ee67ac net/ipv4: Allow notifier to fail route replace
Add checking to call to call_fib_entry_notifiers for IPv4 route replace.
Allows a notifier handler to fail the replace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern 6635f311ea net/ipv4: Move call_fib_entry_notifiers up for new routes
Move call to call_fib_entry_notifiers for new IPv4 routes to right
before the call to fib_insert_alias. At this point the only remaining
failure path is memory allocations in fib_insert_node. Handle that
very unlikely failure with a call to call_fib_entry_notifiers to
tell drivers about it.

At this point notifier handlers can decide the fate of the new route
with a clean path to delete the potential new entry if the notifier
returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern 9776d32537 net: Move call_fib_rule_notifiers up in fib_nl_newrule
Move call_fib_rule_notifiers up in fib_nl_newrule to the point right
before the rule is inserted into the list. At this point there are no
more failure paths within the core rule code, so if the notifier
does not fail then the rule will be inserted into the list.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David Ahern c30d9356e9 net: Fix fib notifer to return errno
Notifier handlers use notifier_from_errno to convert any potential error
to an encoded format. As a consequence the other side, call_fib_notifier{s}
in this case, needs to use notifier_to_errno to return the error from
the handler back to its caller.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:10:30 -04:00
David S. Miller 6e2135ce54 mlx5-updates-2018-03-27 (Misc updates & SQ recovery)
This series contains Misc updates and cleanups for mlx5e rx path
 and SQ recovery feature for tx path.
 
 From Tariq: (RX updates)
     - Disable Striding RQ when PCI devices, striding RQ limits the use
       of CQE compression feature, which is very critical for slow PCI
       devices performance, in this change we will prefer CQE compression
       over Striding RQ only on specific "slow"  PCIe links.
     - RX path cleanups
     - Private flag to enable/disable striding RQ
 
 From Eran: (TX fast recovery)
     - TX timeout logic improvements, fast SQ recovery and TX error reporting
       if a HW error occurs while transmitting on a specific SQ, the driver will
       ignore such error and will wait for TX timeout to occur and reset all
       the rings. Instead, the current series improves the resiliency for such
       HW errors by detecting TX completions with errors, which will report them
       and perform a fast recover for the specific faulty SQ even before a TX
       timeout is detected.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJauuK/AAoJEEg/ir3gV/o+Ro4IAIBcQl4S/6jPttGDOPwElU4b
 Q8MuQbnB9k36Kf6x3GrhBqwtoS3OYBLvfyOeZsghD8uyzklhnqR/psrCt1YhLXq7
 aI5iFbuQucCUwIYBscuZegXBs+PTJYALoil05hOwSkwhEBZR53ZvdRQ9Jg0G0QiY
 sYEYAl41A1DWQOLET5lwleUpVQnVnNvIcXqNHKBKbS4f8rmLpfsZ6YAxJiiIjeDJ
 gLbsaxUyY3p77594okBk6DuFBWZsEAKwumspdSc92OEMrZSYE5ugNL+UgVEE7I8F
 JgK/4OUarw3K7xwhAg//ZhTM1iRRdJv0Sz2GPhhWRzYB0jWACc3ZG2/M/WmxFIg=
 =Uslu
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-03-27 (Misc updates & SQ recovery)

This series contains Misc updates and cleanups for mlx5e rx path
and SQ recovery feature for tx path.

From Tariq: (RX updates)
    - Disable Striding RQ when PCI devices, striding RQ limits the use
      of CQE compression feature, which is very critical for slow PCI
      devices performance, in this change we will prefer CQE compression
      over Striding RQ only on specific "slow"  PCIe links.
    - RX path cleanups
    - Private flag to enable/disable striding RQ

From Eran: (TX fast recovery)
    - TX timeout logic improvements, fast SQ recovery and TX error reporting
      if a HW error occurs while transmitting on a specific SQ, the driver will
      ignore such error and will wait for TX timeout to occur and reset all
      the rings. Instead, the current series improves the resiliency for such
      HW errors by detecting TX completions with errors, which will report them
      and perform a fast recover for the specific faulty SQ even before a TX
      timeout is detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:01:40 -04:00
David S. Miller 038d49baab Merge branch 'Introduce-net_rwsem-to-protect-net_namespace_list'
Kirill Tkhai says:

====================
Introduce net_rwsem to protect net_namespace_list

The series introduces fine grained rw_semaphore, which will be used
instead of rtnl_lock() to protect net_namespace_list.

This improves scalability and allows to do non-exclusive sleepable
iteration for_each_net(), which is enough for most cases.

scripts/get_maintainer.pl gives enormous list of people, and I add
all to CC.

Note, that this patch is independent of "Close race between
{un, }register_netdevice_notifier and pernet_operations":
https://patchwork.ozlabs.org/project/netdev/list/?series=36495

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai 152f253152 net: Remove rtnl_lock() in nf_ct_iterate_destroy()
rtnl_lock() doesn't protect net::ct::count,
and it's not needed for__nf_ct_unconfirmed_destroy()
and for nf_queue_nf_hook_drop().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai ec9c780925 ovs: Remove rtnl_lock() from ovs_exit_net()
Here we iterate for_each_net() and removes
vport from alive net to the exiting net.

ovs_net::dps are protected by ovs_mutex(),
and the others, who change it (ovs_dp_cmd_new(),
__dp_destroy()) also take it.
The same with datapath::ports list.

So, we remove rtnl_lock() here.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:54 -04:00
Kirill Tkhai 350311aab4 security: Remove rtnl_lock() in selinux_xfrm_notify_policyload()
rt_genid_bump_all() consists of ipv4 and ipv6 part.
ipv4 part is incrementing of net::ipv4::rt_genid,
and I see many places, where it's read without rtnl_lock().

ipv6 part calls __fib6_clean_all(), and it's also
called without rtnl_lock() in other places.

So, rtnl_lock() here was used to iterate net_namespace_list only,
and we can remove it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00
Kirill Tkhai 10256debb9 net: Don't take rtnl_lock() in wireless_nlevent_flush()
This function iterates over net_namespace_list and flushes
the queue for every of them. What does this rtnl_lock()
protects?! Since we may add skbs to net::wext_nlevents
without rtnl_lock(), it does not protects us about queuers.

It guarantees, two threads can't flush the queue in parallel,
that can change the order, but since skb can be queued
in any order, it doesn't matter, how many threads do this
in parallel. In case of several threads, this will be even
faster.

So, we can remove rtnl_lock() here, as it was used for
iteration over net_namespace_list only.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00
Kirill Tkhai f0b07bb151 net: Introduce net_rwsem to protect net_namespace_list
rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.

This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.

Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00
David S. Miller 906edee91e Merge branch 'net-bgmac-Couple-of-small-bgmac-changes'
Florian Fainelli says:

====================
net: bgmac: Couple of small bgmac changes

This patch series addresses two minor issues with the bgmac driver:

- provides the interface name through /proc/interrupts rather than "bgmac"
- makes sure the interrupts are masked during probe, in case the block was
  not properly reset
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 12:06:11 -04:00
Florian Fainelli 34322615cb net: bgmac: Mask interrupts during probe
We can have interrupts left enabled form e.g: the bootloader which used
the network device for network boot. Make sure we have those disabled as
early as possible to avoid spurious interrupts.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 12:06:10 -04:00