Kirill Tkhai says:
====================
Converting pernet_operations (part #4)
this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. The patches touch mostly netfilter,
also there are small number of changes in other places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations register and unregister sysctl.
nf_conntrack_l4proto_gre4->init_net is simple memory
initializer. Also, exit method removes gre keymap_list,
which is per-net. This looks safe to be executed
in parallel with other pernet_operations.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations register and unregister
two conntrack notifiers, and they seem to be safe
to be executed in parallel.
General/not related to async pernet_operations JFI:
ctnetlink_net_exit_batch() actions are grouped in batch,
and this could look like there is synchronize_rcu()
is forgotten. But there is synchronize_rcu() on module
exit patch (in ctnetlink_exit()), so this batch may
be reworked as simple .exit method.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
#ifdef CONFIG_IP_SET
extern func(void);
#else
static inline func(void);
#endif
So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).
ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations initialize and destroy
pernet net_generic(net, fou_net_id) list.
The rest of net_generic(net, fou_net_id) accesses
may happen after netlink message, and in-tree
pernet_operations do not send FOU_GENL_NAME messages.
So, these pernet_operations are safe to be marked
as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations looks similar to dccp_v4_ops,
and they are also safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like another pernet_operations don't want to send
dccp packets to dying or creating net. Batch method similar
to ipv4/ipv6 sockets and it has to be safe to be executed
in parallel with anything else. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations have a deal with cgw_list,
and the rest of accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on netlink request, and it does not seem,
foreign pernet_operations want to send a net such
the messages. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Init method just allocates memory for new cfg, and
assigns net_generic(net, caif_net_id). Despite there is
synchronize_rcu() on error path in cfcnfg_create(),
in real this function does not use global lists,
so it looks like this synchronize_rcu() is some legacy
inheritance. Exit method removes caif devices under
rtnl_lock().
There could be a problem, if someone from foreign net
pernet_operations dereference caif_net_id of this net.
It's dereferenced in get_cfcnfg() and caif_device_list().
get_cfcnfg() is used from netdevice notifiers, where
they are called under rtnl_lock(). The notifiers can't
be called from foreign nets pernet_operations. Also,
it's used from caif_disconnect_client() and from
caif_connect_client(). The both of the functions work
with caif socket, and there is the only possibility
to have a socket, when the net is dead. This may happen
only of the socket was created as kern using sk_alloc().
Grep by PF_CAIF shows we do not create kern caif sockets,
so get_cfcnfg() is safe.
caif_device_list() is used in netdevice notifiers and exit
method under rtnl lock. Also, from caif_get() used in
the netdev notifiers and in caif_flow_cb(). The last item
is skb destructor. Since there are no kernel caif sockets
nobody can send net a packet in parallel with init/exit,
so this is also safe.
So, these pernet_operations are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations call xt_proto_init() and xt_proto_fini(),
which just register and unregister /proc entries.
They are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations use nf_log_set() and nf_log_unset()
in their methods:
nf_log_bridge_net_ops
nf_log_arp_net_ops
nf_log_ipv4_net_ops
nf_log_ipv6_net_ops
nf_log_netdev_net_ops
Nobody can send such a packet to a net before it's became
registered, nobody can send a packet after all netdevices
are unregistered. So, these pernet_operations are able
to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations use ebt_register_table() and
ebt_unregister_table() to act on the tables, which
are used as argument in ebt_do_table(), called from
ebtables hooks.
Since there are no net-related bridge packets in-flight,
when the init and exit methods are called, these
pernet_operations are safe to be executed in parallel
with any other.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For tests using veth interfaces, the test infrastructure can create
the netdevs if they do not exist. Arguably this is a preferred approach
since the tests require p$N and p$(N+1) to be pairs.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a generic netlink family for NCSI. This supports three commands;
NCSI_CMD_PKG_INFO which returns information on packages and their
associated channels, NCSI_CMD_SET_INTERFACE which allows a specific
package or package/channel combination to be set as the preferred
choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds TCP_NLA_CA_STATE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports ca_state of socket, when timestamp is generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds TCP_NLA_SENDQ_SIZE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports no. of bytes present in send queue, when timestamp is
generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tc action test is used only to test mirred redirect
action. This patch extends it for mirred mirror.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LRO and RX-FCS offloads cannot be enabled at the same time since it is
not clear what should happen to the FCS of each coalesced packet.
The FCS is not really part of the TCP payload, hence cannot be merged
into one big packet. On the other hand, providing one big LRO packet
with one FCS contradicts the RX-FCS feature goal.
Use the fix features mechanism in order to prevent intersection of the
features and drop LRO in case RX-FCS is requested.
Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corrected stats mismatch between Host Tx and its peer Rx stats
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- bump copyright years, by Sven Eckelmann
- fix macro indendation for checkpatch, by Sven Eckelmann
- fix comparison operator for bool returning functions,
by Sven Eckelmann
- assume 2-byte packet alignments for all packet types,
by Matthias Schiffer
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlqZkD8WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoaQeD/0cHQz/x6ZuXF6MDrwOf78oDvjw
AksSt2e/055Ommhzplz0Aa0hthK7Xb7+dBD+caAOxBwrXtWsWc6H5fsfeiBCTntH
PxDyS68o+iZDonquaOKz+gtILiPbUE0hG3hMizWdF95nDtsJ3rHoL3fU7Wo9eZNq
WtlwCCloyEZeGfPtpJXCo7QmGOvTLk+Qt+pkGWKHFIIXNaxa9yfqp7JCFRfyrkl3
7wb4+YaZCCUDSUiDW6SQY0rTx3bxJPfHOnrG7rB6chmGVyyssH+tjhLP52331/oI
dloVSz96pilAcKivbhLRENcIcUNuj9mHyV6rFM4/paYeDOGotoA5DCkEi3NSTYdN
p2UiDBJZfiA6M40AxmLYK5mLPWYLGpyrZNBNskemrQdlrXDLU/D6UWMUKbFhouPx
AjgX+Yk7giqXdxUjCcaFsjDBf2SxC/Xjv39qvPR0P4rW4xD/Y3xqHDJ8yLrPj7Si
M1NJv6E+gBkIQg+JuoWeQb3kvbtNQsu49XBbrYlUrdPgkJhVC6DIP5jie/TutAGz
9OU1cdDZNUVkI6+iuGP8B3Aj0Mj+zlpXHvhBa5R9duAumdt0uiqwU98k4h89yix+
GK1K9dPyW/r9qmwtemaZH8RQ6iqoxBPVZ43PoM7W/xe04IwslPJWyZ8GT64Yqekl
9m5HhMLLiJx9pklR9w==
=KJdN
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20180302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- bump copyright years, by Sven Eckelmann
- fix macro indendation for checkpatch, by Sven Eckelmann
- fix comparison operator for bool returning functions,
by Sven Eckelmann
- assume 2-byte packet alignments for all packet types,
by Matthias Schiffer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we allow the creation of 8021q devices on top of
ipvlan, but such devices are nonfunctional, as the underlying
ipvlan rx_hanlder hook can't match the relevant traffic.
Be explicit and forbid the creation of such nonfunctional devices.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes testns after test failure so that next test can
continue with clean ns
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu says:
====================
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native tunnel mode.
The first patch adds sequence number support for gre collect
metadata mode, and the second patch tests it using BPF.
RFC2890 defines GRE sequence number to be specific to the traffic
flow identified by the key. However, this patch does not implement
per-key seqno. The sequence number is shared in the same tunnel
device. That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
A new BFP uapi tunnel flag 'BPF_F_SEQ_NUMBER' is added.
--
v1->v2:
rename BPF_F_GRE_SEQ to BPF_F_SEQ_NUMBER suggested by Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds tests for GRE sequence number
support for metadata mode tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently GRE sequence number can only be used in native
tunnel mode. This patch adds sequence number support for
gre collect metadata mode. RFC2890 defines GRE sequence
number to be specific to the traffic flow identified by the
key. However, this patch does not implement per-key seqno.
The sequence number is shared in the same tunnel device.
That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan says:
====================
enic update
This series adds support for IPv6 vxlan offload and UDP rss along with a
bug fix in filling the rq ring.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
New adapter needs CMD_OPENF_IG_DESCCACHE flag to be set. If this flag is
not set, fw flushes the global IG desc cache. This flag is nop in older
adapter.
Also increment driver version
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rq should be enabled before posting the buffers to rq desc. If not hw sees
stale value and casuses DMAR errors.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New hardware needs UDP flag set to enable UDP L4 rss hash. Add ethtool
get option to display supported rss flow hash.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some adaptors do not support vxlan offload when multi wq is configured.
If hw supports multi wq, BIT(2) is set in a1.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New adaptors supports vxlan offload for inner IPv6 and outer IPv6 vxlan
pkts.
Fw sets BIT(0) & BIT(1) in a1 if hw supports ipv6 inner & outer pkt
offload.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To compute pseudo IP header csum, we need to check the inner header for
encap pkt, not outer IP header.
Also add pseudo csum for IPv6 inner pkt.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable tx_index is being initialized with a value that is never
read and re-assigned a little later, hence the initialization is redundant
and can be removed.
Cleans up clang warning:
drivers/net/ethernet/amd/amd8111e.c:652:6: warning: Value stored to
'tx_index' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a follow up to the commit
4c45d24a75 ("r8169: switch to device-managed functions in probe")
to move towards managed resources even more.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to dereference struct rtl8169_private to get mmio_addr
in almost every function in the driver.
Replace it by using pointer to struct rtl8169_private directly.
No functional change intended.
Next step might be a conversion of RTL_Wxx() / RTL_Rxx() macros
to inline functions for sake of type checking.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in comments and error message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Virtual Interfaces are connected to an internal switch on the chip
which allows VIs attached to the same port to talk to each other even
when the port link is down. As a result, we generally want to always
report a VI's link as being "up".
Based on the original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
Export SERDES stats via ethtool -S
The mv88e6352 family has a SERDES interface which can be used for
example to connect to SFF/SFP modules. This interface has a couple of
statistics counters. Add support for including these counters in the
output of ethtool -S.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading the SERDES statistics of the mv88e8352, using
the standard ethtool -S option. The SERDES interface can be mapped to
either port 4 or 5, so only return statistics on those ports, if the
SERDES interface is in use.
The counters are reset on read, so need to be accumulated. Add a per
port structure to hold the stats counters. The 6352 only has a single
SERDES interface and so only one port will using the newly added
array. However the 6390 family has as many SERDES interfaces as ports,
each with statistics counters. Also, PTP has a number of counters per
port which will also need accumulating.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the existing code. This helper will be used for SERDES
statistics.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When gettting the number of statistics, the strings and the actual
statistics, call the SERDES ops if implemented. This means the stats
code needs to return the number of strings/stats they have placed into
the data, so that the SERDES strings/stats can follow on.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, there has been no need to hold the reg mutex while getting
the count of statistics, or the strings, because the hardware was not
accessed. When adding support for SERDES statistics, it is necessary
to access the hardware, to determine if a port is using the SERDES
interface. So add mutex lock/unlocks.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By passing the port, we allow different ports to have different
statistics. This is useful since some ports have SERDES interfaces
with their own statistic counters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a command line arg to suppress tap output. Handy in case
all the tap output is being supplied by the plugins.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net/ipv6: Add support for path selection using hash of 5-tuple
Hardware supports multipath selection using the standard L4 5-tuple
instead of just L3 and the flow label. In addition, some network
operators prefer IPv6 path selection to use the 5-tuple. To that end,
add support to IPv6 for multipath hash policy similar to
bf4e0a3db9 ("net: ipv4: add support for ECMP hash policy choice").
The default is still L3 which covers source and destination addresses
along with flow label and IPv6 protocol. This gives users a choice in
hash algorithms if they believe L3 only and the IPv6 flow label are not
sufficient for their use case.
A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use
different algorithms if desired.
The first 3 patches modify the IPv4 variant so that at the end of the
patch set the ipv4 and ipv6 implementations are direct parallels.
Patch 4 refactors the existing rt6_multipath_hash in preparation for
adding the policy option.
Patch 5 renames the existing netevent to have IPv4 in the name so ipv4
changes can be distinguished from IPv6 if the netevent handler cares.
Patch 6 adds the skb as an argument through the FIB lookup functions
to the multipath selection. Needed for the forwarding case.
Patch 7 adds the L4 hash support.
Patch 8 adds the hook for the netevent to the spectrum driver to update
the ASIC.
Patch 9 removes no longer used code.
Patch 10 adds a testcase for IPv6 multipath with L4 hash.
v3
- comments from Ido:
- removed fib_info arg in patch 1; left by mistake on rebase to net-next
- removed __get_hash_from_flowi4 declaration
- line wrap change to spectrum_router.c to maintain 80 chars
v2
- rebased to top of tree
- added refactor of fib_multipath_hash following recent change
- plumb skb through lookup functions to multipath selection
- fix sysctl setting; was missing the data set in ipv6_sysctl_net_init
- added test case
RFC to v1:
- rebase to top of net-next
- fix addr_type in hash_keys and removed flow label as noticed by Ido
- added a comment to cover letter about choice in algorithms based on
use case per Or's comments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv6 multipath test using L4 hashing. Created with inputs from
Ido Schimmel.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>