This patch adds the RSS indirection table support, allowing to use the
ethtool -x and -X options to dump and set this table.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
[Maxime: Small warning fixes, use one table per port]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the
RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
to be used is chosen according to the default rx queue that would be
used for the packet.
This default rx queue is set in the Lookup_id Table (also called
Decoding Table), and is equal to the port->first_rxq.
Since the Classifier itself isn't active at any time for the moment,
this doesn't have a direct effect, the default rx queue at the moment is
the one where all packets end-up into.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no RSS_TABLE register in PPv2 Controller. The register 0x1510
which was specified is actually named "RSS_HASH_SEL", but isn't used by
this driver at all.
Based on how this register was used, it should have been the
RXQ2RSS_TABLE register, which allows to select the RSS table that will
be used for the incoming packet.
The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
register.
Since RSS tables are actually not used by the driver for now, this
commit does not fix a runtime bug.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch fixing a typo in one of the RSS comments.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of receive queue per port is :
- MVPP2_DEFAULT_RXQ if in single queue mode
- MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode
with MVPP2_DEFAULT_RXQ = 4.
However, we don't use the extra rx queues at the moment, we really only
need one per port per CPU, until some more advanced classification rules
are implemented.
Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a dedicated #define that indicates the number of rx queues per
port per cpu, this commit removes a harcoded use of that value
This doesn't fix any runtime bugs since the harcoded value matches the
expected value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since RSS only applies when we have per-cpu rx queues, it should only
be enabled when the driver is configured to make use of multi-queue
mode.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The multi queue mode is needed to have RSS available, and offers some
nice advantages, being able to have one rx queue vector per CPU.
This mode has been usable through the use of a module parameter, this
commit makes it the default value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 driver defines 2 "queue_modes" :
- QDIST_SINGLE_MODE, where each port share one rx queue vector
between all CPUs
- QDIST_MULTI_MODE, where each port has one rx queue vector per CPU.
Multi queue mode isn't available on PPv2.1, make sure we fallback to
single mode when running on this revision.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of the the RSS indirection tables should be defined in mvpp2.h,
so that we can use it in all files of the PPv2 driver.
This commit moves the define in mvpp2.h, and adds the missing #include
in mvpp2_cls.h.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include guards should be put before #includes. This doesn't fix any bug,
but prevent future compilation issues when adding new files in the mvpp2
driver
The Header Parser init function needs the platform_device definition,
and with the fixed include guards we need to add the missing include.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: updates 2018-07-11
please apply this first batch of qeth patches for net-next. It brings the
usual cleanups, and some performance improvements to the transmit paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the xmit of offload-eligible (ie IPv4) traffic on OSA over to the
new, copy-free path.
As with L2, we'll need to preserve the skb_orphan() behaviour of the
old code path until TX completion is sufficiently fast.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a new xmit path for L3 HiperSockets, which carves the
HW header from skb headroom instead of allocating it from the hdr cache.
It also adds NETIF_F_SG support.
The delta in qeth_l3_xmit() is all just removal of IQD-specific code and
some minor consolidation.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for future work, move the high-level xmit work into a
separate wrapper. This matches the L2 xmit code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a L3 device doesn't offer TSO, allow the stack to build full-size
GSO skbs.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some redundant EXPORTs. While at it, also move some L2-only
prototypes into the proper header file.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reshuffle the code a bit so that everything is in one place.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate duplicated code, fix the misuse of RTN_UNSPEC and simplify
the handling of non-unicast traffic on IQD devices.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing a device's address lists (or its promisc mode) already triggers
an RX modeset, there's no need to do it manually from the L2 driver's
ndo_vlan_rx_kill_vid() hook.
Also when setting a device online, dev_open() already calls
dev_set_rx_mode(). So a manual modeset is only necessary from the
recovery path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for tracing, the pointer is not used.
At the same time, accessing it from qeth_qdio_output_handler() is racy:
whenever qeth_qdio_cq_handler() gets control, its call to
qeth_qdio_handle_aob() frees the AOB.
So the AOB pointer that qeth_qdio_output_handler() stores into 'buffer'
can go stale at any time, and trigger a use-after-free.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new qeth_scrub_qdio_buffer() helper, remove an extra parameter
from qeth_clear_output_buffer(), init the bufstates.user field just once
(in qeth_flush_buffers()) and remove some noisy trace messages.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 5fa12739a5 ("net: ipv4: listify ip_rcv_finish") calling
dst_input(skb) was split-out. The ip_sublist_rcv_finish() just calls
dst_input(skb) in a loop.
The problem is that ip_sublist_rcv_finish() forgot to remove the SKB
from the list before invoking dst_input(). Further more we need to
clear skb->next as other parts of the network stack use another kind
of SKB lists for xmit_more (see dev_hard_start_xmit).
A crash occurs if e.g. dst_input() invoke ip_forward(), which calls
dst_output()/ip_output() that eventually calls __dev_queue_xmit() +
sch_direct_xmit(), and a crash occurs in validate_xmit_skb_list().
This patch only fixes the crash, but there is a huge potential for
a performance boost if we can pass an SKB-list through to ip_forward.
Fixes: 5fa12739a5 ("net: ipv4: listify ip_rcv_finish")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:
- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
by a concurrent settimeofday() or during a leap second adjustment.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The two do the same thing, but we want to have a consistent
naming in the kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti says:
====================
net/sched: act_skbedit: lockless data path
the data path of act_skbedit can be faster if we avoid using spinlocks:
- patch 1 converts act_skbedit statistics to use per-cpu counters
- patch 2 lets act_skbedit use RCU to read/update its configuration
test procedure (using pktgen from https://github.com/netoptimizer):
# ip link add name eth1 type dummy
# ip link set dev eth1 up
# tc qdisc add dev eth1 clsact
# tc filter add dev eth1 egress matchall action skbedit priority c1a0:c1a0
# for c in 1 2 4 ; do
> ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1
> done
test results (avg. pps/thread)
$c | before patch | after patch | improvement
----+--------------+--------------+------------
1 | 3917464 ± 3% | 4000458 ± 3% | irrelevant
2 | 3455367 ± 4% | 3953076 ± 1% | +14%
4 | 2496594 ± 2% | 3801123 ± 3% | +52%
v2: rebased on latest net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on
act_skbedit configuration. This reduces the effects of contention in the
data path, in case multiple readers are present.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use per-CPU counters, instead of sharing a single set of stats with all
cores: this removes the need of spinlocks when stats are read/updated.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using get_seconds() for timestamps is deprecated since it can lead
to overflows on 32-bit systems. While the interface generally doesn't
overflow until year 2106, the specific implementation of the TCP PAWS
algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
overflow.
A related problem is that the local timestamps in CLOCK_REALTIME form
lead to unexpected behavior when settimeofday is called to set the system
clock backwards or forwards by more than 24 days.
While the first problem could be solved by using an overflow-safe method
of comparing the timestamps, a nicer solution is to use a monotonic
clocksource with ktime_get_seconds() that simply doesn't overflow (at
least not until 136 years after boot) and that doesn't change during
settimeofday().
To make 32-bit and 64-bit architectures behave the same way here, and
also save a few bytes in the tcp_options_received structure, I'm changing
the type to a 32-bit integer, which is now safe on all architectures.
Finally, the ts_recent_stamp field also (confusingly) gets used to store
a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
This is currently safe, but changing the type to 32-bit requires
some small changes there to keep it working.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of kzalloc/free for aead_request allocation and free, use
functions aead_request_alloc(), aead_request_free(). It ensures that
any sensitive crypto material held in crypto transforms is securely
erased from memory.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend tc tunnel_key action unit tests with geneve options. Tests
include testing single and multiple geneve options, as well as
testing geneve options that are expected to fail.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera says:
====================
be2net: small structures clean-up
The series:
- removes unused / unneccessary fields in several be2net structures
- re-order fields in some structures to eliminate holes, cache-lines
crosses
- as result reduces size of main struct be_adapter by 4kB
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union
These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit fb6113e688 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.
Fixes: fb6113e688 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 2632bafd74 ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.
Fixes: 2632bafd74 ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The late ts queue can contain a bunch of skbs while hi rate testing,
no need to check all of them if timestamp is already matched.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP
address from 192.0.2.128/28 network. However the interface is configured
as a member of 192.0.2.160/28 and there's no route directing traffic
from the former network through that interface. Correspondingly, the RP
filter on the VRF rejects it.
Therefore turn off the VRF's RP filter.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: ERSPAN: Take LACP state into consideration
Petr says:
When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is. This patch set
aims to address these problems.
Patch #1 publishes team_port_get_rcu().
Then in patch #2, a new function is introduced,
mlxsw_sp_port_dev_check(). That returns, for a given netdevice that is a
slave of a LAG device, whether that device is "txable", i.e. whether the
LAG master would send traffic through it. Since there's no good place to
put LAG-wide helpers, introduce a new header include/net/lag.h.
Finally in patch #3, fix the slave selection logic to take into
consideration whether a given slave has a carrier and whether it is
txable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is.
So instead of checking upness of the device, check carrier state and
txability.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LAG devices (team or bond) recognize for each one of their slave devices
whether LAG traffic is going to be sent through that device. Bond calls
such devices "active", team calls them "txable". When this state
changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together
with a netdev_notifier_changelowerstate_info structure that for LAG
devices includes a tx_enabled flag that refers to the new state. The
notification thus makes it possible to react to the changes in txability
in drivers.
However there's no way to query txability from the outside on demand.
That is problematic namely for mlxsw, which when resolving ERSPAN packet
path, may encounter a LAG device, and needs to determine which of the
slaves it should choose.
To that end, introduce a new function, net_lag_port_dev_txable(), which
determines whether a given slave device is "active" or
"txable" (depending on the flavor of the LAG device). That function then
dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp.
team_port_dev_txable().
Because there currently is no good place where net_lag_port_dev_txable()
should be added, introduce a new header file, lag.h, which should from
now on hold any logic common to both team and bond. (But keep
netif_is_lag_master() together with the rest of netif_is_*_master()
functions).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A follow-up patch adds a new entry point, team_port_dev_txable(). Making
it an ordinary exported function would mean that any module that may
need the service in one of the supported configurations also
unconditionally needs to pull in the team module, whether or not the
user actually intends to create team interfaces.
To prevent that, team_port_dev_txable() is defined in if_team.h, and
therefore all dependencies of that function also need to be
publicly-visible.
Therefore move team_port_get_rcu() from team.c to if_team.h.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today macvlan ignores the notification when a lower device goes
administratively down, preventing the lack of connectivity from
bubbling up.
Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN
with NO-CARRIER which should be easy to interpret in userspace.
2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
Signed-off-by: Suresh Krishnan <skrishnan@arista.com>
Signed-off-by: Travis Brown <travisb@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: make link protocol more resilient
These two commits make the link ptotocol more resilient to
infrastructures with frequent packet duplication and long delays.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In some virtual environments we observe a significant higher number of
packet reordering and delays than we have been used to traditionally.
This makes it necessary with stricter checks on incoming link protocol
messages' session number, which until now only has been validated for
RESET messages.
Since the other two message types, ACTIVATE and STATE messages also
carry this number, it is easy to extend the validation check to those
messages.
We also introduce a flag indicating if a link has a valid peer session
number or not. This eliminates the mixing of 32- and 16-bit arithmethics
we are currently using to achieve this.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>