Use the new fail-safe channels switch mechanism to set new ethtool
settings:
- ring parameters
- coalesce parameters
- tx copy break parameters
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
A fail safe helper functions that allows switching to new channels on the
fly, In simple words:
make_new_config(new_params)
{
new_channels = open_channels(new_params);
if (!new_channels)
return "Failed, but current channels are still active :)"
switch_channels(new_channels);
return "SUCCESS";
}
Demonstrate mlx5e_switch_priv_channels usage in set channels ethtool
callback and make it fail-safe using the new switch channels mechanism.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
mlx5e_redirect_rqts_to_{channels,drop} and mlx5e_{add,del}_sqs_fwd_rules
and Set real num tx/rx queues belong to
mlx5e_{activate,deactivate}_priv_channels, for that we move those functions
and minimize mlx5e_open/close flows.
This will be needed in downstream patches to replace old channels with new
ones without the need to call mlx5e_close/open.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Remove mlx5e_priv pointer from CQ and RQ structs,
it was needed only to access mdev pointer from priv pointer.
Instead we now pass mdev where needed.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
In order to have a clean separation between channels resources creation
flows and current active mlx5e netdev parameters, make sure each
resource creation function do not access priv->params, and only works
with on a new fresh set of parameters.
For this we add "new" mlx5e_params field to mlx5e_channels structure
and use it down the road to mlx5e_open_{cq,rq,sq} and so on.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
As a foundation for safe config flow, a simple clear API such as
(Open then Activate) where the "Open" handles the heavy unsafe
creation operation and the "activate" will be fast and fail safe,
to enable the newly created channels.
For this we split the RQs/TXQ SQs and channels open/close flows to
open => activate, deactivate => close.
This will simplify the ability to have fail safe configuration changes
in downstream patches as follows:
make_new_config(new_params)
{
old_channels = current_active_channels;
new_channels = create_channels(new_params);
if (!new_channels)
return "Failed, but current channels still active :)"
deactivate_channels(old_channels); /* Can't fail */
activate_channels(new_channels); /* Can't fail */
close_channels(old_channels);
current_active_channels = new_channels;
return "SUCCESS";
}
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Rename mlx5e_refresh_tirs_self_loopback to mlx5e_refresh_tirs,
as it will be used in downstream (Safe config flow) patches, and make it
fail safe on mlx5e_open.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
RQ Tables are always created once (on netdev creation) pointing to drop RQ
and at that stage, RQ tables (indirection tables) are always directed to
drop RQ.
We don't need to use mlx5e_fill_{direct,indir}_rqt_rqns to fill the drop
RQ in create RQT procedure.
Instead of having separate flows to redirect direct and indirect RQ Tables
to the current active channels Receive Queues (RQs), we unify the two
flows by introducing mlx5e_redirect_rqt function and redirect_rqt_param
struct. Combined, they provide one generic logic to fill the RQ table RQ
numbers regardless of the RQ table purpose (direct/indirect).
Demonstrated the usage with mlx5e_redirect_rqts_to_channels which will
be called on mlx5e_open and with mlx5e_redirect_rqts_to_drop which will
be called on mlx5e_close.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Have a dedicated "channels" handler that will serve as channels
(RQs/SQs/etc..) holder to help with separating channels/parameters
operations, for the downstream fail-safe configuration flow, where we will
create a new instance of mlx5e_channels with the new requested parameters
and switch to the new channels on the fly.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
To simplify mlx5e_open_locked flow we set netdev->rx_cpu_rmap on netdev
creation rather on netdev open, it is redundant to set it every time on
mlx5e_open_locked.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Instead of iterating over the channel SQs to set their max rate, do it
on SQ creation per TXQ SQ.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
K. Y. Srinivasan says:
====================
netvsc: Fix miscellaneous issues
Fix miscellaneous issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All netvsc channels are handled via NAPI. Setup the "read mode" correctly
for the netvsc sub-channels.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Bonn says:
====================
GTP SGSN-side tunnel
Changes since v4:
* Respin the series on top of net-next; the conflicts were trivial,
amounting to just code having been shifted about
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP
contexts based on the incoming packets _destination_ address. If we
want to place ourselves on the SGSN side of the tunnel, then we want
to be identifying PDP contexts based on _source_ address.
Let it be noted that in a "real" configuration this module would never
be used: the SGSN normally does not see IP packets as input. The
justification for this functionality is for PGW load-testing applications
where the input to the SGSN is locally generally IP traffic.
This patch adds a "role" argument at GTP-link creation time to specify
whether we are on the GGSN or SGSN side of the tunnel; this flag is then
used to determine which part of the IP packet to use in determining
the PDP context.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a mostly cosmetic rename of the SGSN netlink attribute to
the GTP link. The justification for this is that we will be making
the module support decapsulation of "downstream" SGSN packets, in
which case the netlink parameter actually refers to the upstream GGSN
peer. Renaming the parameter makes the relationship clearer.
The legacy name is maintained as a define in the header file in order
to not break existing code.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniele Palmas says:
====================
net: usb: qmi_wwan: add qmap mux protocol support
This patch adds support for qmap mux protocol available in recent
Qualcomm based modems.
The qmap mux protocol can be used for multiplexing data packets in
order to have multiple ip streams through the same physical device.
Two new sysfs files are added for adding/removing the qmap mux based
interfaces (named qmimux):
/sys/class/net/<iface>/qmi/add_mux
/sys/class/net/<iface>/qmi/del_mux
Main patch author is Bjørn Mork <bjorn@mork.no>
An userspace implementation of the qmi requests needed to support
multiple ip streams is already available (namely libqmi since
version 1.18.0).
The qmap mux feature has been recently implemented in Codeaurora
gobinet out-of-kernel driver that was the inspiration for this
development.
Tests have been performed with Telit LE922A6 (PID 0x1040)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the documentation related to the new files added for
qmap mux support.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for qmap mux protocol available in recent
Qualcomm based modems.
The qmap mux protocol can be used for multiplexing data packets in
order to have multiple ip streams through the same physical device.
Two new sysfs files are added for adding/removing the qmap mux based
interfaces (named qmimux):
- /sys/class/net/<iface>/qmi/add_mux
- /sys/class/net/<iface>/qmi/del_mux
Main patch author is Bjørn Mork <bjorn@mork.no>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the return allocated index and err value are multiplexed.
This patch changes the API to decouple the ret value from the allocated
index.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
Add busy poll support for epoll
This patch set adds support for using busy polling with epoll. The main
idea behind this is that we record the NAPI ID for the last event that is
moved onto the ready list for the epoll context and then when we no longer
have any events on the ready list we begin polling with that ID. If the
busy polling does not yield any events then we will reset the NAPI ID to 0
and wait until a new event is added to the ready list with a valid NAPI ID
before we will resume busy polling.
Most of the changes in this set authored by me are meant to be cleanup or
fixes for various things. For example, I am trying to make it so that we
don't perform hash look-ups for the NAPI instance when we are only working
with sender_cpu and the like.
At the heart of this set is the last 3 patches which enable epoll support
and add support for obtaining the NAPI ID of a given socket. With these it
becomes possible for an application to make use of epoll and get optimal
busy poll utilization by stacking multiple sockets with the same NAPI ID on
the same epoll context.
v1: The first version of this series only allowed epoll to busy poll if all
of the sockets with a NAPI ID shared the same NAPI ID. I feel we were
too strict with this requirement, so I changed the behavior for v2.
v2: The second version was pretty much a full rewrite of the first set. The
main changes consisted of pulling apart several patches to better
address the need to clean up a few items and to make the code easier to
review. In the set however I went a bit overboard and was trying to fix
an issue that would only occur with 500+ years of uptime, and in the
process limited the range for busy_poll/busy_read unnecessarily.
v3: Split off the code for limiting busy_poll and busy_read into a separate
patch for net.
Updated patch that changed busy loop time tracking so that it uses
"local_clock() >> 10" as we originally did.
Tweaked "Change return type.." patch by moving declaration of "work"
inside the loop where is was accessed and always reset to 0.
Added "Acked-by" for patches that received acks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This socket option returns the NAPI ID associated with the queue on which
the last frame is received. This information can be used by the apps to
split the incoming flows among the threads based on the Rx queue on which
they are received.
If the NAPI ID actually represents a sender_cpu then the value is ignored
and 0 is returned.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds busy poll support to epoll. The implementation is meant to
be opportunistic in that it will take the NAPI ID from the last socket
that is added to the ready list that contains a valid NAPI ID and it will
use that for busy polling until the ready list goes empty. Once the ready
list goes empty the NAPI ID is reset and busy polling is disabled until a
new socket is added to the ready list.
In addition when we insert a new socket into the epoll we record the NAPI
ID and assume we are going to receive events on it. If that doesn't occur
it will be evicted as the active NAPI ID and we will resume normal
behavior.
An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket
options to spread the incoming connections to specific worker threads
based on the incoming queue. This enables epoll for each worker thread
to have only sockets that receive packets from a single queue. So when an
application calls epoll_wait() and there are no events available to report,
busy polling is done on the associated queue to pull the packets.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the core functionality in sk_busy_loop() to napi_busy_loop() and
make it independent of sk.
This enables re-using this function in epoll busy loop implementation.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch flips the logic we were using to determine if the busy polling
has timed out. The main motivation for this is that we will need to
support two different possible timeout values in the future and by
recording the start time rather than when we would want to end we can focus
on making the end_time specific to the task be it epoll or socket based
polling.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checking the return value of sk_busy_loop. As there are only a few
consumers of that data, and the data being checked for can be replaced
with a check for !skb_queue_empty() we might as well just pull the code
out of sk_busy_loop and place it in the spots that actually need it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of defining two versions of skb_mark_napi_id I think it is more
readable to just match the format of the sk_mark_napi_id functions and just
wrap the contents of the function instead of defining two versions of the
function. This way we can save a few lines of code since we only need 2 of
the ifdef/endif but needed 5 for the extra function declaration.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on some recent busy poll changes we found that child sockets
were being instantiated without NAPI ID being set. In our first attempt to
fix it, it was suggested that we should just pull programming the NAPI ID
into the function itself since all callers will need to have it set.
In addition to the NAPI ID change I have dropped the code that was
populating the Rx hash since it was actually being populated in
tcp_get_cookie_sock.
Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup/fix for NAPI IDs following the changes that made it
so that sender_cpu and napi_id were doing a better job of sharing the same
location in the sk_buff.
One issue I found is that we weren't validating the napi_id as being valid
before we started trying to setup the busy polling. This change corrects
that by using the MIN_NAPI_ID value that is now used in both allocating the
NAPI IDs, as well as validating them.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox mlx5e XDP performance optimization
This series provides some preformancee optimizations for mlx5e
driver, especially for XDP TX flows.
1st patch is a simple change of rmb to dma_rmb in CQE fetch routine
which shows a huge gain for both RX and TX packet rates.
2nd patch removes write combining logic from the driver TX handler
and simplifies the TX logic while improving TX CPU utilization.
All other patches combined provide some refactoring to the driver TX
flows to allow some significant XDP TX improvements.
More details and performance numbers per patch can be found in each patch
commit message compared to the preceding patch.
Overall performance improvemnets
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Test case Baseline Now improvement
---------------------------------------------------------------
TX packets (24 threads) 45Mpps 54Mpps 20%
TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5%
XDP Drop (1 core) 14Mpps 16.9Mpps 20%
XDP TX (1 core) 10.4Mpps 13.7Mpps 31%
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different SQ types (tx, xdp, ico) are growing apart, we separate them
and remove unwanted parts in each one of them, to simplify data path and
utilize data cache.
Remove DB union from SQ structures since it is not needed anymore as we
now have different SQ data type for each SQ.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patches we will introduce different SQ types,
and we would want to reuse those functions, in this patch we make them
agnostic to SQ type (txq, xdp, ico).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename mlx5e_{create,destroy}_{sq,rq,cq} to
mlx5e_{alloc,free}_{sq,rq,cq}.
Rename mlx5e_{enable,disable}_{sq,rq,cq} to
mlx5e_{create,destroy}_{sq,rq,cq}.
mlx5e_{enable,disable}_{sq,rq,cq} used to actually create/destroy the SQ
in FW, so we rename them to align the functions names with FW semantics.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patches we will introduce different SQ types, for that we here
generalize some TX helper functions to work with more basic SQ parameters,
in order to re-use them for the different SQ types.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP SQ has a fixed size WQE (MLX5E_XDP_TX_WQEBBS = 1) and only posts
one kind of WQE (MLX5_OPCODE_SEND),
Also we initialize SQ descriptors static fields once on open_xdpsq,
rather than every time on critical path.
Optimize the code in light of those facts and add a prefetch of the TX
descriptor first thing in the xdp xmit function.
Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Test case Before Now improvement
---------------------------------------------------------------
XDP TX (1 core) 13Mpps 13.7Mpps 5%
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle XDP TX completions before handling RX packets, to make sure more
free space is available for XDP TX packets a moment before handling
RX packets.
Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Test case Before Now improvement
---------------------------------------------------------------
XDP Drop (1 core) 16.9Mpps 16.9Mpps No change
XDP TX (1 core) 12Mpps 13Mpps 8%
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To save many rq->channel->sq dereferences in fast-path.
And rename it to xdpsq.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move struct mlx5e_rq and friends to appear after mlx5e_sq declaration in
en.h.
We will need this for next patch to move the mlx5e_sq instance into
mlx5e_rq struct for XDP SQs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP code belongs to RX path, move mlx5e_poll_xdp_tx_cq and
mlx5e_free_xdp_tx_descs to en_rx.c.
Rename them to mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One is sufficient since Blue Flame is not supported anymore.
This will also come in handy for switchdev mode to save resources, since
VF representors will use same single UAR as well for their own SQs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5e netdev Blue Flame (write combining) support demands a lot of
overhead for a little latency gain for some special cases, this overhead
is hurting the common case.
Here we remove xmit Blue Flame support by creating all bfregs with no
write combining for all SQs, and we remove a lot of BF logic and
conditions from xmit data path.
Simplify mlx5e_tx_notify_hw (doorbell function) by removing BF related
code and by removing one memory barrier needed for WC mapped SQ doorbell
buffers, which no longer exist.
Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Test case Before Now improvement
---------------------------------------------------------------
TX packets (24 threads) 50Mpps 54Mpps 8%
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_rmb in mlx5e_get_cqe rather than aggressive rmb (at least on
some architectures), this should help improve the performance on such
CPU archs where dma_rmb is optimized.
Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Test case Baseline Now improvement
---------------------------------------------------------------
TX packets (24 threads) 45Mpps 50Mpps 11%
TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5%
XDP Drop (1 core) 14Mpps 16.9Mpps 20%
XDP TX (1 core) 10.4Mpps 12Mpps 15%
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcm_sf2 does require the MDIO_BCM_UNIMAC driver which is now dependent
on OF_MDIO but also internally uses of_mdio.c provided routines which
are guarted with OF_MDIO.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun says:
====================
Performances improvement for IPv6 Segment Routing
This patch series improves the performances of IPv6 SR by optimizing skb head
reallocation and extending the use of dst_cache. The overall performances improve
by 35%.
Before patch series (SRH encap):
Result: OK: 7348320(c7347271+d1048) usec, 5000000 (1000byte,0frags)
680427pps 5443Mb/sec (5443416000bps) errors: 0
After patch series (SRH encap):
Result: OK: 4774543(c4774084+d459) usec, 5000000 (1000byte,0frags)
1047220pps 8377Mb/sec (8377760000bps) errors: 0
Baseline for plain IPv6 forwarding:
Result: OK: 4244144(c4243722+d422) usec, 5000000 (1000byte,0frags)
1178093pps 9424Mb/sec (9424744000bps) errors: 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We already use dst_cache in seg6_output, when handling locally generated
packets. We extend it in seg6_input, to also handle forwarded packets, and avoid
unnecessary fib lookups.
Performances for SRH encapsulation before the patch:
Result: OK: 5656067(c5655678+d388) usec, 5000000 (1000byte,0frags)
884006pps 7072Mb/sec (7072048000bps) errors: 0
Performances after the patch:
Result: OK: 4774543(c4774084+d459) usec, 5000000 (1000byte,0frags)
1047220pps 8377Mb/sec (8377760000bps) errors: 0
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
To insert or encapsulate a packet with an SRH, we need a large enough skb
headroom. Currently, we are using pskb_expand_head to inconditionally increase
the size of the headroom by the amount needed by the SRH (and IPv6 header).
If this reallocation is performed by another CPU than the one that initially
allocated the skb, then when the initial CPU kfree the skb, it will enter the
__slab_free slowpath, impacting performances.
This patch replaces pskb_expand_head with skb_cow_head, that will reallocate the
skb head only if the headroom is not large enough.
Performances for SRH encapsulation before the patch:
Result: OK: 7348320(c7347271+d1048) usec, 5000000 (1000byte,0frags)
680427pps 5443Mb/sec (5443416000bps) errors: 0
Performances after the patch:
Result: OK: 5656067(c5655678+d388) usec, 5000000 (1000byte,0frags)
884006pps 7072Mb/sec (7072048000bps) errors: 0
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_deferrable_timer() instead of init_timer_deferrable() to
simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Query resources from firmware
Ido says:
Some parts of the driver already use the resource query mechanism, but
in other parts we still rely on hard coded values that may change over
time.
This patchset removes most of these remaining values and queries them
from the firmware instead.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in the previous patch, the cell size may change in future
devices, so query it from the firmware instead of hard coding it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>