This adds an option to test the msg_pull_data helper. This
uses two options txmsg_start and txmsg_end to let the user
specify start and end bytes to pull.
The options can be used with txmsg_apply, txmsg_cork options
as well as with any of the basic tests, txmsg, txmsg_redir and
txmsg_drop (plus noisy variants) to run pull_data inline with
those tests. By giving the user direct control over the variables
we can easily do negative testing as well as positive tests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add tests for SK_DROP.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes each verdict should apply to.
Similar to apply_bytes() tests these can be run as a stand-alone test
when used without other options or inline with other tests by using
the txmsg_cork option along with any of the basic tests txmsg,
txmsg_redir, txmsg_drop.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds an option to test the apply_bytes helper. This option lets
the user specify an int on the command line specifying how much data
each verdict should apply to.
When this is set a map entry is set with the bytes input by the user
and then the specified program --txmsg or --txmsg_redir will use the
value and set the applied data. If no other option is set then a
default --txmsg_apply program is run. This program will drop pkts
if an error is detected on the bytes map lookup. Useful to verify
the map lookup and apply helper are working and causing a hard
error if it is not.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To verify data is not being dropped or corrupted this adds an option
to verify test-patterns on recv.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To exercise TX ULP sendpage implementation we need a test that does
a sendfile. Add sendfile test option here.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sockmap option to use SK_MSG program types.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Test read and writes for BPF_PROG_TYPE_SK_MSG.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add map tests to attach BPF_PROG_TYPE_SK_MSG types to a sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, if a bpf sk msg program is run the program
can only parse data that the (start,end) pointers already
consumed. For sendmsg hooks this is likely the first
scatterlist element. For sendpage this will be the range
(0,0) because the data is shared with userspace and by
default we want to avoid allowing userspace to modify
data while (or after) BPF verdict is being decided.
To support pulling in additional bytes for parsing use
a new helper bpf_sk_msg_pull(start, end, flags) which
works similarly to cls tc logic. This helper will attempt
to point the data start pointer at 'start' bytes offset
into msg and data end pointer at 'end' bytes offset into
message.
After basic sanity checks to ensure 'start' <= 'end' and
'end' <= msg_length there are a few cases we need to
handle.
First, the sendmsg hook has already copied the data from
userspace and has exclusive access to it. Therefore, it
is not always necessary to copy the data, although a copy
may be required. After finding the scatterlist element with
the 'start' offset byte in it there are two cases. In the first,
the range (start,end) is entirely contained in the sg element
and is already linear. All that is needed is to update the
data pointers; no allocate/copy is needed. In the other case,
(start, end) crosses sg element boundaries. In this
case we allocate a block of size 'end - start' and copy
the data to linearize it.
Next, the sendpage hook has not copied any data in its initial
state, so the data pointers are (0,0). In this case we
handle it similarly to the above sendmsg case except the
allocation/copy must always happen. Then when sending
the data we have possibly three memory regions that
need to be sent, (0, start - 1), (start, end), and
(end + 1, msg_length). This is required to ensure any
writes by the BPF program are correctly transmitted.
Lastly this operation will invalidate any previous
data checks so BPF programs will have to revalidate
pointers after making this BPF call.
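
The case analysis above can be sketched as a small userspace model. This
is only an illustration, not the kernel code: the sg_len[] array, the
enum, and classify_pull() are hypothetical names, and 'end' is assumed
to be exclusive so that the copy size is 'end - start'.

```c
#include <stddef.h>

/* Outcome of a pull request in this simplified model. */
enum pull_result {
	PULL_INVALID,	/* start > end, or range outside the message */
	PULL_LINEAR,	/* range sits in one element: only update pointers */
	PULL_COPY,	/* allocate 'end - start' bytes and copy */
};

/*
 * sg_len[] holds the byte length of each scatterlist element (a
 * hypothetical stand-in for the real scatterlist). 'start' and 'end'
 * are byte offsets into the message; 'end' is treated as exclusive.
 * 'is_sendpage' models the sendpage hook, where data is shared with
 * userspace and a copy must always happen.
 */
static enum pull_result classify_pull(const size_t *sg_len, size_t nr_sg,
				      size_t start, size_t end,
				      int is_sendpage)
{
	size_t total = 0, base = 0, i;

	for (i = 0; i < nr_sg; i++)
		total += sg_len[i];

	/* Basic sanity checks: start <= end and end <= msg_length. */
	if (start > end || end > total)
		return PULL_INVALID;

	/* Find the element that contains the 'start' byte. */
	for (i = 0; i < nr_sg; i++) {
		if (start < base + sg_len[i])
			break;
		base += sg_len[i];
	}
	if (i == nr_sg)
		return PULL_INVALID;	/* start == msg_length: nothing to pull */

	if (is_sendpage)
		return PULL_COPY;

	/* Does the whole range fit inside that single element? */
	return end <= base + sg_len[i] ? PULL_LINEAR : PULL_COPY;
}
```

With elements of 100, 50 and 200 bytes, a sendmsg pull of (10, 90) only
needs a pointer update, while (90, 120) crosses an element boundary and
needs the allocate/copy path.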
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In the case where we need a specific number of bytes before a
verdict can be assigned, even if the data spans multiple sendmsg
or sendfile calls, the BPF program may use msg_cork_bytes().
The extreme case is a user can call sendmsg repeatedly with
1-byte msg segments. Obviously, this is bad for performance but
is still valid. If the BPF program needs N bytes to validate
a header it can use msg_cork_bytes to specify N bytes and the
BPF program will not be called again until N bytes have been
accumulated. The infrastructure will attempt to coalesce data
if possible so in many cases (most of my use cases at least) the
data will be in a single scatterlist element with data pointers
pointing to start/end of the element. However, this is dependent
on available memory so is not guaranteed. So BPF programs must
validate data pointer ranges, but this is the case anyway to
convince the verifier the accesses are valid.
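
The corking behaviour can be modelled in userspace roughly as follows.
This is a sketch under assumptions: struct cork_state and cork_add() are
hypothetical names, and the model only tracks byte counts, not the
scatterlist coalescing described above.

```c
#include <stddef.h>

/* Hypothetical per-socket state for this model. */
struct cork_state {
	size_t cork_bytes;	/* bytes requested via msg_cork_bytes(); 0 = none */
	size_t queued;		/* bytes accumulated so far */
};

/*
 * Account for 'len' new bytes from one sendmsg/sendfile call.
 * Returns 1 when the BPF program should run again, 0 while the
 * data is still being corked.
 */
static int cork_add(struct cork_state *st, size_t len)
{
	st->queued += len;
	if (st->queued < st->cork_bytes)
		return 0;	/* not enough data yet, keep buffering */

	/* Enough data: the program runs on everything queued so far. */
	st->cork_bytes = 0;
	st->queued = 0;
	return 1;
}
```

With cork_bytes set to 4, four 1-byte sends lead to a single program run
on the fourth call.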
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict. But,
without an apply_bytes helper any verdict on the data applies to all
bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
care to read the first N bytes of a msg. If the payload is large, say
MBs or even GBs, setting up and calling the BPF program repeatedly for
all bytes, even though the verdict is already known, creates
unnecessary overhead.
To allow BPF programs to control how many bytes a given verdict
applies to we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If the N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with start_data pointing to the N+1
byte. If N is larger than the current data being processed the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.
Note: if a socket closes with the apply_bytes counter non-zero this
is not a problem because data is not being buffered for N bytes
and is sent as it is received.
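
As a rough illustration of the counter semantics, the following
userspace model counts how often the BPF program would run for a series
of send calls. It assumes, purely for the example, that every run asks
for the verdict to cover the next 'apply' bytes; count_prog_runs() is a
hypothetical name and not part of the kernel.

```c
#include <stddef.h>

/*
 * send_len[] holds the size of each sendmsg/sendfile call.
 * Returns the number of BPF program invocations in this model.
 */
static size_t count_prog_runs(const size_t *send_len, size_t nr_sends,
			      size_t apply)
{
	size_t remaining = 0;	/* bytes still covered by the last verdict */
	size_t runs = 0, i;

	if (apply == 0)
		return 0;	/* the model needs a non-zero apply count */

	for (i = 0; i < nr_sends; i++) {
		size_t len = send_len[i];

		while (len > 0) {
			size_t chunk;

			if (remaining == 0) {
				/* Counter exhausted: run the program again. */
				runs++;
				remaining = apply;
			}
			/* Send what the current verdict still covers. */
			chunk = len < remaining ? len : remaining;
			len -= chunk;
			remaining -= chunk;
		}
	}
	return runs;
}
```

A single 10-byte send with apply = 4 runs the program three times (N
smaller than the data), while three 3-byte sends with apply = 8 run it
only twice (N spans multiple calls).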
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This implements a BPF ULP layer to allow policy enforcement and
monitoring at the socket layer. In order to support this a new
program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
the sendmsg/sendpage hook. To attach the policy to sockets a
sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
Similar to previous sockmap usages when a sock is added to a
sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT
program type attached then the BPF ULP layer is created on the
socket and the attached BPF_PROG_TYPE_SK_MSG program is run for
every msg in sendmsg case and page/offset in sendpage case.
BPF_PROG_TYPE_SK_MSG Semantics/API:
BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and
SK_DROP. Returning SK_DROP frees the copied data in the sendmsg
case and in the sendpage case leaves the data untouched. Both cases
return -EACCES to the user. Returning SK_PASS will allow the msg to
be sent.
In the sendmsg case data is copied into kernel space buffers before
running the BPF program. The kernel space buffers are stored in a
scatterlist object where each element is a kernel memory buffer.
Some effort is made to coalesce data from the sendmsg call here.
For example a sendmsg call with many one byte iov entries will
likely be pushed into a single entry. The BPF program is run with
data pointers (start/end) pointing to the first sg element.
In the sendpage case data is not copied. We opt not to copy the
data by default here, because the BPF infrastructure does not
know what bytes will be needed nor when they will be needed. So
copying all bytes may be wasteful. Because of this the initial
start/end data pointers are (0,0). Meaning no data can be read or
written. This avoids reading data that may be modified by the
user. A new helper is added later in this series if reading and
writing the data is needed. The helper call will do a copy by
default so that the page is exclusively owned by the BPF call.
The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg
in the sendmsg() case and the entire page/offset in the sendpage case.
This avoids ambiguity on how to handle mixed return codes in the
sendmsg case. Again a helper is added later in the series if
a verdict needs to apply to multiple system calls and/or only
a subpart of the currently being processed message.
The helper msg_redirect_map() can be used to select the socket to
send the data on. This is used similar to existing redirect use
cases. This allows policy to redirect msgs.
Pseudo code simple example:
The basic logic to attach a program to a socket is as follows,
  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
                &obj, &msg_prog);
  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");
  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);
  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);
Adding sockets to the map is done in the normal way,
  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);
After the above any socket attached to "my_sock_map", in this case
'fd', will run the BPF msg verdict program (msg_prog) on every
sendmsg and sendpage system call.
For a complete example see BPF selftests or sockmap samples.
Implementation notes:
It seemed the simplest, to me at least, to use a refcnt to ensure
psock is not lost across the sendmsg copy into the sg, the bpf program
running on the data in sg_data, and the final pass to the TCP stack.
Some performance testing may show a better method to do this and avoid
the refcnt cost, but for now use the simpler method.
Another item that will come after basic support is in place is
supporting MSG_MORE flag. At the moment we call sendpages even if
the MSG_MORE flag is set. An enhancement would be to collect the
pages into a larger scatterlist and pass down the stack. Notice that
bpf_tcp_sendmsg() could support this with some additional state saved
across sendmsg calls. I built the code to support this without having
to do refactoring work. Other features TBD include ZEROCOPY and the
TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series
shortly.
Future work could improve size limits on the scatterlist rings used
here. Currently, we use MAX_SKB_FRAGS simply because this was being
used already in the TLS case. Future work could extend the kernel sk
APIs to tune this depending on workload. This is a trade-off
between memory usage and throughput performance.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The current implementation of sk_alloc_sg expects scatterlist to always
start at entry 0 and complete at entry MAX_SKB_FRAGS.
Future patches will want to support starting at arbitrary offset into
scatterlist so add an additional sg_start parameter and then default
to the current values in TLS code paths.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When calling do_tcp_sendpages() from within the kernel and we know the data
has no references from user side we can omit SKBTX_SHARED_FRAG flag.
This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
to omit setting SKBTX_SHARED_FRAG.
The flag is not exposed to userspace because the sendpage call from
the splice logic masks out all bits except MSG_MORE.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The sockmap refcnt up until now has been wrapped in the
sk_callback_lock(). So it has not actually needed any locking of its
own. The counter itself tracks the lifetime of the psock object.
Sockets in a sockmap have a lifetime that is independent of the
map they are part of. This is possible because a single socket may
be in multiple maps. When this happens we can only release the
psock data associated with the socket when the refcnt reaches
zero. There are three possible paths that decrement the sock
reference. First, through the normal sockmap process, the user deletes
the socket from the map. Second, the map is removed and all sockets
in the map are removed; the delete path is similar to case 1. The third
case is an asynchronous socket event such as closing the socket. The
last case handles removing sockets that are no longer available.
For completeness, although inc does not pose any problems in this
patch series, the inc case only happens when a psock is added to a
map.
Next we plan to add another socket prog type to handle policy and
monitoring on the TX path. When we do this however we will need to
keep a reference count open across the sendmsg/sendpage call and
holding the sk_callback_lock() here (on every send) seems less than
ideal, also it may sleep in cases where we hit memory pressure.
Instead of dealing with these issues in some clever way simply make
the reference counting a refcount_t type and do proper atomic ops.
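
The lifetime rule (release psock data only when the last reference is
dropped) can be sketched in userspace with C11 atomics. This is only a
model: struct psock_model and its helpers are hypothetical, and the
sketch does not attempt to reproduce the overflow protection of the
kernel's reference count type.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the psock object. */
struct psock_model {
	atomic_uint refcnt;
	bool released;
};

static void psock_init(struct psock_model *p)
{
	atomic_init(&p->refcnt, 1);	/* reference held by the first map */
	p->released = false;
}

/* Take a reference, e.g. when the socket is added to another map. */
static void psock_get(struct psock_model *p)
{
	atomic_fetch_add(&p->refcnt, 1);
}

/* Drop one reference; release the object when the last one goes away. */
static void psock_put(struct psock_model *p)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&p->refcnt, 1) == 1)
		p->released = true;	/* real code would free psock data here */
}
```

A socket that sits in two maps needs two psock_put() calls before the
object is released, regardless of which delete path drops each one.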
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The TLS ULP module builds scatterlists from a sock using
page_frag_refill(). This is going to be useful for other ULPs
so move it into sock file for more general use.
In the process remove useless goto at end of while loop.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-03-19
This series contains updates to i40e and i40evf only.
Alex fixes a potential deadlock in the configure_clsflower function in
i40evf, where we exit with the "IN_CRITICAL_TASK" bit set while
notifying the PF of flower filters.
Jan fixed an issue where it was possible to set a mode that is not
allowed which resulted in link being down, so fixed the parity between
i40e_set_link_ksettings() and i40e_get_link_ksettings().
Patryk fixes a bug where a backplane device was allowing the setting of
link settings, which is not allowed.
Shiraz fixes a crash when entering S3 because the client interface was
freeing the MSIx vectors while they are still in use.
Jake fixes up a function header comment to document a newly added
parameter. Also cleaned up flags that were never used.
Doug fixes the incorrect return type for i40e_aq_add_cloud_filters().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the
PF Reset path when Global Reset is in progress. While the driver
is polling for the end of the PF Reset and the Global Reset is
triggered, abandon the PF Reset path and prepare for the
upcoming Global Reset.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
These flags were defined, but there is no use within the driver code, so
we don't need to keep them.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list that we check
against.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix return types from i40e_status to enum i40e_status_code.
Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A recent patch updated the signature for i40e_aq_set_switch_config() to
add a new parameter 'mode'. It forgot to document the parameter in the
doxygen function header comment. Add the parameter to the function
description now.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
During suspend client MSIx vectors are freed while they are still
in use causing a crash on entering S3.
Fix this by calling client close before freeing up its MSIx vectors.
Also update the client MSIx vectors on resume before client
open is called.
Fixes: b980c0634f ("i40e: shutdown all IRQs and disable MSI-X when suspended")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list that we check
against.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e_set_link_ksettings and i40e_get_link_ksettings use different
codepaths to check available and supported advertisement modes. This
creates scenarios where it's possible to set a mode that's not allowed,
resulting in a link down.
Fix setting advertisement in i40e_set_link_ksettings by calling
i40e_get_link_ksettings to check what modes are allowed.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While doing some code review I noticed that we can get into a state where
we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of
flower filters. This patch is meant to address that plus tweak the ordering
of the while loop waiting on it slightly so that we don't wait an extra
period after we have failed for the last time.
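
The loop ordering change can be illustrated with a small model. This is
not the driver code: acquire_with_retries() is a hypothetical helper,
the attempt_ok[] array simulates whether each attempt to take the bit
succeeds, and '*waits' counts the sleeps that would happen.

```c
#include <stdbool.h>

/*
 * Try up to 'max_tries' times to enter the critical section.
 * A wait only happens between attempts, never after the final
 * failed attempt.
 */
static bool acquire_with_retries(const bool *attempt_ok, int max_tries,
				 int *waits)
{
	*waits = 0;
	for (int i = 0; i < max_tries; i++) {
		if (attempt_ok[i])
			return true;
		if (i + 1 < max_tries)
			(*waits)++;	/* stands in for a sleep between attempts */
	}
	return false;	/* caller must not leave the bit set on this path */
}
```

With three failing attempts the model waits twice rather than three
times before giving up.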
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reported-by: David Ahern <dsahern@gmail.com>
Fixes: 1fad59ea1c ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
Automatic PHY interrupts
Now that the mv88e6xxx driver either installs an interrupt handler, or
polls for interrupts, it is possible to always handle PHY interrupts,
rather than have phylib perform the polling. This speeds up detection
of link changes and reduces the load on the MDIO bus, which is
beneficial for PTP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When registering an MDIO bus, it is possible to pass an array of
interrupts, one per address on the bus. phylib will then associate the
interrupt to the PHY device, if no other interrupt is provided.
Some of the global2 interrupts are PHY interrupts. Place them into the
MDIO bus structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add to the info structure the number of internal PHYs, if they generate
interrupts. Some of the older generations of switches have internal
PHYs, but no interrupt registers. In this case, set the count to zero.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the recent change to polling for interrupts, it is important that
the number of global 1 interrupts is listed. Without it, the driver
requests an interrupt domain for zero interrupts, which returns
EINVAL, and the probe fails.
Add two missing entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can hit the register lock not held assertion with the following path:
[ 34.170631] mv88e6085 0.1:00: Switch registers lock not held!
[ 34.176510] CPU: 0 PID: 950 Comm: ethtool Not tainted 4.16.0-rc4 #143
[ 34.182985] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[ 34.189519] Backtrace:
[ 34.192033] [<8010c4b4>] (dump_backtrace) from [<8010c788>] (show_stack+0x20/0x24)
[ 34.199680] r6:9f5dc010 r5:00000011 r4:9f5dc010 r3:00000000
[ 34.205434] [<8010c768>] (show_stack) from [<80679d38>] (dump_stack+0x24/0x28)
[ 34.212719] [<80679d14>] (dump_stack) from [<804844a8>] (mv88e6xxx_read+0x70/0x7c)
[ 34.220376] [<80484438>] (mv88e6xxx_read) from [<804870dc>] (mv88e6xxx_port_get_cmode+0x34/0x4c)
[ 34.229257] r5:a09cd128 r4:9ee31d07
[ 34.232880] [<804870a8>] (mv88e6xxx_port_get_cmode) from [<80487e6c>] (mv88e6352_port_has_serdes+0x24/0x64)
[ 34.242690] r4:9f5dc010
[ 34.245309] [<80487e48>] (mv88e6352_port_has_serdes) from [<804880b8>] (mv88e6352_serdes_get_stats+0x28/0x12c)
[ 34.255389] r4:00000001
[ 34.257973] [<80488090>] (mv88e6352_serdes_get_stats) from [<804811e8>] (mv88e6xxx_get_ethtool_stats+0xb0/0xc0)
[ 34.268156] r10:00000000 r9:00000000 r8:00000000 r7:a09cd020 r6:00000001 r5:9f5dc01c
[ 34.276052] r4:9f5dc010
[ 34.278631] [<80481138>] (mv88e6xxx_get_ethtool_stats) from [<8064f740>] (dsa_slave_get_ethtool_stats+0xbc/0xc4)
mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats() which calls both
chip->info->ops->stats_get_stats(), which holds the register lock, and
chip->info->ops->serdes_get_stats() which does not. Make
chip->info->ops->serdes_get_stats() run with the register lock held to
avoid such assertions.
Fixes: 436fe17d27 ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle polled interrupts correctly when loading the module.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 294d711ee8 ("net: dsa: mv88e6xxx: Poll when no interrupt defined")
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio says:
====================
selftests: pmtu: Add further vti/vti6 MTU and PMTU tests
Patches 5/10 to 10/10 add tests to verify default MTU assignment
for vti4 and vti6 interfaces, to check that MTU values set on new
link and link changes are properly taken and validated, and to
verify PMTU exceptions on vti4 interfaces.
Patch 1/10 reverses function return codes as suggested by David
Ahern.
Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
passed namespace.
Patches 3/10 and 4/10 are preparation work to make it easier to
introduce those tests.
v2: Reverse return codes, and make output prettier in 4/9 by
using padded printf, test descriptions and buffered error
strings. Remove accidental output to /dev/kmsg from 10/10
(was 9/9).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This test checks that MTU configured from userspace is used on
link creation and changes, and that when it's not passed from
userspace, it's calculated properly from the MTU of the lower
layer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as pmtu_vti4_link_add_mtu test, but for IPv6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test checks that MTU given on vti link creation is actually
configured, and that tunnel is not created with an invalid MTU
value.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test checks that PMTU exceptions are created only when
needed on IPv4 routes with vti and xfrm, and their PMTU value is
checked as well.
We can't adopt the same approach as test_pmtu_vti6_exception()
here, because on IPv4 administrative MTU changes won't be
reflected directly on PMTU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as pmtu_vti4_default_mtu, but on IPv6 with vti6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test checks that the MTU assigned by default to a vti (IPv4)
interface created on top of veth is simply veth's MTU minus the
length of the encapsulated IPv4 header.
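
As a worked example of the expected value, assuming a 20-byte IPv4
header without options (the constant and function name below are
illustrative and not taken from the test script):

```c
/* IPv4 header length without options, in bytes (assumed for this sketch). */
#define IPV4_HDR_LEN	20u

/* Expected default MTU of a vti4 interface on top of a lower device. */
static unsigned int vti4_default_mtu(unsigned int lower_mtu)
{
	if (lower_mtu < IPV4_HDR_LEN)
		return 0;	/* guard against unsigned underflow */
	return lower_mtu - IPV4_HDR_LEN;
}
```

For a veth MTU of 1500 this gives an expected vti MTU of 1480.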
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce list of tests and their descriptions, and loop on it
in main body.
Tests will now just take care of calling setup with a list of
"units" they need, and return 0 on success, 1 on failure, 2 if
the test had to be skipped.
Main script body will take care of displaying results and
cleaning up after every test. Introduce guard variable so that
we don't clean up twice in case of interrupts or unexpected
failures.
The pmtu_vti6_exception test can now run its third step even if
the previous one failed, as we can return values from it.
Also introduce support to display test descriptions, and display
aligned OK/FAIL/SKIP test outcomes. Buffer error strings so that
in case of failure we can display them right under the outcome
for each test.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
...so that it can be used for any iproute command output.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 7af137b72131 ("selftests: net: Introduce first PMTU test") I
accidentally assumed route_get_* helpers would run from a single
namespace. Make them a bit more generic, by passing the
namespace command prefix as a parameter instead.
Fixes: 7af137b72131 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David suggests it's more intuitive to return non-zero on
failures, and zero on success.
No need to introduce tail 'return 0' in functions, they will
return the exit code of the last command anyway.
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon says:
====================
ibmvnic: Update TX pool and TX routines
This patch restructures the TX pool data structure and provides a
separate TX pool array for TSO transmissions. This is already used
in some way due to our unique DMA situation, namely that we cannot
use single DMA mappings for packet data. Previously, both buffer
arrays used the same pool entry. This restructuring allows for
some additional cleanup in the driver code, especially in some
places in the device transmit routine.
In addition, it allows us to more easily track the consumer
and producer indexes of a particular pool. This has been
further improved by better tracking of in-use buffers to
prevent possible data corruption in case an invalid buffer
entry is used.
v5: Fix bisectability mistake in the first patch. Removed
TSO-specific data in a later patch when it is no longer used.
v4: Fix error in 7th patch that causes an oops by using
the older fixed value for number of buffers instead
of the respective field in the tx pool data structure
v3: Forgot to update TX pool cleaning function to handle new data
structures. Included 7th patch for that.
v2: Fix typo in 3/6 commit subject line
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally, remove the TSO-specific fields in the TX pool
structures. These are no longer needed with the introduction
of separate buffer pools for TSO transmissions.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update routine that cleans up any outstanding transmits that
have not received completions when the device needs to close.
Introduces a helper function that cleans one TX pool to make
code more readable.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve TX pool buffer accounting to prevent the producer
index from overrunning the consumer. First, set the next free
index to an invalid value if it is in use. If the next buffer
to be consumed is in use, drop the packet.
Finally, if the transmit fails for some other reason, roll
back the consumer index and set the free map entry to its original
value. This should also be done if the DMA map fails.
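
The accounting scheme can be sketched with a small userspace model. The
structure, the pool size, and the INVALID_MAP marker are assumptions of
this example and are not the driver's definitions.

```c
#include <stddef.h>

#define POOL_SIZE	4
#define INVALID_MAP	(-1)	/* hypothetical marker for an in-use slot */

struct tx_pool_model {
	int free_map[POOL_SIZE];	/* buffer index, or INVALID_MAP if taken */
	size_t consumer_index;		/* next free_map slot to take from */
	size_t producer_index;		/* next free_map slot to return to */
};

static void pool_init(struct tx_pool_model *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++)
		p->free_map[i] = (int)i;
	p->consumer_index = 0;
	p->producer_index = 0;
}

/* Take a buffer for transmit; INVALID_MAP means the packet is dropped. */
static int pool_take(struct tx_pool_model *p)
{
	int index = p->free_map[p->consumer_index];

	if (index == INVALID_MAP)
		return INVALID_MAP;	/* next buffer still in use: drop */

	p->free_map[p->consumer_index] = INVALID_MAP;
	p->consumer_index = (p->consumer_index + 1) % POOL_SIZE;
	return index;
}

/* Undo pool_take() after a failed transmit or DMA mapping. */
static void pool_rollback(struct tx_pool_model *p, int index)
{
	p->consumer_index = (p->consumer_index + POOL_SIZE - 1) % POOL_SIZE;
	p->free_map[p->consumer_index] = index;
}

/* Return a buffer to the pool on TX completion. */
static void pool_release(struct tx_pool_model *p, int index)
{
	p->free_map[p->producer_index] = index;
	p->producer_index = (p->producer_index + 1) % POOL_SIZE;
}
```

Once all four buffers are taken, a further pool_take() reports a drop
instead of handing out a buffer that is still in flight, and a rollback
makes the same buffer available to the next attempt.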
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TX and TX completion routines to account for TX pool
restructuring. TX routine first chooses the pool depending
on whether a packet is GSO or not, then uses it accordingly.
For the completion routine to know which pool it needs to use,
set the most significant bit of the correlator index to one
if the packet uses the TSO pool. On completion, unset the bit
and use the correlator index to release the buffer pool entry.
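
The correlator encoding can be sketched as follows. The 32-bit width,
the mask name, and the helper names are assumptions made for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of the correlator marks the TSO pool
 * (a 32-bit correlator is assumed in this sketch). */
#define TSO_POOL_MASK	UINT32_C(0x80000000)

/* Build the correlator sent with the packet. */
static uint32_t correlator_encode(uint32_t index, bool tso)
{
	return tso ? (index | TSO_POOL_MASK) : index;
}

/* On completion: which pool does the buffer belong to? */
static bool correlator_is_tso(uint32_t correlator)
{
	return (correlator & TSO_POOL_MASK) != 0;
}

/* On completion: clear the marker bit to recover the pool entry index. */
static uint32_t correlator_index(uint32_t correlator)
{
	return correlator & ~TSO_POOL_MASK;
}
```

One consequence of this scheme is that the top bit is no longer
available for the index itself, so pool indexes must stay below 2^31.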
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>