Merge the ipv4 and ipv6 nat chain type. This is the last
missing piece which allows to provide inet family support
for nat in a follow patch.
The kconfig knobs for ipv4/ipv6 nat chain are removed, the
nat chain type will be built unconditionally if NFT_NAT
expression is enabled.
Before:
text data bss dec hex filename
1576 896 0 2472 9a8 nft_chain_nat_ipv4.ko
1697 896 0 2593 a21 nft_chain_nat_ipv6.ko
After:
text data bss dec hex filename
1832 896 0 2728 aa8 nft_chain_nat.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The family specific masq modules are way too small to warrant
an extra module, just place all of them in nft_masq.
before:
text data bss dec hex filename
1001 832 0 1833 729 nft_masq.ko
766 896 0 1662 67e nft_masq_ipv4.ko
764 896 0 1660 67c nft_masq_ipv6.ko
after:
2010 960 0 2970 b9a nft_masq.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
before:
text data bss dec hex filename
990 832 0 1822 71e nft_redir.ko
697 896 0 1593 639 nft_redir_ipv4.ko
713 896 0 1609 649 nft_redir_ipv6.ko
after:
text data bss dec hex filename
1910 960 0 2870 b36 nft_redir.ko
size is reduced, all helpers from nft_redir.ko can be made static.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use struct device_attribute instead of struct idletimer_tg_attr, and
the correct callback function type to avoid indirect call mismatches
with Control Flow Integrity checking.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
CONNTRACK_LOCKS is divisor when computer array index, if it is power of
2, compiler will optimize modulo operation as bitwise AND, or else
modulo will lower performance.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.
base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.
And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.
And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.
Thanks for Eric's help to finish this patch.
Fixes: 009240940e ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Followup to a173f066c7 ("netfilter: bridge: Don't sabotage nf_hook
calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
to not drop packets for slave devices too. Currently, neighbor
solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
dropped. This patch enables IPv6 communications for bridges with an
address that are enslaved to a VRF.
Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls
skb_make_writable() to ensure sctphdr to be linearized.
So there's no need to get sctphdr by calling skb_header_pointer()
in sctp_csum_check().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The proto in struct xt_match and struct xt_target is u16, when
calling xt_check_target/match, their proto argument is u8,
and will cause truncation, it is harmless to ip packet, since
ip proto is u8
if a etable's match/target has proto that is u16, will cause
the check failure.
and convert be16 to short in bridge/netfilter/ebtables.c
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The metadata_dst does not initialize the dst_cache field, this causes
problems to ip_md_tunnel_xmit() since it cannot use this cache, hence,
Triggering a route lookup for every packet.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
TCP resets cause instant transition from established to closed state
provided the reset is in-window. Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.
Main problem for conntrack is that its a middlebox, i.e. whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).
Therefore we can't simply flag RSTs with non-exact match as invalid.
This updates RST processing as follows:
1. If the connection is in a state other than ESTABLISHED, nothing is
changed, RST is subject to normal in-window check.
2. If the RSTs sequence number either matches exactly RCV.NXT,
connection state moves to CLOSE.
3. The same applies if the RST sequence number aligns with a previous
packet in the same direction.
In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.
If the peer sends a challenge ack, connection timeout will be reset.
If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.
If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.
Packetdrill test case:
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001
// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001
// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R 2:2(0) ack 1001 win 260
// 2nd reset, but too far ahead sequence. Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260
// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260
// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501
// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260
// ACK
0.300 < . 1001:1001(0) ack 1001 win 257
// more data could be exchanged here, connection
// is still established
// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002
// Close the connection without reading outstanding data
0.700 close(4) = 0
// so one more reset. Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.
With patch, this generates following conntrack events:
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change the data type of the following variables from int to bool
across ipvs code:
- found
- loop
- need_full_dest
- need_full_svc
- payload_csum
Also change the following functions to use bool full_entry param
instead of int:
- ip_vs_genl_parse_dest()
- ip_vs_genl_parse_service()
This patch does not change any functionality but makes the source
code slightly easier to read.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The original purpose of the code I fix is to replace max_discard with
max_trim if max_trim is less than max_discard. When max_discard is 0
we should replace max_discard with max_trim as well, because
max_discard equals 0 happens only when the max_do_calc_max_discard
process is overflowed, so if mmc_can_trim(card) is true, max_discard
should be replaced by an available max_trim.
However, in the original code, there are two lines of code interfere
the right process.
1) if (max_discard && mmc_can_trim(card))
when max_discard is 0, it skips the process checking if max_discard
needs to be replaced with max_trim.
2) if (max_trim < max_discard)
the condition is false when max_discard is 0. it also skips the process
that replaces max_discard with max_trim, in fact, we should replace the
0-valued max_discard with max_trim.
Signed-off-by: Jiong Wu <Lohengrin1024@gmail.com>
Fixes: b305882fbc (mmc: core: optimize mmc_calc_max_discard)
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Add a warning about removing required architecture level set facilities
via "facilities=" command line option.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add "facilities=" command line option which allows to override
facility bits returned by stfle. The main purpose of that is debugging
aids which allows to test specific kernel behaviour depending on
specific facilities presence. It also affects CPU alternatives.
"facilities=" command line option format is comma separated list of
integer values to be additionally set or cleared (if value is starting
with "!"). Values ranges are also supported. e.g.:
facilities=!130-160,159,167-169
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Facilities list in the lowcore is initially set up by verify_facilities
from als.c and later initializations are redundant, so cleaning them up.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reuse __stfle call instead of in-place implementation. __stfle is using
memcpy and memset functions but they are safe to use, since mem.S is
built with -march=z900.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Andrii Nakryiko says:
====================
This patchset fixes a bug in btf_dedup() algorithm, which under specific
hash collision causes infinite loop. It also exposes ability to tune BTF
deduplication table size, with double purpose of allowing applications to
adjust size according to the size of BTF data, as well as allowing a simple
way to force hash collisions by setting table size to 1.
- Patch #1 fixes bug in btf_dedup testing code that's checking strings
- Patch #2 fixes pointer arg formatting in btf.h
- Patch #3 adds option to specify custom dedup table size
- Patch #4 fixes aforementioned bug in btf_dedup
- Patch #5 adds test that validates the fix
v1->v2:
- remove "Fixes" from formatting change patch
- extract roundup_pow2_max func for dedup table size
- btf_equal_struct -> btf_shallow_equal_struct
- explain in comment why we can't rely on just btf_dedup_is_equiv
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds a btf_dedup test exercising logic of STRUCT<->FWD
resolution and validating that STRUCT is not resolved to a FWD. It also
forces hash collisions, forcing both FWD and STRUCT to be candidates for
each other. Previously this condition caused infinite loop due to FWD
pointing to STRUCT and STRUCT pointing to its FWD.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When checking available canonical candidates for struct/union algorithm
utilizes btf_dedup_is_equiv to determine if candidate is suitable. This
check is not enough when candidate is corresponding FWD for that
struct/union, because according to equivalence logic they are
equivalent. When it so happens that FWD and STRUCT/UNION end in hashing
to the same bucket, it's possible to create remapping loop from FWD to
STRUCT and STRUCT to same FWD, which will cause btf_dedup() to loop
forever.
This patch fixes the issue by additionally checking that type and
canonical candidate are strictly equal (utilizing btf_equal_struct).
Fixes: d5caef5b56 ("btf: add BTF types deduplication algorithm")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Default size of dedup table (16k) is good enough for most binaries, even
typical vmlinux images. But there are cases of binaries with huge amount
of BTF types (e.g., allyesconfig variants of kernel), which benefit from
having bigger dedup table size to lower amount of unnecessary hash
collisions. Tools like pahole, thus, can tune this parameter to reach
optimal performance.
This change also serves double purpose of allowing tests to force hash
collisions to test some corner cases, used in follow up patch.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix invalid formatting of pointer arg.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
btf_dedup testing code doesn't account for length of struct btf_header
when calculating the start of a string section. This patch fixes this
problem.
Fixes: 49b57e0d01 ("tools/bpf: remove btf__get_strings() superseded by raw data API")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The "ref_type_id" variable needs to be signed for the error handling
to work.
Fixes: d5caef5b56 ("btf: add BTF types deduplication algorithm")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jakub Kicinski says:
====================
This set is next part of a quest to get rid of the bpf_load
ELF loader. It fixes some minor issues with the samples and
starts the conversion.
First patch fixes ping invocations, ping localhost defaults
to IPv6 on modern setups. Next load_sock_ops sample is removed
and users are directed towards using bpftool directly.
Patch 4 removes the use of bpf_load from samples which don't
need the auto-attachment functionality at all.
Patch 5 improves symbol counting in libbpf, it's not currently
an issue but it will be when anyone adds a symbol with a long
name. Let's make sure that person doesn't have to spend time
scratching their head and wondering why .a and .so symbol
counts don't match.
v2: - specify prog_type where possible (Andrii).
====================
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
readelf truncates its output by default to attempt to make it more
readable. This can lead to function names getting aliased if they
differ late in the string. Use --wide parameter to avoid
truncation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some samples don't really need the magic of bpf_load,
switch them to libbpf.
v2: - specify program types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For historical reasons the helper to loop over maps in an object
is called bpf_map__for_each while it really should be called
bpf_object__for_each_map. Rename and add a correctly named
define for backward compatibility.
Switch all in-tree users to the correct name (Quentin).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
bpftool can do all the things load_sock_ops used to do, and more.
Point users to bpftool instead of maintaining this sample utility.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ping localhost may default of IPv6 on modern systems, but
samples are trying to only parse IPv4. Force IPv4.
samples/bpf/tracex1_user.c doesn't interpret the packet so
we don't care which IP version will be used there.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Older GCC (<4.8) isn't smart enough to optimize !__builtin_constant_p()
branch in bpf_htons.
I recently fixed it for pkt_v4 and pkt_v6 in commit a0517a0f7e
("selftests/bpf: use __bpf_constant_htons in test_prog.c"), but
later added another bunch of bpf_htons in commit bf0f0fd939
("selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow
dissector").
Fixes: bf0f0fd939 ("selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This header defines the BPF functions enumerated in uapi/linux.bpf.h
in a callable format. Expand to include all registered functions.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
wrap bpf_stats_enabled sysctl with #ifdef
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 492ecee892 ("bpf: enable program stats")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- Fix 16b cmpxchg() operations which could erroneously fail if bits 15:8
of the old value are non-zero. In practice I'm not aware of any actual
users of 16b cmpxchg() on MIPS, but this fixes the support for it was
was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY systems
with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a non-zero
PHYS_OFFSET, correcting the memblock conversion performed in v4.20.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXHhqjBUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN3ZaAD/SFgi3dS9bSWhDhiy83llLaWiCGPb
i09uzo3rpWoKSwQBAIwLEfmaHz/sYdliKRlE13uaxYWzwaN+VHXIPjlzbYMB
=cDlO
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few more MIPS fixes:
- Fix 16b cmpxchg() operations which could erroneously fail if bits
15:8 of the old value are non-zero. In practice I'm not aware of
any actual users of 16b cmpxchg() on MIPS, but this fixes the
support for it was was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY
systems with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a
non-zero PHYS_OFFSET, correcting the memblock conversion performed
in v4.20"
* tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fix memory setup for platforms with PHYS_OFFSET != 0
MIPS: BCM63XX: provide DMA masks for ethernet devices
MIPS: lantiq: pass struct device to DMA API functions
MIPS: fix truncation in __cmpxchg_small for short values
Upon setting the cmode on 6390 and 6390X, the associated serdes
interfaces must be powered off/on.
Both 6390X and 6390 share code to do so, but it currently uses the 6390
specific helper mv88e6390_serdes_power() to disable and enable the
serdes interface.
This call will fail silently on 6390X when trying so set a 10G interface
such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs
the lane number based on modes supported by the 6390, and returns 0 when
getting -ENODEV as a lane number.
Using mv88e6390x_serdes_power() should be safe here, since we explicitly
rule-out all ports but the 9 and 10, and because modes supported by 6390
ports 9 and 10 are a subset of those supported on 6390X.
This was tested on 6390X using RXAUI mode.
Fixes: 364e9d7776 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'ip' can switch network namespaces internally and then run a given
command relative to that namespace without the need to fork and exec
another ip instance. Update all references of the form:
ip netns exec "$testns" ip ...
to
ip -netns "$testns" ...
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: updates 2019-02-28
please apply one more qeth patch series for net-next. This eliminates
some of the quirks in our reset code, and slims down the internal
state machine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that qeth always uses dev_close() to shutdown the interface, we can
trust the locking and remove some custom state checks.
qeth_l?_stop_card() is no longer called for a card in UP state, so remove
the checks there too. This basically makes the UP state obsolete, so rip
out the whole thing (except for the sysfs-visible string).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes no difference whether we
1. manually disarm the HW trap and call the offline code with
recovery_mode == 1, or
2. call the offline code with recovery_mode == 0, and let it disarm the
HW trap for us.
So consolidate the two code paths in the suspend callback.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qeth-wide workqueue is now only used by a single caller to schedule
close_dev work. Just put it on a system queue instead.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recovery code already runs in a kthread, we don't have to defer the
offlining further.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
smatch complains that __qeth_l3_set_offline() first accesses card->dev,
and then later checks whether the pointer is valid.
Since commit d3d1b205e8 ("s390/qeth: allocate netdevice early"), the
pointer is _always_ valid - that patch merely missed to remove this one
check.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When resetting an interface ("recovery"), qeth currently attempts to
elide the call to dev_close(). We initially only call .ndo_close to
quiesce the data path, and then offline & online the ccwgroup device.
If the reset succeeded, a call to .ndo_open then resumes the data path
along with some internal setup (dev_addr validation, RX modeset) that
dev_open() would have usually triggered.
dev_close() only gets called (via the close_dev worker) if the reset
action fails.
It's unclear whether this was initially done due to locking concerns, or
rather to execute the reset transparently. Either way, temporarily
closing the interface without dev_close() is fragile, and means we're
susceptible to various races and unexpected behaviour. For instance:
- Bypassing dev_deactivate_many() means that the qdiscs are not set to
__QDISC_STATE_DEACTIVATED. Consequently any intermittent TX completion
can wake up the txq, resulting in calls to .ndo_start_xmit while the
data path is down. We have custom state checking to detect this case
and drop such packets.
- Because the IFF_UP flag doesn't reflect the interface's actual state
during a reset, we have custom state checking in .ndo_open and
.ndo_close to guard against invalid calls.
- Considering that the reset might take a considerable amount of time
(in particular if an IO fails and we end up waiting for its timeout), we
_do_ want NETDEV_GOING_DOWN and NETDEV_DOWN events so that components
like bonding, team, bridge, macvlan, vlan, ... can take appropriate
action.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In its attempt to run only the minimal amount of tear down steps,
qeth_l2_stop_card() fails to reset the "is dev_addr registered?" flag
in some rare scenarios. But a future change to the tear down sequence
would cause us to _always_ hit this issue, so patch it up before that
code lands.
Fix it by unconditionally clearing the flag bit. This also allows us to
remove the additional cleanup step in qeth_dev_layer2_store().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting a L2 qeth device online, enable the HW trap as soon as the
control plane is available. This allows us to catch any error that
occurs during the very first commands.
In the same spirit, the offline code should disable the HW trap as the
very first step of its processing.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The offline code uses a specific RECOVER state to indicate that the
interface should be brought up when a qeth device is set online again.
Rather than having a specific card-state for this, just put it in an
internal flag bit and set the state to DOWN. When working with the
card's state transitions, this reduces the complexity quite a bit.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch maintains u64 counters for the number of octets sent and
received. These are kept as two u32's which need to be combined. Fix
the combing, which wrongly worked on u16's.
Fixes: 80c4627b27 ("dsa: mv88x6xxx: Refactor getting a single statistic")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>