When KASAN is enabled, after several rds connections are
created, then "rmmod rds_rdma" is run. The following will
appear.
"
BUG rds_ib_incoming (Not tainted): Objects remaining
in rds_ib_incoming on __kmem_cache_shutdown()
Call Trace:
dump_stack+0x71/0xab
slab_err+0xad/0xd0
__kmem_cache_shutdown+0x17d/0x370
shutdown_cache+0x17/0x130
kmem_cache_destroy+0x1df/0x210
rds_ib_recv_exit+0x11/0x20 [rds_rdma]
rds_ib_exit+0x7a/0x90 [rds_rdma]
__x64_sys_delete_module+0x224/0x2c0
? __ia32_sys_delete_module+0x2c0/0x2c0
do_syscall_64+0x73/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
"
This is rds connection memory leak. The root cause is:
When "rmmod rds_rdma" is run, rds_ib_remove_one will call
rds_ib_dev_shutdown to drop the rds connections.
rds_ib_dev_shutdown will call rds_conn_drop to drop rds
connections as below.
"
rds_conn_path_drop(&conn->c_path[0], false);
"
In the above, destroy is set to false.
void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
{
atomic_set(&cp->cp_state, RDS_CONN_ERROR);
rcu_read_lock();
if (!destroy && rds_destroy_pending(cp->cp_conn)) {
rcu_read_unlock();
return;
}
queue_work(rds_wq, &cp->cp_down_w);
rcu_read_unlock();
}
In the above function, destroy is set to false. rds_destroy_pending
is called. This does not move rds connections to ib_nodev_conns.
So destroy is set to true to move rds connections to ib_nodev_conns.
In rds_ib_unregister_client, flush_workqueue is called to make rds_wq
finsh shutdown rds connections. The function rds_ib_destroy_nodev_conns
is called to shutdown rds connections finally.
Then rds_ib_recv_exit is called to destroy slab.
void rds_ib_recv_exit(void)
{
kmem_cache_destroy(rds_ib_incoming_slab);
kmem_cache_destroy(rds_ib_frag_slab);
}
The above slab memory leak will not occur again.
>From tests,
256 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma
real 0m16.522s
user 0m0.000s
sys 0m8.152s
512 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma
real 0m32.054s
user 0m0.000s
sys 0m15.568s
To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed.
And with 512 rds connections, about 32 seconds are needed.
>From ftrace, when one rds connection is destroyed,
"
19) | rds_conn_destroy [rds]() {
19) 7.782 us | rds_conn_path_drop [rds]();
15) | rds_shutdown_worker [rds]() {
15) | rds_conn_shutdown [rds]() {
15) 1.651 us | rds_send_path_reset [rds]();
15) 7.195 us | }
15) + 11.434 us | }
19) 2.285 us | rds_cong_remove_conn [rds]();
19) * 24062.76 us | }
"
So if many rds connections will be destroyed, this function
rds_ib_destroy_nodev_conns uses most of time.
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.
It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.
c1:
(rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the topo:
h1 ---| rp1 |
| route rp3 |--- h3 (192.168.200.1)
h2 ---| rp2 |
If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
h2, and the packets can still be forwared.
This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.
This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.
Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.
Fixes: 5cbf777cfd ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This Kselftest update for Linux 5.2-rc4 consists of
- Alex Shi's fixes to cgroup tests
- Alakesh Haloi's fix to userfaultfd compiler warning
- Naresh Kamboju's fix to vm install to include test script to run
the test.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAlz36zAACgkQCwJExA0N
QxyEVxAAhvjL+l9QQpexV91Cl6nu5OIhElsqobbAzQjcKCE5SENVhHTh5rJCuX1W
yWyl5CdS6XnUWxT2mxsSCJxSBQUFBqsIZy4qLzSORFi2zYL11ICJU/6kJ7WGMdhp
tASpD4txNzjzKecslCgTXBupyEaGAnDmzE4YeNy09AUbpFcYPr++wqlgc58C9u6k
uGsc0YVQhsxuK+1gCtoOHjxIiY5RXBegb6v2krpQFMhR/AOrnhunMyhpLSTxrTnq
jEaOvKI/ug4i4VZUdrd0Z0dcZMbE1Iqyvc7BgIip93VQjvgZGpL+0gRolTCboLmc
ufk6GuDw++jeU/AGMp7qva2x0L0JTVuGzQQ28HUOcwugBFd5bI8LtgwDCI+gQIB8
haNS6uWnWjl2ezmdXqChz0nA9J/RcsBeYPxhVqW9s6Rj2CYqLAmkL3pW063/7ayG
dBwBPad0JSSZiMkf8R26pbJEcaCJFgpINYobnQ3/SbBQ36ripE0lzwKiyOXp4y+v
BHcrtiNOSwaFUiy+kd9/0og1uvSebdgtRCqvkukIMJpMPIJlC0m7t4X21fzwAYH4
K6euOYUABvy81EBlhkJKwwQhukoL3OSVuayumtHvzI1lXxh/fXFtQdYdVxMpRs0H
IjTdw+mIpwrIiGI4fUyX6fkkDnopYapzU/ETl7aLCaSAGzC07oc=
=9KtO
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fixes to cgroup tests (Alex Shi)
- fix to userfaultfd compiler warning (Alakesh Haloi)
- fix to vm install to include test script to run the test (Naresh
Kamboju)
* tag 'linux-kselftest-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: vm: install test_vmalloc.sh for run_vmtests
userfaultfd: selftest: fix compiler warning
kselftest/cgroup: fix incorrect test_core skip
kselftest/cgroup: fix unexpected testing failure on test_core
kselftest/cgroup: fix unexpected testing failure on test_memcontrol
In commit 9a5ab8bf1d ("tools: bpftool: turn err() and info() macros
into functions") one case of error reporting was special cased, so it
could report a lookup error for a specific key when dumping the map
element. What the code forgot to do is to wrap the key and value keys
into a JSON object, so an example output of pretty JSON dump of a
sockhash map (which does not support looking up its values) is:
[
"key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00"
],
"value": {
"error": "Operation not supported"
},
"key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01"
],
"value": {
"error": "Operation not supported"
}
]
Note the key-value pairs inside the toplevel array. They should be
wrapped inside a JSON object, otherwise it is an invalid JSON. This
commit fixes this, so the output now is:
[{
"key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00"
],
"value": {
"error": "Operation not supported"
}
},{
"key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01"
],
"value": {
"error": "Operation not supported"
}
}
]
Fixes: 9a5ab8bf1d ("tools: bpftool: turn err() and info() macros into functions")
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlz3v20ACgkQjrBW1T7s
sS1rbw//clR7YbczqO1Og5W3Bpg2JdPPQjXyfomO9gqPsOuOaRFQ4vmhvrCnavN1
4SdXm3QzXZ30fnhgtoPzrNgPrZBU4DzGq7V8G6NCWxu0wsWJrYOy++FK6WpW3ngB
AFRRWCwuI9u3xEu/0xXgd2UtM2Wy9rmu5PNa/BDkIYx33F56Lz0aIQrzrYKX3a+W
DroRiPpNTrrThFyFUH1pPs0rZHWDY9l90drq7QxZB5+7irktHHKiywL71N4gy7Wj
Hje7P5pc3Zkj2qNKT1im/ccSWdApOrTlzDrIx5GLJpZCycVlCwcGRF8F1+l6g/Pg
AV3ABMo2k5SLQ+q/3PzlCFhmIvPL/ucly+l7KbYrwwb0Zn+QKuwc+Z8vFJt3daeQ
BZPrxWse3Iwjg2S/b4tyrbxowS6SmPGQ7Dmk62q7nffgsb161uypz1/p/dLmgL9b
W1bY12bZHB/nr0smTewQk1N15dvxnsViFa1oyAdJjtngbA448nCQnZklFIaEl4Mo
NngcgKSQrp4B7G+htcPCi5Jda1GE4blhed/fwDN9G4mbCBgnhG+GB8ABx46/tlpV
A+dt0Bm88lNZETqNBQHFqE0nOaNzzdCGteHCx+61ZhX1eaFNRb3AaLsrGs9PR4zY
PVrvwmMksnTZ5Wc3YYeSHbC7IlkMdB8tbi+HFbZapPucu+33VFA=
=KySr
-----END PGP SIGNATURE-----
Merge tag 'pidfd-fixes-v5.2-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
Pull pidfd fixes from Christian Brauner:
"The contains two small patches to the pidfd samples and test binaries
respectively.
They were lacking appropriate ifdefines for __NR_pidfd_send_signal and
could hence lead to compilation errors when that was not defined.
This was spotted on mips independently by Guenter Roeck (who was kind
enough to send a fix for the samples binary) and Arnd who spotted it
in linux-next.
Apart from these two patches, there's also a patch to update the
comments for the pidfd_send_signal() syscall which were slightly
wrong/inconsistenly worded"
* tag 'pidfd-fixes-v5.2-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
tests: fix pidfd-test compilation
signal: improve comments
samples: fix pidfd-metadata compilation
- Avoid NULL deref when unloading/reloading ramoops module (Pi-Hsun Shih)
- Run ramoops without crash dump region
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlz3NB8WHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjGyEACklx2W3qjE51oCWpRuN9B29As0
XzuWrQr15WzD2zAtdG4wc6/OI3Gfu+/xXb9YClqXFN8TqGEwyGIcz0kOOExvzJN2
bdseq8gA4JkL8NK3LKjGMXCvUBQiCGUKfGa4xXL8I2NfyZkykUqRa0PVkkfNYOEf
q6zjPz73BDRvpZUw7+50sKDJalcicwOzn3GMXw7C43qDuuychpwzLTL5ZmFrQ3oX
qJqz7mIfsP5DpJk8SUTZl+W4eZ6/ianfML883ia9Zg8AP6ix/iET0iQHXw59DbOZ
XeFmXBudou+JNAjqlDbGppBwJOu3iHXFKh7eJre2W2swkdah/V8CvYo36qdJ9zHP
zs4/Wt/yloWYZqtY4UWsMhs47ryvm8iC2Ki//OPTZh30fIeqGAcVknbFbu1EHron
autOEy8DiKH5I76BGGaR78We6AVt04HXTT0kFcDgczv3MLhfOpHLoL4w4fM0NvNq
3CSDEkr6dsTQPCPUoApBo3rfbiVROzgXdDLLLxULWphtL6rAvvn/FmAPQsC7OdN3
TdZQ0AjMtiQO32TFfm9badadDXW2QjXJF91TQBqtGacR+ipiXSnImeZC24VCdXyT
pO9U/rbrU3tds3+Qu1WNh87IvEWOjzC/sjDKSd/ClZqk9F0KVGGSxc9YXgxNzLIR
gC0luMlt7acj4Jzkog==
=g26W
-----END PGP SIGNATURE-----
Merge tag 'pstore-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Avoid NULL deref when unloading/reloading ramoops module (Pi-Hsun
Shih)
- Run ramoops without crash dump region
* tag 'pstore-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Run without kernel crash dump region
pstore: Set tfm to NULL on free_buf_for_compression
Julian Wiedmann says:
====================
s390/qeth: fixes 2019-06-05
one more shot... now with patch 2 fixed up so that it uses the
dst entry returned from dst_check().
From the v1 cover letter:
Please apply the following set of qeth fixes to -net.
- The first two patches fix issues in the L3 driver's cast type
selection for transmitted skbs.
- Alexandra adds a sanity check when retrieving VLAN information from
neighbour address events.
- The last patch adds some missing error handling for qeth's new
multiqueue code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_set_real_num_tx_queues() can return an error, deal with it.
Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enabling sysfs attribute bridge_hostnotify triggers a series of udev events
for the MAC addresses of all currently connected peers. In case no VLAN is
set for a peer, the device reports the corresponding MAC addresses with
VLAN ID 4096. This currently results in attribute VLAN=4096 for all
non-VLAN interfaces in the initial series of events after host-notify is
enabled.
Instead, no VLAN attribute should be reported in the udev event for
non-VLAN interfaces.
Only the initial events face this issue. For dynamic changes that are
reported later, the device uses a validity flag.
This also changes the code so that it now sets the VLAN attribute for
MAC addresses with VID 0. On Linux, no qeth interface will ever be
registered with VID 0: Linux kernel registers VID 0 on all network
interfaces initially, but qeth will drop .ndo_vlan_rx_add_vid for VID 0.
Peers with other OSs could register MACs with VID 0.
Fixes: 9f48b9db9a ("qeth: bridgeport support - address notifications")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While qeth_l3 uses netif_keep_dst() to hold onto the dst, a skb's dst
may still have been obsoleted (via dst_dev_put()) by the time that we
end up using it. The dst then points to the loopback interface, which
means the neighbour lookup in qeth_l3_get_cast_type() determines a bogus
cast type of RTN_BROADCAST.
For IQD interfaces this causes us to place such skbs on the wrong
HW queue, resulting in TX errors.
Fix-up the various call sites to first validate the dst entry with
dst_check(), and fall back accordingly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When selecting the cast type of a neighbourless IPv4 skb (eg. on a raw
socket), qeth_l3 falls back to the packet's destination IP address.
For this case we should classify traffic sent to 255.255.255.255 as
broadcast.
This fixes DHCP requests, which were misclassified as unicast
(and for IQD interfaces thus ended up on the wrong HW queue).
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define __NR_pidfd_send_signal if it isn't to prevent a potential
compilation error.
To make pidfd-test compile on all arches, irrespective of whether
or not syscall numbers are assigned, define the syscall number to -1.
If it isn't defined this will cause the kernel to return -ENOSYS.
Fixes: 575a0ae974 ("selftests: add tests for pidfd_send_signal()")
Signed-off-by: Christian Brauner <christian@brauner.io>
Improve the comments for pidfd_send_signal().
First, the comment still referred to a file descriptor for a process as a
"task file descriptor" which stems from way back at the beginning of the
discussion. Replace this with "pidfd" for consistency.
Second, the wording for the explanation of the arguments to the syscall
was a bit inconsistent, e.g. some used the past tense some used present
tense. Make the wording more consistent.
Signed-off-by: Christian Brauner <christian@brauner.io>
Define __NR_pidfd_send_signal if it isn't to prevent a compilation error.
To make pidfd-metadata compile on all arches, irrespective of whether
or not syscall numbers are assigned, define the syscall number to -1.
If it isn't defined this will cause the kernel to return -ENOSYS.
Fixes: 43c6afee48 ("samples: show race-free pidfd metadata access")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christian Brauner <christian@brauner.io>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[christian@brauner.io: tweak commit message]
Signed-off-by: Christian Brauner <christian@brauner.io>
If CONFIG_FUNCTION_GRAPH_TRACER is enabled function
arch_counter_get_cntvct() is marked as notrace. However, function
__arch_counter_get_cntvct is marked as inline. If
CONFIG_OPTIMIZE_INLINING is set that will make the two functions
tracable which they shouldn't.
Rework so that functions __arch_counter_get_* are marked with
__always_inline so they will be inlined even if CONFIG_OPTIMIZE_INLINING
is turned on.
Fixes: 0ea415390c ("clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
asm/smp.h is included by linux/smp.h and some drivers, in particular
irqchip drivers can access cpu_logical_map[] in order to perform SMP
affinity tasks. Make arm64 consistent with other architectures here.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In commit 06a916feca ("arm64: Expose SVE2 features for
userspace"), new hwcaps are added that are detected via fields in
the SVE-specific ID register ID_AA64ZFR0_EL1.
In order to check compatibility of secondary cpus with the hwcaps
established at boot, the cpufeatures code uses
__read_sysreg_by_encoding() to read this ID register based on the
sys_reg field of the arm64_elf_hwcaps[] table.
This leads to a kernel splat if an hwcap uses an ID register that
__read_sysreg_by_encoding() doesn't explicitly handle, as now
happens when exercising cpu hotplug on an SVE2-capable platform.
So fix it by adding the required case in there.
Fixes: 06a916feca ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
test_lirc_mode2_user is included in test_lirc_mode2.sh test and should
not be run directly.
Fixes: 6bdd533cee ("bpf: add selftest for lirc_mode2 type program")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
As Eric noted, the current wrapper for ptype func hook inside
__netif_receive_skb_list_ptype() has no chance of avoiding the indirect
call: we enter such code path only for protocols other than ipv4 and
ipv6.
Instead we can wrap the list_func invocation.
v1 -> v2:
- use the correct fix tag
Fixes: f5737cbadb ("net: use indirect calls helpers for ptype hook")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's some NICs, such as hinic, with NETIF_F_IP_CSUM and NETIF_F_TSO
on but NETIF_F_HW_CSUM off. And ipvlan device features will be
NETIF_F_TSO on with NETIF_F_IP_CSUM and NETIF_F_IP_CSUM both off as
IPVLAN_FEATURES only care about NETIF_F_HW_CSUM. So TSO will be
disabled in netdev_fix_features.
For example:
Features for enp129s0f0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
Fixes: a188222b6e ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, packets received in another VRF should not be passed to an
unbound socket in the default VRF. This patch updates the IPv4 UDP
multicast logic to match the unicast VRF logic (in compute_score()),
as well as the IPv6 mcast logic (in __udp_v6_is_mcast_sock()).
The particular case I noticed was DHCP discover packets going
to the 255.255.255.255 address, which are handled by
__udp4_lib_mcast_deliver(). The previous code meant that running
multiple different DHCP server or relay agent instances across VRFs
did not work correctly - any server/relay agent in the default VRF
received DHCP discover packets for all other VRFs.
Fixes: 6da5b0f027 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
net/tls: redo the RX resync locking
Take two of making sure we don't use a NULL netdev pointer
for RX resync. This time using a bit and an open coded
wait loop.
v2:
- fix build warning (DaveM).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem. Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing. Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.
Note that there is a leftover "flags" field in struct
tls_context already.
Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 38030d7cb7.
Unfortunately the RX resync may get called from soft IRQ,
so we can't take the rwsem to protect from the device
disappearing.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware values for link speed are held in the sja1105_speed_t enum.
However they do not increase in the order that sja1105_get_speed_cfg was
iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
SJA1105_SPEED_1000MBPS - 1 - skipping the other two).
Another bug is that the code in sja1105_adjust_port_config relies on the
fact that an invalid link speed is detected by sja1105_get_speed_cfg and
returned as -EINVAL. However storing this into an enum that only has
positive members will cast it into an unsigned value, and it will miss
the negative check.
So take the simplest approach and remove the sja1105_get_speed_cfg
function and replace it with a simple switch-case statement.
Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid reducing the support mask as a result of the interface type
selected for SFP modules, or when setting the link settings through
ethtool - this should only change when the supported link modes of
the hardware combination change.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time. This behaviour is not specified
in the SFP MSAs, which specifies:
"The serial interface uses the 2-wire serial CMOS E2PROM protocol
defined for the ATMEL AT24C01A/02/04 family of components."
and
"As long as the SFP+ receives an acknowledge, it shall serially clock
out sequential data words. The sequence is terminated when the host
responds with a NACK and a STOP instead of an acknowledge."
We must avoid breaking a read across a 16-bit quantity in the diagnostic
page, thankfully all 16-bit quantities in that page are naturally
aligned.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl setting bc_forwarding for $rp2 is needed when ping_test_from h2,
otherwise the bc packets from $rp2 won't be forwarded. This patch is to
add this setting for $rp2.
Also, as ping_test_from does grep "$from" only, which could match some
unexpected output, some test case doesn't really work, like:
# ping_test_from $h2 198.51.200.255 198.51.200.2
PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data.
64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.336 ms
When doing grep $form (198.51.200.2), the output could still match.
So change to grep "bytes from $from" instead.
Fixes: 40f98b9af9 ("selftests: add a selftest for directed broadcast forwarding")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should only enable HW RX_2BYTE_OFFSET function in the case NET_IP_ALIGN
equals to 2.
Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should hw_feature as hardware capability flags to check if hardware LRO
got support.
Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ability to set RX descriptor number, the reason - initially
"tx_max_pending" was set incorrectly, but the issue appears after
adding sanity check, so fix is for "sanity" patch.
Fixes: 37e2d99b59 ("ethtool: Ensure new ring parameters are within bounds during SRINGPARAM")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Lau says:
====================
s series has fixes when running reuseport's bpf_prog for udp lookup.
If there is reuseport's bpf_prog, the common issue is the reuseport code
path expects skb->data pointing to the transport header (udphdr here).
A couple of commits broke this expectation. The issue is specific
to running bpf_prog, so bpf tag is used for this series.
Please refer to the individual commit message for details.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When the commit a6024562ff ("udp: Add GRO functions to UDP socket")
added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
the reuseport_select_sock() assumption that skb->data is pointing
to the transport header.
This patch follows an earlier __udp6_lib_err() fix by
passing a NULL skb to avoid calling the reuseport's bpf_prog.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
__udp6_lib_err() may be called when handling icmpv6 message. For example,
the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called
which may call reuseport_select_sock(). reuseport_select_sock() will
call into a bpf_prog (if there is one).
reuseport_select_sock() is expecting the skb->data pointing to the
transport header (udphdr in this case). For example, run_bpf_filter()
is pulling the transport header.
However, in the __udp6_lib_err() path, the skb->data is pointing to the
ipv6hdr instead of the udphdr.
One option is to pull and push the ipv6hdr in __udp6_lib_err().
Instead of doing this, this patch follows how the original
commit 538950a1b7 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
was done in IPv4, which has passed a NULL skb pointer to
reuseport_select_sock().
Fixes: 538950a1b7 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Cc: Craig Gallek <kraig@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Herbert Xu pointed out that commit bb73c52bad ("rcu: Don't disable
preemption for Tiny and Tree RCU readers") was incorrect in making the
preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
a no-op, but still is a compiler barrier.
And RCU locking still _needs_ that compiler barrier.
It is simply fundamentally not true that RCU locking would be a complete
no-op: we still need to guarantee (for example) that things that can
trap and cause preemption cannot migrate into the RCU locked region.
The way we do that is by making it a barrier.
See for example commit 386afc9114 ("spinlocks and preemption points
need to be at least compiler barriers") from back in 2013 that had
similar issues with spinlocks that become no-ops on UP: they must still
constrain the compiler from moving other operations into the critical
region.
Now, it is true that a lot of RCU operations already use READ_ONCE() and
WRITE_ONCE() (which in practice likely would never be re-ordered wrt
anything remotely interesting), but it is also true that that is not
globally the case, and that it's not even necessarily always possible
(ie bitfields etc).
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: bb73c52bad ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
Cc: stable@kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the nds32 patchset based on 5.2-rc1
Contained in here are
1. fix warning for math-emu
2. fix nds32 fpu exception handling
3. fix nds32 fpu emulation implementation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
iQIcBAABAgAGBQJc9SRzAAoJEHfB0l0b2JxETZwQAJFERZU5ziU/UU5g4W8P2fn8
KAS3bxqWk6mJ8qptSsvdggfXsJjCqxb+6Qhc9cAWQBiasfp6a6HR9zWFyAwOP7qo
y0SydlRv/tMwhImpkF63Pp9omDfFVH0xpWSGIIF/PM7Uy2GrV9pmeTFX9R7Om4qV
qdxGVPu2fe/aNP4W3/Typ2YttmWMAN5MJjTaig8hnsWQQGfVNuPZJNTAyt/iCmS/
/XwxCRWpE4WIlYkcBV1LqhYRQ7xPN/IdnkD4EY4zPsDOq1+bgQQmGh8ekoiaITgB
zSj1btl6FBcyeItFv/idaGxPjxmlN87Misix1P069pXOKVfbwB/HncZZJ3/msYFk
tHbVMoVIsJ09kFiHUgXb9sjttf4o1xl8FM3uH32HEEJBOS4OPxbEhTbXRUECEDww
xvrV6M0rKCp/Mbxvp3PmAJp75hY6/wE7Ygu7Enf9+OQBx1H6yAph25z9T2J7AoDl
5rCGwaHw0SUHyz8GeE6vvcQ0JPRRGv59tVJEYKZmttoNlS6JebYDQJ7WPm1Rkp+V
BmLv6SYeAJGdO1nhS7tMPUUWVD7cerRvz8VTmiMOSgbd/lwAR+4KIIMdi3cvaD9m
TTyHcyxUNXwa7ox2OFvFXlvIEzqBiilzNgP89DKCAI7osvUuOviqRlgBNOvN0KxR
ip/vG/xwDKXg1XVCaKdC
=kAV3
-----END PGP SIGNATURE-----
Merge tag 'nds32-for-linux-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 fixes from Greentime Hu:
- fix warning for math-emu
- fix nds32 fpu exception handling
- fix nds32 fpu emulation implementation
* tag 'nds32-for-linux-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: add new emulations for floating point instruction
nds32: Avoid IEX status being incorrectly modified
math-emu: Use statement expressions to fix Wshift-count-overflow warning
Pull sparc fixes from David Miller:
"Three bug fixes, and TLB flushing one is of particular brown paper bag
quality..."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
mdesc: fix a missing-check bug in get_vdev_port_node_info()
sparc64: Fix regression in non-hypervisor TLB flush xcall
several fixes, some of them for CVEs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJc6/2mAAoJECgfDbjSjVRpoVAIAKjEnHE4tOYBDuFXyQaFz6fy
2MRDALMaIHKM3A5yPZ5hQOWTqngP2kiEOihN/YDr31ZRmzF1Itmff48UrFy0vJnz
2mGkCDCLbEnKbfaWNYYDHOoBqE4hXPqB7PHzv2XiIcRerSZpQvnY44t3OHyMOaEV
uwN6Fie1umz5xaypWQ1OZUuOeolu4T379YTFTDr/txQ9vdZnED2G9mgCOjpAQvBK
9SMbPdylBqns8oOpAncpDBT+f5cIeGok79Lx8/U1a72hKT1I7QSSAXQcD5lhRIMJ
iXH4k94aQgtQeYkx9BnlFLGf2ue6mfqOl3hcVfLuPKAFHjLhX/zy+E7dwn4x62o=
=zlB3
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Several fixes, some of them for CVEs"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: scsi: add weight support
vhost: vsock: add weight support
vhost_net: fix possible infinite loop
vhost: introduce vhost_exceeds_weight()
virtio: Fix indentation of VIRTIO_MMIO
virtio: add unlikely() to WARN_ON_ONCE()
The im_boffset field is in units of bytes, whereas XFS_INO_OFFSET
returns a value in units of inodes. Convert the units so that scrub on
a 64k-block filesystem works correctly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
The PERF_EVENT_IOC_PERIOD ioctl command can be used to change the
sample period of a running perf_event. Consequently, when calculating
the next event period, the new period will only be considered after the
previous one has overflowed.
This patch changes the calculation of the remaining event ticks so that
they are offset if the period has changed.
See commit 3581fe0ef3 ("ARM: 7556/1: perf: fix updated event period in
response to PERF_EVENT_IOC_PERIOD") for details.
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In get_vdev_port_node_info(), 'node_info->vdev_port.name' is allcoated
by kstrdup_const(), and it returns NULL when fails. So
'node_info->vdev_port.name' should be checked.
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, %g2 would end up with the value PAGE_SIZE, but after the
commit mentioned below it ends up with the value 1 due to being reused
for a different purpose. We need it to be PAGE_SIZE as we use it to step
through pages in our demap loop, otherwise we set different flags in the
low 12 bits of the address written to, thereby doing things other than a
nucleus page flush.
Fixes: a74ad5e660 ("sparc64: Handle extremely large kernel TLB range flushes more gracefully.")
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rollover used to use a complex RCU mechanism for assignment, which had
a race condition. The below patch fixed the bug and greatly simplified
the logic.
The feature depends on fanout, but the state is private to the socket.
Fanout_release returns f only when the last member leaves and the
fanout struct is to be freed.
Destroy rollover unconditionally, regardless of fanout state.
Fixes: 57f015f5ec ("packet: fix crash in fanout_demux_rollover()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing a loopback test at copper ports, the serdes loopback
and the phy loopback will fail, because of the adjust link had
not finished, and phy not ready.
Adds sleep between adjust link and test process to fix it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When non-bridged, non-vlan'ed mv88e6xxx port is moving down, error
message is logged:
failed to kill vid 0081/0 for device eth_cu_1000_4
This is caused by call from __vlan_vid_del() with vin set to zero, over
call chain this results into _mv88e6xxx_port_vlan_del() called with
vid=0, and mv88e6xxx_vtu_get() called from there returns -EINVAL.
On symmetric path moving port up, call goes through
mv88e6xxx_port_vlan_prepare() that calls mv88e6xxx_port_check_hw_vlan()
that returns -EOPNOTSUPP for zero vid.
This patch changes mv88e6xxx_vtu_get() to also return -EOPNOTSUPP for
zero vid, then this error code is explicitly cleared in
dsa_slave_vlan_rx_kill_vid() and error message is no longer logged.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Ingo Molnar:
"Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
KASAN related build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
x86/boot: Provide KASAN compatible aliases for string routines