Commit Graph

1013465 Commits

Author SHA1 Message Date
Huazhong Tan 1556ea9120 net: hns3: refactor dump mac list of debugfs
Currently, the debugfs command for mac list info is implemented
by "echo xxxx > cmd", and record the information in dmesg. It's
unnecessary and heavy. To improve it, create two files "uc" and
"mc" under directory "mac_list" for it, and query mac list info
by "cat mac_list/uc" and "mac_list/mc", return the result to
userspace, rather than record in dmesg.

The display style is below:
$ cat mac_list/uc
UC MAC_LIST:
FUNC_ID  MAC_ADDR            STATE
pf       00:18:2d:00:00:71   ACTIVE

$ cat mac_list/mc
MC MAC_LIST:
FUNC_ID  MAC_ADDR            STATE
pf       01:80:c2:00:00:21   ACTIVE

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:34 -07:00
Huazhong Tan 77e9184869 net: hns3: refactor dump bd info of debugfs
Currently, the debugfs command for bd info is implemented
by "echo xxxx > cmd", and record the information in dmesg.
It's unnecessary and heavy.

To improve it, add two debugfs directories "tx_bd_info" and
"rx_bd_info", and create a file for each queue under these
two directories, and query the bd info of specific queue by
"cat tx_bd_info/tx_bd_queue*" or "cat rx_bd_info/rx_bd_queue*",
return the result to userspace, rather than record in dmesg.

The display style is below:
$ cat rx_bd_info/rx_bd_queue0
Queue 0 rx bd info:
BD_IDX   L234_INFO  PKT_LEN   SIZE...
0        0x0             60     60...
1        0x0           1512   1512...

$ cat tx_bd_info/tx_bd_queue0
Queue 0 tx bd info:
BD_IDX     ADDRESS  VLAN_TAG  SIZE...
0          0x0          0        0...
1          0x0          0        0...

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:34 -07:00
Jiaran Zhang c929bc2ac3 net: hns3: refactor dev capability and dev spec of debugfs
Currently, the debugfs command for dev capability and dev spec
are implemented by "echo xxxx > cmd", and record the information
in dmesg. It's unnecessary and heavy. To improve it, create a
single file "dev_info" for them, and query them by command
"cat dev_info", return the result to userspace, rather than
record in dmesg.

The display style is below:
$cat dev_info
dev capability:
support FD: yes
support GRO: yes
support FEC: yes
support UDP GSO: no
support PTP: no
support INT QL: no
support HW TX csum: no
support UDP tunnel csum: no
support TX push: no
support imp-controlled PHY: no
support rxd advanced layout: no

dev spec:
MAC entry num: 0
MNG entry num: 0
MAX non tso bd num: 8
RSS ind tbl size: 512
RSS key size: 40
RSS size: 1
Allocated RSS size: 0
Task queue pairs numbers: 1
RX buffer length: 2048
Desc num per TX queue: 1024
Desc num per RX queue: 1024
Total number of enabled TCs: 1
MAX INT QL: 0
MAX INT GL: 8160
MAX TM RATE: 100000
MAX QSET number: 1024

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:34 -07:00
Yufeng Mo 5e69ea7ee2 net: hns3: refactor the debugfs process
Currently, each debugfs command needs to create a file to get
the information. To better support more debugfs commands, the
debugfs process is reconstructed, including the process of
creating dentries and files, and obtaining information.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:34 -07:00
Huazhong Tan 1ddc028ac8 net: hns3: refactor out RX completion checksum
Only when RXD advanced layout is enabled, in some cases
(e.g. ip fragments), the checksum of entire packet will be
calculated and filled in the least significant 16 bits of
the unused addr field.

So refactor out the handling of RX completion checksum: adjust
the location of the checksum in RX descriptor, and use ptype table
to identify whether this kind of checksum is calculated.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:33 -07:00
Huazhong Tan 796640778c net: hns3: support RXD advanced layout
Currently, the driver gets packet type by parsing the
L3_ID/L4_ID/OL3_ID/OL4_ID from RX descriptor, it's
time-consuming.

Now some new devices support RXD advanced layout, which combines
previous OL3_ID/OL4_ID to 8bit ptype field, so the driver gets
packet type by looking up only one table, and L3_ID/L4_ID become
reserved fields.

Considering compatibility, the firmware will report capability of
RXD advanced layout, the driver will identify and enable it by
default. This patch provides basic function: identify and enable
the RXD advanced layout, and refactor out hns3_rx_checksum() by
using ptype table to handle RX checksum if supported.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:07:33 -07:00
Guenter Roeck fc25f9f631 net: thunderx: Drop unnecessary NULL check after container_of
The result of container_of() operations is never NULL unless the embedded
element is the first element of the structure. This is not the case here.
The NULL check is therefore unnecessary and misleading. Remove it.

This change was made automatically with the following Coccinelle script.

@@
type t;
identifier v;
statement s;
@@

<+...
(
  t v = container_of(...);
|
  v = container_of(...);
)
  ...
  when != v
- if (\( !v \| v == NULL \) ) s
...+>

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 15:00:53 -07:00
Heiner Kallweit fa44821a4d sfc: don't use netif_info et al before net_device is registered
Using netif_info() before the net_device is registered results in ugly
messages like the following:
sfc 0000:01:00.1 (unnamed net_device) (uninitialized): Solarflare NIC detected
Therefore use pci_info() et al until net_device is registered.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 14:59:59 -07:00
Matteo Croce 30515832e9 net: bridge: fix build when IPv6 is disabled
The br_ip6_multicast_add_router() prototype is defined only when
CONFIG_IPV6 is enabled, but the function is always referenced, so there
is this build error with CONFIG_IPV6 not defined:

net/bridge/br_multicast.c: In function ‘__br_multicast_enable_port’:
net/bridge/br_multicast.c:1743:3: error: implicit declaration of function ‘br_ip6_multicast_add_router’; did you mean ‘br_ip4_multicast_add_router’? [-Werror=implicit-function-declaration]
 1743 |   br_ip6_multicast_add_router(br, port);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      |   br_ip4_multicast_add_router
net/bridge/br_multicast.c: At top level:
net/bridge/br_multicast.c:2804:13: warning: conflicting types for ‘br_ip6_multicast_add_router’
 2804 | static void br_ip6_multicast_add_router(struct net_bridge *br,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
net/bridge/br_multicast.c:2804:13: error: static declaration of ‘br_ip6_multicast_add_router’ follows non-static declaration
net/bridge/br_multicast.c:1743:3: note: previous implicit declaration of ‘br_ip6_multicast_add_router’ was here
 1743 |   br_ip6_multicast_add_router(br, port);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this build error by moving the definition out of the #ifdef.

Fixes: a3c02e769e ("net: bridge: mcast: split multicast router state for IPv4 and IPv6")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 10:38:59 -07:00
Nikolay Aleksandrov bbc6f2cca7 net: bridge: fix br_multicast_is_router stub when igmp is disabled
br_multicast_is_router takes two arguments when bridge IGMP is enabled
and just one when it's disabled, fix the stub to take two as well.

Fixes: 1a3065a268 ("net: bridge: mcast: prepare is-router function for mcast router split")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14 10:35:20 -07:00
Gustavo A. R. Silva ea89c862f0 net: mana: Use struct_size() in kzalloc()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:58:46 -07:00
Gustavo A. R. Silva fe0bdaec8d bpf: Use struct_size() in kzalloc()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

This code was detected with the help of Coccinelle and, audited and
fixed manually.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:58:00 -07:00
Guenter Roeck 0f3ee28033 net: caif: Drop unnecessary NULL check after container_of
The first parameter passed to chnl_recv_cb() can never be NULL since all
callers dereferenced it. Consequently, container_of() on it is also never
NULL, even though the reference into the structure points to the first
element of the structure. The NULL check is therefore unnecessary.
On top of that, it is misleading to perform a NULL check on the result of
container_of() because the position of the contained element could change,
which would make the test invalid. Remove the unnecessary NULL check.

This change was made automatically with the following Coccinelle script.

@@
type t;
identifier v;
statement s;
@@

<+...
(
  t v = container_of(...);
|
  v = container_of(...);
)
  ...
  when != v
- if (\( !v \| v == NULL \) ) s
...+>

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:55:50 -07:00
Colin Ian King 5efe257531 net: qed: remove redundant initialization of variable rc
The variable rc is being initialized with a value that is never read,
it is being updated later on.  The assignment is redundant and can be
removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:52:22 -07:00
David S. Miller 25e248a2bc Merge branch 'virtio_net-fixes'
Xuan Zhuo says:

====================
virtio-net: fix for build_skb()

The logic of this piece is really messy. Fortunately, my refactored patch can be
completed with a small amount of testing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:51:14 -07:00
Xuan Zhuo 7bf64460e3 virtio-net: get build_skb() buf by data ptr
In the case of merge, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when trying to
get the buf where the data is located, you should directly use the
pointer(p) to get the address corresponding to the page.

At the same time, the offset of the data in the page should also be
obtained using offset_in_page().

This patch solves this problem. But if you don’t use this patch, the
original code can also run, because if the page is not the page of the
current data, the calculated tailroom will be less than 0, and will not
enter the logic of build_skb() . The significance of this patch is to
modify this logical problem, allowing more situations to use
build_skb().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:51:14 -07:00
Xuan Zhuo 6c66c147b9 virtio-net: fix for unable to handle page fault for address
In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().

[   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[   16.602175] #PF: supervisor read access in kernel mode
[   16.603350] #PF: error_code(0x0000) - not-present page
[   16.604200] PGD 0 P4D 0
[   16.604686] Oops: 0000 [#1] SMP PTI
[   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
[   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.624047] Call Trace:
[   16.624525]  ? release_pages+0x24d/0x730
[   16.625209]  unmap_single_vma+0xa9/0x130
[   16.625885]  unmap_vmas+0x76/0xf0
[   16.626480]  exit_mmap+0xa0/0x210
[   16.627129]  mmput+0x67/0x180
[   16.627673]  do_exit+0x3d1/0xf10
[   16.628259]  ? do_user_addr_fault+0x231/0x840
[   16.629000]  do_group_exit+0x53/0xd0
[   16.629631]  __x64_sys_exit_group+0x1d/0x20
[   16.630354]  do_syscall_64+0x3c/0x80
[   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   16.631828] RIP: 0033:0x7f1a043d0191
[   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
[   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
[   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
[   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
[   16.640408] Modules linked in:
[   16.640958] CR2: ffffecbfff7b43c8
[   16.641557] ---[ end trace bc4891c6ce46354c ]---
[   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.658290] Kernel panic - not syncing: Fatal exception
[   16.659613] Kernel Offset: disabled
[   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: fb32856b16 ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:51:14 -07:00
David S. Miller 33b3142656 Merge branch 'atl1c-support-for-Mikrotik-10-25G-NIC-features'
Gatis Peisenieks says:

====================
atl1c: support for Mikrotik 10/25G NIC features

The new Mikrotik 10/25G NIC maintains compatibility with existing atl1c
driver. However it does have new features.

This patch set adds support for reporting cards higher link speed, max-mtu,
enables rx csum offload and improves tx performance.

v2:
    - fixed xmit_more handling as pointed out by Eric Dumazet
    - added a more reliable link detection on Mikrotik 10/25G NIC
      since MDIO op emulation can occasionally fail
Guangbin Huang says:
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:11 -07:00
Gatis Peisenieks ea0fbd05d7 atl1c: improve link detection reliability on Mikrotik 10/25G NIC
Mikrotik 10/25G NIC emulates the MDIO accesses, but the emulation is
not 100% reliable - the MDIO ops occasionally can timeout.

This adds a reliable way of detecting link on Mikrotik 10/25G NIC.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:10 -07:00
Gatis Peisenieks b039000950 atl1c: enable rx csum offload on Mikrotik 10/25G NIC
Mikrotik 10/25G NIC supports hw checksum verification on rx for
IP/IPv6 + TCP/UDP packets. HW checksum offload helps reduce host
cpu load.

This enables the csum offload specifically for Mikrotik 10/25G NIC
as other HW supported by the driver is known to have problems with it.

TCP iperf3 to Threadripper 3960X with NIC improved 16.5 -> 20.0 Gbps
with mtu=1500.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:10 -07:00
Gatis Peisenieks 545fa3fb1e atl1c: adjust max mtu according to Mikrotik 10/25G NIC ability
The new Mikrotik 10/25G NIC supports jumbo frames. Jumbo frames are
supported for TSO as well.

This enables the support for mtu up to 9500 bytes.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:10 -07:00
Gatis Peisenieks d7ab6419bd atl1c: improve performance by avoiding unnecessary pcie writes on xmit
The kernel has xmit_more facility that hints the networking driver xmit
path about whether more packets are coming soon. This information can be
used to avoid unnecessary expensive PCIe transaction per tx packet.

Max TX pps on Mikrotik 10/25G NIC in a Threadripper 3960X system
improved from 1150Kpps to 1700Kpps.

Testing L2 forwarding on AR8151 hardware did not reveal a measurable
increase in latency.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:10 -07:00
Gatis Peisenieks f19d4997fd atl1c: show correct link speed on Mikrotik 10/25G NIC
The new Mikrotik 10/25G NIC maintains compatibility with existing atl1c
driver. However it does have new features.

This defines some new register offsets, code for identifying the new type
of NIC and correct speed detection for the NIC.

Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:48:10 -07:00
David S. Miller 0d59c95ea3 Merge branch 'hinic-cleanups'
Guangbin Huang says:

====================
net: hinic: some cleanups

This patchset adds some cleanups for the hinic ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:39:10 -07:00
Guangbin Huang 5db8c86e89 net: hinic: fix misspelled "acessing"
The word "acessing" is misspelled, so fix it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:39:10 -07:00
Guangbin Huang c8ad5df615 net: hinic: remove unnecessary parentheses
There are some unnecessary parentheses, this patch deletes them.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:39:10 -07:00
Guangbin Huang 3402ab54a8 net: hinic: add blank line after function declaration
There should be a blank line after function declaration, so add two
missed blank lines.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:39:10 -07:00
Guangbin Huang 9afcb59597 net: hinic: remove unnecessary blank line
There are two blank lines are unnecessary, this patch removes them.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 15:39:10 -07:00
David S. Miller d38717af2c Merge branch 'bridge-split-ipv4-ipv6-mc-router-state'
Linus Lüssing says:

====================
net: bridge: split IPv4/v6 mc router state and export for batman-adv

The following patches are splitting the so far combined multicast router
state in the Linux bridge into two ones, one for IPv4 and one for IPv6,
for a more fine-grained detection of multicast routers. This avoids
sending IPv4 multicast packets to an IPv6-only multicast router and
avoids sending IPv6 multicast packets to an IPv4-only multicast router.
This also allows batman-adv to make use of the now split information in
the final patch.

The first eight patches prepare the bridge code to avoid duplicate
code or IPv6-#ifdef clutter for the multicast router state split. And
contain no functional changes yet.

The ninth patch then implements the IPv4+IPv6 multicast router state
split.

Patch number ten adds IPv4+IPv6 specific timers to the mdb netlink
router port dump, so that the timers validity can be checked individually
from userspace.

The final, eleventh patch exports this now per protocol family multicast
router state so that batman-adv can then later make full use of the
Multicast Router Discovery (MRD) support in the Linux bridge. The
batman-adv protocol format currently expects separate multicast router
states for IPv4 and IPv6, therefore it depends on the first patch.
batman-adv will then make use of this newly exported functions like
this[0].

Regards, Linus

[0]: https://git.open-mesh.org/batman-adv.git/shortlog/refs/heads/linus/multicast-routeable-mrd
     -> https://git.open-mesh.org/batman-adv.git/commit/d4bed3a92427445708baeb1f2d1841c5fb816fd4

Changelog v3:

* Patch 01/11:
  * fixed/added missing rename of br->router_list to
    br->ip4_mc_router_list in br_multicast_flood()
* Patch 02/11:
  * moved inline functions from br_forward.c to br_private.h
* Patch 03/11:
  * removed inline attribute from functions added to br_mdb.c
* Patch 04/11:
  * unchanged
* Patch 05/11:
  * converted if()'s into switch-case in br_multicast_is_router()
* Patch 06/11:
  * removed inline attribute from function added to br_multicast.c
* Patch 07/11:
  * added missing static attribute to function
    br_ip4_multicast_get_rport_slot() added to br_multicast.c
* Patch 08/11:
  * removed inline attribute from function added to br_multicast.c
* Patch 09/11:
  * added missing static attribute to function
    br_ip6_multicast_get_rport_slot() added to br_multicast.c
  * removed inline attribute from function added to br_multicast.c
* Patch 10/11:
  * unchanged
* Patch 11/11:
  * simplified bridge check in br_multicast_has_router_adjacent()
    by using br_port_get_check_rcu()
  * added missing declaration for br_multicast_has_router_adjacent()
    in include/linux/if_bridge.h

Changelog v2:

* split into multiple patches as suggested by Nikolay
* added helper functions to br_multicast_flood(), avoiding
  IPv6 #ifdef clutter
* fixed reverse xmas tree ordering in br_rports_fill_info() and
  added helper functions to avoid IPv6 #ifdef clutter
* Added a common br_multicast_add_router() and a helper function
  to retrieve the correct slot to avoid duplicate code for an
  ip4 and ip6 variant
* replaced the "1" and "2" constants in br_multicast_is_router()
  with the appropriate enums
* added br_{ip4,ip6}_multicast_rport_del() wrappers to reduce
  IPv6 #ifdef clutter
* added return values to br_*multicast_rport_del() to only notify
  if the port was actually removed and did not race with a readdition
  somewhere else
* added empty, void br_ip6_multicast_mark_router() if compiled
  without IPv6, to reduce IPv6 #ifdef clutter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing 3b85f9ba34 net: bridge: mcast: export multicast router presence adjacent to a port
To properly support routable multicast addresses in batman-adv in a
group-aware way, a batman-adv node needs to know if it serves multicast
routers.

This adds a function to the bridge to export this so that batman-adv
can then make full use of the Multicast Router Discovery capability of
the bridge.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing b7fb091654 net: bridge: mcast: add ip4+ip6 mcast router timers to mdb netlink
Now that we have split the multicast router state into two, one for IPv4
and one for IPv6, also add individual timers to the mdb netlink router
port dump. Leaving the old timer attribute for backwards compatibility.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing a3c02e769e net: bridge: mcast: split multicast router state for IPv4 and IPv6
A multicast router for IPv4 does not imply that the same host also is a
multicast router for IPv6 and vice versa.

To reduce multicast traffic when a host is only a multicast router for
one of these two protocol families, keep router state for IPv4 and IPv6
separately. Similar to how querier state is kept separately.

For backwards compatibility for netlink and switchdev notifications
these two will still only notify if a port switched from either no
IPv4/IPv6 multicast router to any IPv4/IPv6 multicast router or the
other way round. However a full netlink MDB router dump will now also
include a multicast router timeout for both IPv4 and IPv6.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing ed2d35971a net: bridge: mcast: split router port del+notify for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants split router port deletion and notification
into two functions. When we disable a port for instance later we want to
only send one notification to switchdev and netlink for compatibility
and want to avoid sending one for IPv4 and one for IPv6. For that the
split is needed.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing d9b8c4d8d9 net: bridge: mcast: prepare add-router function for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants move the protocol specific router list
and timer access to ip4 wrapper functions.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing ee5fb2223e net: bridge: mcast: prepare expiry functions for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants move the protocol specific timer access to
an ip4 wrapper function.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing 1a3065a268 net: bridge: mcast: prepare is-router function for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants make br_multicast_is_router() protocol
family aware.

Note that for now br_ip6_multicast_is_router() uses the currently still
common ip4_mc_router_timer for now. It will be renamed to
ip6_mc_router_timer later when the split is performed.

While at it also renames the "1" and "2" constants in
br_multicast_is_router() to the MDB_RTR_TYPE_TEMP_QUERY and
MDB_RTR_TYPE_PERM enums.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing b19232effd net: bridge: mcast: prepare query reception for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and as the br_multicast_mark_router() will
be split for that remove the select querier wrapper and instead add
ip4 and ip6 variants for br_multicast_query_received().

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing ff391c5d98 net: bridge: mcast: prepare mdb netlink for mcast router split
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add
some inline functions for the protocol specific parts in the mdb router
netlink code. Also the we need iterate over the port instead of router
list to be able put one router port entry with both the IPv4 and IPv6
multicast router info later.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing 44ebb081dc net: bridge: mcast: add wrappers for router node retrieval
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add
two wrapper functions for router node retrieval in the payload
forwarding code.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Linus Lüssing ce6f709775 net: bridge: mcast: rename multicast router lists and timers
In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants, rename the affected variable to the IPv4
version first to avoid some renames in later commits.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 14:04:31 -07:00
Sebastian Andrzej Siewior 8380c81d5c net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
__napi_schedule_irqoff() is an optimized version of __napi_schedule()
which can be used where it is known that interrupts are disabled,
e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
callbacks.

On PREEMPT_RT enabled kernels this assumptions is not true. Force-
threaded interrupt handlers and spinlocks are not disabling interrupts
and the NAPI hrtimer callback is forced into softirq context which runs
with interrupts enabled as well.

Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
game so make __napi_schedule_irqoff() invoke __napi_schedule() for
PREEMPT_RT kernels.

The callers of ____napi_schedule() in the networking core have been
audited and are correct on PREEMPT_RT kernels as well.

Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 13:11:19 -07:00
Johannes Berg 4a5fe57e77 alx: use fine-grained locking instead of RTNL
In the alx driver, all locking depended on the RTNL, but
that causes issues with ipconfig ("ip=..." command line)
because that waits for the netdev to have a carrier while
holding the RTNL, but the alx workers etc. require RTNL,
so the carrier won't be set until the RTNL is dropped and
can be acquired by alx workers. This causes long delays
at boot, as reported by Nikolai Zhubr.

Really the only sensible thing to do here is to not use
the RTNL for everything, but instead have fine-grained
locking for just the driver. Do that, it's not that hard.

Reported-by: Nikolai Zhubr <zhubr.2@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 13:09:26 -07:00
Yannick Vignon 13511704f8 net: taprio offload: enforce qdisc to netdev queue mapping
Even though the taprio qdisc is designed for multiqueue devices, all the
queues still point to the same top-level taprio qdisc. This works and is
probably required for software taprio, but at least with offload taprio,
it has an undesirable side effect: because the whole qdisc is run when a
packet has to be sent, it allows packets in a best-effort class to be
processed in the context of a task sending higher priority traffic. If
there are packets left in the qdisc after that first run, the NET_TX
softirq is raised and gets executed immediately in the same process
context. As with any other softirq, it runs up to 10 times and for up to
2ms, during which the calling process is waiting for the sendmsg call (or
similar) to return. In my use case, that calling process is a real-time
task scheduled to send a packet every 2ms, so the long sendmsg calls are
leading to missed timeslots.

By attaching each netdev queue to its own qdisc, as it is done with
the "classic" mq qdisc, each traffic class can be processed independently
without touching the other classes. A high-priority process can then send
packets without getting stuck in the sendmsg call anymore.

Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-13 13:08:00 -07:00
Jim Ma d8654f4f93 tls splice: remove inappropriate flags checking for MSG_PEEK
In function tls_sw_splice_read, before call tls_sw_advance_skb
it checks likely(!(flags & MSG_PEEK)), while MSG_PEEK is used
for recvmsg, splice supports SPLICE_F_NONBLOCK, SPLICE_F_MOVE,
SPLICE_F_MORE, should remove this checking.

Signed-off-by: Jim Ma <majinjing3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-12 14:31:30 -07:00
Zou Wei 34e7434ba4 atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.

Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-12 14:03:54 -07:00
Guenter Roeck faa5f5da80 net/sched: taprio: Drop unnecessary NULL check after container_of
The rcu_head pointer passed to taprio_free_sched_cb is never NULL.
That means that the result of container_of() operations on it is also
never NULL, even though rcu_head is the first element of the structure
embedding it. On top of that, it is misleading to perform a NULL check
on the result of container_of() because the position of the contained
element could change, which would make the check invalid. Remove the
unnecessary NULL check.

This change was made automatically with the following Coccinelle script.

@@
type t;
identifier v;
statement s;
@@

<+...
(
  t v = container_of(...);
|
  v = container_of(...);
)
  ...
  when != v
- if (\( !v \| v == NULL \) ) s
...+>

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:24:02 -07:00
Loic Poulain cac6fb015f usb: class: cdc-wdm: WWAN framework integration
The WWAN framework provides a unified way to handle WWAN/modems and its
control port(s). It has initially been introduced to support MHI/PCI
modems, offering the same control protocols as the USB variants such as
MBIM, QMI, AT... The WWAN framework exposes these control protocols as
character devices, similarly to cdc-wdm, but in a bus agnostic fashion.

This change adds registration of the USB modem cdc-wdm control endpoints
to the WWAN framework as standard control ports (wwanXpY...).

Exposing cdc-wdm through WWAN framework normally maintains backward
compatibility, e.g:
    $ qmicli --device-open-qmi -d /dev/wwan0p1QMI --dms-get-ids
instead of
    $ qmicli --device-open-qmi -d /dev/cdc-wdm0 --dms-get-ids

However, some tools may rely on cdc-wdm driver/device name for device
detection. It is then safer to keep the 'legacy' cdc-wdm character
device to prevent any breakage. This is handled in this change by
API mutual exclusion, only one access method can be used at a time,
either cdc-wdm chardev or WWAN API.

Note that unknown channel types (other than MBIM, AT or MBIM) are not
registered to the WWAN framework.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:17:56 -07:00
Loic Poulain bf30396cdf net: wwan: Add unknown port type
Some devices may have ports with unknown type/protocol which need to
be tagged (though not supported by WWAN core). This will be the case
for cdc-wdm based drivers.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:17:56 -07:00
Zou Wei 009fc857c5 mISDN: fix possible use-after-free in HFC_cleanup()
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.

Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:12:46 -07:00
Zou Wei 1c72e6ab66 atm: iphase: fix possible use-after-free in ia_module_exit()
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.

Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11 16:12:03 -07:00