There are cases where the page_pool need to refill with pages from the
page allocator. Some workloads cause the page_pool to release pages
instead of recycling these pages.
For these workload it can improve performance to bulk alloc pages from the
page-allocator to refill the alloc cache.
For XDP-redirect workload with 100G mlx5 driver (that use page_pool)
redirecting xdp_frame packets into a veth, that does XDP_PASS to create an
SKB from the xdp_frame, which then cannot return the page to the
page_pool.
Performance results under GitHub xdp-project[1]:
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org
Mel: The patch "net: page_pool: convert to use alloc_pages_bulk_array
variant" was squashed with this patch. From the test page, the array
variant was superior with one of the test results as follows.
Kernel XDP stats CPU pps Delta
Baseline XDP-RX CPU total 3,771,046 n/a
List XDP-RX CPU total 3,940,242 +4.49%
Array XDP-RX CPU total 4,249,224 +12.68%
Link: https://lkml.kernel.org/r/20210325114228.27719-10-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for next patch, move the dma mapping into its own function,
as this will make it easier to follow the changes.
[ilias.apalodimas: make page_pool_dma_map return boolean]
Link: https://lkml.kernel.org/r/20210325114228.27719-9-mgorman@techsingularity.net
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reduce the rate at which nfsd threads hammer on the page allocator. This
improves throughput scalability by enabling the threads to run more
independently of each other.
[mgorman: Update interpretation of alloc_pages_bulk return value]
Link: https://lkml.kernel.org/r/20210325114228.27719-8-mgorman@techsingularity.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "SUNRPC consumer for the bulk page allocator"
This patch set and the measurements below are based on yesterday's
bulk allocator series:
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
The patches change SUNRPC to invoke the array-based bulk allocator
instead of alloc_page().
The micro-benchmark results are promising. I ran a mixture of 256KB
reads and writes over NFSv3. The server's kernel is built with KASAN
enabled, so the comparison is exaggerated but I believe it is still
valid.
I instrumented svc_recv() to measure the latency of each call to
svc_alloc_arg() and report it via a trace point. The following results
are averages across the trace events.
Single page: 25.007 us per call over 532,571 calls
Bulk list: 6.258 us per call over 517,034 calls
Bulk array: 4.590 us per call over 517,442 calls
This patch (of 2)
Refactor:
I'm about to use the loop variable @i for something else.
As far as the "i++" is concerned, that is a post-increment. The
value of @i is not used subsequently, so the increment operator
is unnecessary and can be removed.
Also note that nfsd_read_actor() was renamed nfsd_splice_actor()
by commit cf8208d0ea ("sendfile: convert nfsd to
splice_direct_to_actor()").
Link: https://lkml.kernel.org/r/20210325114228.27719-7-mgorman@techsingularity.net
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to
walk all map elements in a more robust and easier to verify
fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF
on s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices
which don't need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability
on next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is
slow in reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take
place correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP
packets before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used
to define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
-independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and
bnxt support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP
which define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay
for packets sampled from switch HW (and corresponding egress
and policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP
forwarding, bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface
to distribute MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x -
11-port Ethernet switch with 8x 1-Gigabit Ethernet
and 3x 10-Gigabit interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365
and BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack,
matching on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode
changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCKFPIACgkQMUZtbf5S
Irtw0g/+NA8bWdHNgG4H5rya0pv2z3IieLRmSdDfKRQQXcJpklawc5MKVVaTee/Q
5/QqgPdCsu1LAU6JXBKsKmyDDaMlQKdWuKbOqDSiAQKoMesZStTEHf9d851ZzgxA
Cdb6O7BD3lBl/IN+oxNG+KcmD1LKquTPKGySq2mQtEdLO12ekAsranzmj4voKffd
q9tBShpXQ7Dq77DLYfiQXVCvsizNcbbJFuxX0o9Lpb9+61ZyYAbogZSa9ypiZZwR
I/9azRBtJg7UV1aD/cLuAfy66Qh7t63+rCxVazs5Os8jVO26P/jQdisnnOe/x+p9
wYEmKm3GSu0V4SAPxkWW+ooKusflCeqDoMIuooKt6kbP6BRj540veGw3Ww/m5YFr
7pLQkTSP/tSjuGQIdBE1LOP5LBO8DZeC8Kiop9V0fzAW9hFSZbEq25WW0bPj8QQO
zA4Z7yWlslvxcfY2BdJX3wD8klaINkl/8fDWZFFsBdfFX2VeLtm7Xfduw34BJpvU
rYT3oWr6PhtkPAKR32SUcemSfeWgIVU41eSshzRz3kez1NngBUuLlSGGSEaKbes5
pZVt6pYFFVByyf6MTHFEoQvafZfEw04JILZpo4R5V8iTHzom0kD3Py064sBiXEw2
B6t+OW4qgcxGblpFkK2lD4kR2s1TPUs0ckVO6sAy1x8q60KKKjY=
=vcbA
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- bpf:
- allow bpf programs calling kernel functions (initially to
reuse TCP congestion control implementations)
- enable task local storage for tracing programs - remove the
need to store per-task state in hash maps, and allow tracing
programs access to task local storage previously added for
BPF_LSM
- add bpf_for_each_map_elem() helper, allowing programs to walk
all map elements in a more robust and easier to verify fashion
- sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT
redirection
- lpm: add support for batched ops in LPM trie
- add BTF_KIND_FLOAT support - mostly to allow use of BTF on
s390 which has floats in its headers files
- improve BPF syscall documentation and extend the use of kdoc
parsing scripts we already employ for bpf-helpers
- libbpf, bpftool: support static linking of BPF ELF files
- improve support for encapsulation of L2 packets
- xdp: restructure redirect actions to avoid a runtime lookup,
improving performance by 4-8% in microbenchmarks
- xsk: build skb by page (aka generic zerocopy xmit) - improve
performance of software AF_XDP path by 33% for devices which don't
need headers in the linear skb part (e.g. virtio)
- nexthop: resilient next-hop groups - improve path stability on
next-hops group changes (incl. offload for mlxsw)
- ipv6: segment routing: add support for IPv4 decapsulation
- icmp: add support for RFC 8335 extended PROBE messages
- inet: use bigger hash table for IP ID generation
- tcp: deal better with delayed TX completions - make sure we don't
give up on fast TCP retransmissions only because driver is slow in
reporting that it completed transmitting the original
- tcp: reorder tcp_congestion_ops for better cache locality
- mptcp:
- add sockopt support for common TCP options
- add support for common TCP msg flags
- include multiple address ids in RM_ADDR
- add reset option support for resetting one subflow
- udp: GRO L4 improvements - improve 'forward' / 'frag_list'
co-existence with UDP tunnel GRO, allowing the first to take place
correctly even for encapsulated UDP traffic
- micro-optimize dev_gro_receive() and flow dissection, avoid
retpoline overhead on VLAN and TEB GRO
- use less memory for sysctls, add a new sysctl type, to allow using
u8 instead of "int" and "long" and shrink networking sysctls
- veth: allow GRO without XDP - this allows aggregating UDP packets
before handing them off to routing, bridge, OvS, etc.
- allow specifing ifindex when device is moved to another namespace
- netfilter:
- nft_socket: add support for cgroupsv2
- nftables: add catch-all set element - special element used to
define a default action in case normal lookup missed
- use net_generic infra in many modules to avoid allocating
per-ns memory unnecessarily
- xps: improve the xps handling to avoid potential out-of-bound
accesses and use-after-free when XPS change race with other
re-configuration under traffic
- add a config knob to turn off per-cpu netdev refcnt to catch
underflows in testing
Device APIs:
- add WWAN subsystem to organize the WWAN interfaces better and
hopefully start driving towards more unified and vendor-
independent APIs
- ethtool:
- add interface for reading IEEE MIB stats (incl. mlx5 and bnxt
support)
- allow network drivers to dump arbitrary SFP EEPROM data,
current offset+length API was a poor fit for modern SFP which
define EEPROM in terms of pages (incl. mlx5 support)
- act_police, flow_offload: add support for packet-per-second
policing (incl. offload for nfp)
- psample: add additional metadata attributes like transit delay for
packets sampled from switch HW (and corresponding egress and
policy-based sampling in the mlxsw driver)
- dsa: improve support for sandwiched LAGs with bridge and DSA
- netfilter:
- flowtable: use direct xmit in topologies with IP forwarding,
bridging, vlans etc.
- nftables: counter hardware offload support
- Bluetooth:
- improvements for firmware download w/ Intel devices
- add support for reading AOSP vendor capabilities
- add support for virtio transport driver
- mac80211:
- allow concurrent monitor iface and ethernet rx decap
- set priority and queue mapping for injected frames
- phy: add support for Clause-45 PHY Loopback
- pci/iov: add sysfs MSI-X vector assignment interface to distribute
MSI-X resources to VFs (incl. mlx5 support)
New hardware/drivers:
- dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port
Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit
interfaces.
- dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and
BCM63xx switches
- Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches
- ath11k: support for QCN9074 a 802.11ax device
- Bluetooth: Broadcom BCM4330 and BMC4334
- phy: Marvell 88X2222 transceiver support
- mdio: add BCM6368 MDIO mux bus controller
- r8152: support RTL8153 and RTL8156 (USB Ethernet) chips
- mana: driver for Microsoft Azure Network Adapter (MANA)
- Actions Semi Owl Ethernet MAC
- can: driver for ETAS ES58X CAN/USB interfaces
Pure driver changes:
- add XDP support to: enetc, igc, stmmac
- add AF_XDP support to: stmmac
- virtio:
- page_to_skb() use build_skb when there's sufficient tailroom
(21% improvement for 1000B UDP frames)
- support XDP even without dedicated Tx queues - share the Tx
queues with the stack when necessary
- mlx5:
- flow rules: add support for mirroring with conntrack, matching
on ICMP, GTP, flex filters and more
- support packet sampling with flow offloads
- persist uplink representor netdev across eswitch mode changes
- allow coexistence of CQE compression and HW time-stamping
- add ethtool extended link error state reporting
- ice, iavf: support flow filters, UDP Segmentation Offload
- dpaa2-switch:
- move the driver out of staging
- add spanning tree (STP) support
- add rx copybreak support
- add tc flower hardware offload on ingress traffic
- ionic:
- implement Rx page reuse
- support HW PTP time-stamping
- octeon: support TC hardware offloads - flower matching on ingress
and egress ratelimitting.
- stmmac:
- add RX frame steering based on VLAN priority in tc flower
- support frame preemption (FPE)
- intel: add cross time-stamping freq difference adjustment
- ocelot:
- support forwarding of MRP frames in HW
- support multiple bridges
- support PTP Sync one-step timestamping
- dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like
learning, flooding etc.
- ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350,
SC7280 SoCs)
- mt7601u: enable TDLS support
- mt76:
- add support for 802.3 rx frames (mt7915/mt7615)
- mt7915 flash pre-calibration support
- mt7921/mt7663 runtime power management fixes"
* tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits)
net: selftest: fix build issue if INET is disabled
net: netrom: nr_in: Remove redundant assignment to ns
net: tun: Remove redundant assignment to ret
net: phy: marvell: add downshift support for M88E1240
net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255
net/sched: act_ct: Remove redundant ct get and check
icmp: standardize naming of RFC 8335 PROBE constants
bpf, selftests: Update array map tests for per-cpu batched ops
bpf: Add batched ops support for percpu array
bpf: Implement formatted output helpers with bstr_printf
seq_file: Add a seq_bprintf function
sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
net: fix a concurrency bug in l2tp_tunnel_register()
net/smc: Remove redundant assignment to rc
mpls: Remove redundant assignment to err
llc2: Remove redundant assignment to rc
net/tls: Remove redundant initialization of record
rds: Remove redundant assignment to nr_sig
dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0
...
In case ethernet driver is enabled and INET is disabled, selftest will
fail to build.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 3e1e58d64c ("net: add generic selftest support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210428130947.29649-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Variable ns is set to 'skb->data[17]' but this value is never read as
it is overwritten or not used later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
net/netrom/nr_in.c:156:2: warning: Value stored to 'ns' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619603885-115604-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The assignment is not being used and redundant.
The check for null is redundant as nf_conntrack_put() also
checks this.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20210428060532.3330974-1-roid@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current definitions of constants for PROBE, currently defined only
in the net-next kernel branch, are inconsistent, with
some beginning with ICMP and others with simply EXT. This patch
attempts to standardize the naming conventions of the constants for
PROBE before their release into a stable Kernel, and to update the
relevant definitions in net/ipv4/icmp.c.
Similarly, the definitions for the code field (previously
ICMP_EXT_MAL_QUERY, etc) use the same prefixes as the type field. This
patch adds _CODE_ to the prefix to clarify the distinction of these
constants.
Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Acked-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210427153635.2591-1-andreas.a.roeseler@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and debugfs interfaces
to a unified debugfs interface.
- Signals: Allow caching one sigqueue object per task, to improve performance & latencies.
- Improve newidle_balance() irq-off latencies on systems with a large number of CPU cgroups.
- Improve energy-aware scheduling
- Improve the PELT metrics for certain workloads
- Reintroduce select_idle_smt() to improve load-balancing locality - but without the previous
regressions
- Add 'scheduler latency debugging': warn after long periods of pending need_resched. This
is an opt-in feature that requires the enabling of the LATENCY_WARN scheduler feature,
or the use of the resched_latency_warn_ms=xx boot parameter.
- CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix remaining
balance_push() vs. hotplug holes/races
- PSI fixes, plus allow /proc/pressure/ files to be written by CAP_SYS_RESOURCE tasks as well
- Fix/improve various load-balancing corner cases vs. capacity margins
- Fix sched topology on systems with NUMA diameter of 3 or above
- Fix PF_KTHREAD vs to_kthread() race
- Minor rseq optimizations
- Misc cleanups, optimizations, fixes and smaller updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJInsRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i5XxAArh0b+fwXlkVGzTUly7HQjhU7lFbChnmF
h6ToyNLi6pXoZ14VC/WoRIME+RzK3gmw9cEFaSLVPxbkbekTcyWS78kqmcg1/j2v
kO/20QhXobiIxVskYfoMmqSavZ5mKhMWBqtFXkCuYfxwGylas0VVdh3AZLJ7N21G
WEoFh99pVULwWnPHxM2ZQ87Ex9BkGKbsBTswxWpprCfXLqD0N2hHlABpwJP78zRf
VniWFOcC7lslILCFawb7CqGgAwbgV85nDRS4QCuCKisrkFywvjJrEeu/W+h1NfhF
d6ves/osNdEAM1DSALoxwEA42An8l8xh8NyJnl8JZV00LW0DM108O5/7pf5Zcryc
RHV3RxA7skgezBh5uThvo60QzNK+kVMatI4qpQEHxLE52CaDl/fBu1Cgb/VUxnIl
AEBfyiFbk+skHpuMFKtl30Tx3M+yJKMTzFPd4kYjHYGEDwtAcXcB3dJQW48A79i3
H3IWcDcXpk5Rjo2UZmaXdt/qlj7mP6U0xdOUq8ZK6JOC4uY9skszVGsfuNN9QQ5u
2E2YKKVrGFoQydl4C8R6A7axL2VzIJszHFZNipd8E3YOyW7PWRAkr02tOOkBTj8N
dLMcNM7aPJWqEYiEIjEzGQN20pweJ1dRA29LDuOswKh+7W2bWTQFh6F2Q8Haansc
RVg5PDzl+Mc=
=E7mz
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and
debugfs interfaces to a unified debugfs interface.
- Signals: Allow caching one sigqueue object per task, to improve
performance & latencies.
- Improve newidle_balance() irq-off latencies on systems with a large
number of CPU cgroups.
- Improve energy-aware scheduling
- Improve the PELT metrics for certain workloads
- Reintroduce select_idle_smt() to improve load-balancing locality -
but without the previous regressions
- Add 'scheduler latency debugging': warn after long periods of pending
need_resched. This is an opt-in feature that requires the enabling of
the LATENCY_WARN scheduler feature, or the use of the
resched_latency_warn_ms=xx boot parameter.
- CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix
remaining balance_push() vs. hotplug holes/races
- PSI fixes, plus allow /proc/pressure/ files to be written by
CAP_SYS_RESOURCE tasks as well
- Fix/improve various load-balancing corner cases vs. capacity margins
- Fix sched topology on systems with NUMA diameter of 3 or above
- Fix PF_KTHREAD vs to_kthread() race
- Minor rseq optimizations
- Misc cleanups, optimizations, fixes and smaller updates
* tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
cpumask/hotplug: Fix cpu_dying() state tracking
kthread: Fix PF_KTHREAD vs to_kthread() race
sched/debug: Fix cgroup_path[] serialization
sched,psi: Handle potential task count underflow bugs more gracefully
sched: Warn on long periods of pending need_resched
sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to simplify the code & fix an unused function warning
sched/debug: Rename the sched_debug parameter to sched_verbose
sched,fair: Alternative sched_slice()
sched: Move /proc/sched_debug to debugfs
sched,debug: Convert sysctl sched_domains to debugfs
debugfs: Implement debugfs_create_str()
sched,preempt: Move preempt_dynamic to debug.c
sched: Move SCHED_DEBUG sysctl to debugfs
sched: Don't make LATENCYTOP select SCHED_DEBUG
sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG
sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG
sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
cpumask: Introduce DYING mask
cpumask: Make cpu_{online,possible,present,active}() inline
rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs()
...
It seems like Fedora 34 ends up enabling a few new gcc warnings, notably
"-Wstringop-overread" and "-Warray-parameter".
Both of them cause what seem to be valid warnings in the kernel, where
we have array size mismatches in function arguments (that are no longer
just silently converted to a pointer to element, but actually checked).
This fixes most of the trivial ones, by making the function declaration
match the function definition, and in the case of intel_pm.c, removing
the over-specified array size from the argument declaration.
At least one 'stringop-overread' warning remains in the i915 driver, but
that one doesn't have the same obvious trivial fix, and may or may not
actually be indicative of a bug.
[ It was a mistake to upgrade one of my machines to Fedora 34 while
being busy with the merge window, but if this is the extent of the
compiler upgrade problems, things are better than usual - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In digital_tg_recv_dep_req, it calls nfc_tm_data_received(..,resp).
If nfc_tm_data_received() failed, the callee will free the resp via
kfree_skb() and return error. But in the exit branch, the resp
will be freed again.
My patch sets resp to NULL if nfc_tm_data_received() failed, to
avoid the double free.
Fixes: 1c7a4c24fb ("NFC Digital: Add target NFC-DEP support")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add support for the catch-all set element. This special element
can be used to define a default action to be applied in case that
the set lookup returns no matching element.
2) Fix incorrect #ifdef dependencies in the nftables cgroupsv2
support, from Arnd Bergmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_tunnel_register() registers a tunnel without fully
initializing its attribute. This can allow another kernel thread
running l2tp_xmit_core() to access the uninitialized data and
then cause a kernel NULL pointer dereference error, as shown below.
Thread 1 Thread 2
//l2tp_tunnel_register()
list_add_rcu(&tunnel->list, &pn->l2tp_tunnel_list);
//pppol2tp_connect()
tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id);
// Fetch the new tunnel
...
//l2tp_xmit_core()
struct sock *sk = tunnel->sock;
...
bh_lock_sock(sk);
//Null pointer error happens
tunnel->sock = sk;
Fix this bug by initializing tunnel->sock before adding the
tunnel into l2tp_tunnel_list.
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Sishuai Gong <sishuai@purdue.edu>
Reported-by: Sishuai Gong <sishuai@purdue.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable rc is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
net/smc/af_smc.c:1079:3: warning: Value stored to 'rc' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable err is set to -ENOMEM but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
net/mpls/af_mpls.c:1022:2: warning: Value stored to 'err' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable rc is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.
Cleans up the following clang-analyzer warning:
net/llc/llc_station.c:86:2: warning: Value stored to 'rc' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
record is being initialized to ctx->open_record but this is never
read as record is overwritten later on. Remove the redundant
initialization.
Cleans up the following clang-analyzer warning:
net/tls/tls_device.c:421:26: warning: Value stored to 'record' during
its initialization is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable nr_sig is being assigned a value however the assignment is
never read, so this redundant assignment can be removed.
Cleans up the following clang-analyzer warning:
net/rds/ib_send.c:297:2: warning: Value stored to 'nr_sig' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although HWTSTAMP_TX_ONESTEP_SYNC existed in ioctl for hardware timestamp
configuration, the PTP Sync one-step timestamping had never been supported.
This patch is to truely support it.
- ocelot_port_txtstamp_request()
This function handles tx timestamp request by storing
ptp_cmd(tx timestamp type) in OCELOT_SKB_CB(skb)->ptp_cmd,
and additionally for two-step timestamp storing ts_id in
OCELOT_SKB_CB(clone)->ptp_cmd.
- ocelot_ptp_rew_op()
During xmit, this function is called to get rew_op (rewriter option) by
checking skb->cb for tx timestamp request, and configure to transmitting.
Non-onestep-Sync packet with one-step timestamp request falls back to use
two-step timestamp.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free skb->cb usage in core driver and let device drivers decide to
use or not. The reason having a DSA_SKB_CB(skb)->clone was because
dsa_skb_tx_timestamp() which may set the clone pointer was called
before p->xmit() which would use the clone if any, and the device
driver has no way to initialize the clone pointer.
This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning
of dsa_slave_xmit(). Some new features in the future, like one-step
timestamp may need more bytes of skb->cb to use in
dsa_skb_tx_timestamp(), and p->xmit().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was a waste to clone skb directly in dsa_skb_tx_timestamp().
For one-step timestamping, a clone was not needed. For any failure of
port_txtstamp (this may usually happen), the skb clone had to be freed.
So this patch moves skb cloning for tx timestamp out of dsa core, and
let drivers clone skb in port_txtstamp if they really need.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move ptp_classify_raw out of dsa core driver for handling tx
timestamp request. Let device drivers do this if they want.
Not all drivers want to limit tx timestamping for only PTP
packet.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check tx timestamp request in core driver at very beginning of
dsa_skb_tx_timestamp(), so that most skbs not requiring tx
timestamp just return. And drop such checking in device drivers.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable offset is being assigned a value from a calculation
however the variable is never read, so this redundant variable
can be removed.
Cleans up the following clang-analyzer warning:
net/rxrpc/rxkad.c:579:2: warning: Value stored to 'offset' is never read
[clang-analyzer-deadcode.DeadStores].
net/rxrpc/rxkad.c:485:2: warning: Value stored to 'offset' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 Multicast Router Advertisements parsing has the following two
issues:
For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD
messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes,
assuming MLDv2 Reports with at least one multicast address entry).
When ipv6_mc_check_mld_msg() tries to parse an Multicast Router
Advertisement its MLD length check will fail - and it will wrongly
return -EINVAL, even if we have a valid MRD Advertisement. With the
returned -EINVAL the bridge code will assume a broken packet and will
wrongly discard it, potentially leading to multicast packet loss towards
multicast routers.
The second issue is the MRD header parsing in
br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header
immediately after the IPv6 header (IPv6 next header type). However
according to RFC4286, section 2 all MRD messages contain a Router Alert
option (just like MLD). So instead there is an IPv6 Hop-by-Hop option
for the Router Alert between the IPv6 and ICMPv6 header, again leading
to the bridge wrongly discarding Multicast Router Advertisements.
To fix these two issues, introduce a new return value -ENODATA to
ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop
option which is not an MLD but potentially an MRD packet. This also
simplifies further parsing in the bridge code, as ipv6_mc_check_mld()
already fully checks the ICMPv6 header and hop-by-hop option.
These issues were found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).
Fixes: 4b3087c7e3 ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmCHM2sUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNfCg/9GmoCyCh+ZRj5RGQ6M+yJas1+yyJQ
uEfTNde54yfATUTaaWYnZG59yqzM3I2uaV11U7tqg8ajiFPxJKqbs5R9jl3lnSjH
0Dg22nXPSCOTKcU0x/DeLoKRr+M9jO1K/nQ8NEZvYX4nC/OgtCvJqb/oEQZIKAk5
2a7OEmNNQyFGd274p9dELaDHxN9UIaJ2PzQFXtq7ROHgBXQO4ONb2ajOf6mDSFQb
vP/CDHwaH+pcE28w44oRy0/YBkO1SrdqoFQchg5yFagM5tQRLGkXK4OFSs5KHi5Q
YMtmaOzMPIv1e5eaC1HuuMJYA4pPb30T9hFHP7tmBVZfmZaFaDeUs+BhMm98WTiS
o0iTP7tfs36/poOR1Q0/sB06uvF9hUAAX1ZuE95YySifbXU9hsUc9b0uQSwCdg9P
/J9rcdHLTpWqjw9n02mezWmAvo5U8ZvbDs+0xPIwI+3RTUP5t6mp+Hd5Tc7bPTq1
0rpWXx+FQoSytFap5qiUSiwBp+HF6HQnNIXB0Muf6wctChoTjvo7TwoxH//z4kEm
+SddhOCNkB7VC/X7hOxhl0F/rdHuXvb1AFIWjpTLJH2CR1PvMtF+sGey+uPT6hKZ
/gvhmQGjFdph99eGlfVbCNvx1pM61O25IscaYD1T2wGImw+z7dX4WkG3WoOdDSkR
bRjrBkcHh0gLhWk=
=HTEy
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add support for measuring the SELinux state and policy capabilities
using IMA.
- A handful of SELinux/NFS patches to compare the SELinux state of one
mount with a set of mount options. Olga goes into more detail in the
patch descriptions, but this is important as it allows more
flexibility when using NFS and SELinux context mounts.
- Properly differentiate between the subjective and objective LSM
credentials; including support for the SELinux and Smack. My clumsy
attempt at a proper fix for AppArmor didn't quite pass muster so John
is working on a proper AppArmor patch, in the meantime this set of
patches shouldn't change the behavior of AppArmor in any way. This
change explains the bulk of the diffstat beyond security/.
- Fix a problem where we were not properly terminating the permission
list for two SELinux object classes.
* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: add proper NULL termination to the secclass_map permissions
smack: differentiate between subjective and objective task credentials
selinux: clarify task subjective and objective credentials
lsm: separate security_task_getsecid() into subjective and objective variants
nfs: account for selinux security context when deciding to share superblock
nfs: remove unneeded null check in nfs_fill_super()
lsm,selinux: add new hook to compare new mount to an existing mount
selinux: fix misspellings using codespell tool
selinux: fix misspellings using codespell tool
selinux: measure state and policy capabilities
selinux: Allow context mounts for unpriviliged overlayfs
In some configurations, the sock_cgroup_ptr() function is not available:
net/netfilter/nft_socket.c: In function 'nft_sock_get_eval_cgroupv2':
net/netfilter/nft_socket.c:47:16: error: implicit declaration of function 'sock_cgroup_ptr'; did you mean 'obj_cgroup_put'? [-Werror=implicit-function-declaration]
47 | cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
| ^~~~~~~~~~~~~~~
| obj_cgroup_put
net/netfilter/nft_socket.c:47:14: error: assignment to 'struct cgroup *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion]
47 | cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
| ^
Change the caller to match the same #ifdef check, only calling it
when the function is defined.
Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The variable is only used in an #ifdef, causing a harmless warning:
net/netfilter/nft_socket.c: In function 'nft_socket_init':
net/netfilter/nft_socket.c:137:27: error: unused variable 'level' [-Werror=unused-variable]
137 | unsigned int len, level;
| ^~~~~
Move it into the same #ifdef block.
Fixes: e0bb96db96 ("netfilter: nft_socket: add support for cgroupsv2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmCHJJAACgkQ+7dXa6fL
C2uv0A//S/sJyToPtj3xbzmRVmSGGWFYNRMaxBD2gYAq7swbDNiX4ZbBCe8A4FBY
zedeMfoNztHIRB2M9vvnhG4HJWXPKq2BaT0xzeteCcmZ65b5zBOrAXue0PQPqE20
xmK1RDls/y5Y2FaF92Ay0VZzXW7+y/M+RRSo+FCFzrIgpJrPprTnlZigrECYauGJ
Qdsv26rQ0flK6tyi6GVuWZIMvpINCt3WwpwQTkAUewz2VewA1tZ1xFe70sP0vF7R
MJNaS6A4uJmvoJJzb8rqdnBGiu76+TxmPaXn0IZKJBECZjBVJyk/duce0jgqbQ7C
PZz5j4C2xrPyu3Y98joj37HPEAHCy0DPRx2Es1mz5cHPzI1TDRClHzPrxyycz9gr
D9WnMiPj9ff9aDaV6XpWKyuHhPxaHpoOD3VGdrhx6bU19Jd3/mLHB3lSt1kJzWdg
QrSAk3KzMWAZigz/+I5xetOpbygKTPLEYgpdmdOSTrtACcm1wjnhIougu0FUIWXK
arPNFOIV9liN0qCQyDOcLx4UEcxXrb2W0AYeHHJDBFxJ7sT2WWUCjPZFW5bh3G+Y
goKv/XJRVWJxFlTXLZLZ5siclzzIlAAmSylh661ji836yRhqTQ3NJTB8QfnrGGsZ
QlD1hjpyqC8uwIGUvoh56KdLRTxj9Gj70gpVe/Lk3Z16mivqDUE=
=fSr0
-----END PGP SIGNATURE-----
Merge tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
"Use the new netfs lib.
Begin the process of overhauling the use of the fscache API by AFS and
the introduction of support for features such as Transparent Huge
Pages (THPs).
- Add some support for THPs, including using core VM helper functions
to find details of pages.
- Use the ITER_XARRAY I/O iterator to mediate access to the pagecache
as this handles THPs and doesn't require allocation of large bvec
arrays.
- Delegate address_space read/pre-write I/O methods for AFS to the
netfs helper library. A method is provided to the library that
allows it to issue a read against the server.
This includes a change in use for PG_fscache (it now indicates a
DIO write in progress from the marked page), so a number of waits
need to be deployed for it.
- Split the core AFS writeback function to make it easier to modify
in future patches to handle writing to the cache. [This might
feasibly make more sense moved out into my fscache-iter branch].
I've tested these with "xfstests -g quick" against an AFS volume
(xfstests needs patching to make it work). With this, AFS without a
cache passes all expected xfstests; with a cache, there's an extra
failure, but that's also there before these patches. Fixing that
probably requires a greater overhaul (as can be found on my
fscache-iter branch, but that's for a later time).
Thanks should go to Marc Dionne and Jeff Altman of AuriStor for
exercising the patches in their test farm also"
Link: https://lore.kernel.org/lkml/3785063.1619482429@warthog.procyon.org.uk/
* tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Use the netfs_write_begin() helper
afs: Use new netfs lib read helper API
afs: Use the fs operation ops to handle FetchData completion
afs: Prepare for use of THPs
afs: Extract writeback extension into its own function
afs: Wait on PG_fscache before modifying/releasing a page
afs: Use ITER_XARRAY for writing
afs: Set up the iov_iter before calling afs_extract_data()
afs: Log remote unmarshalling errors
afs: Don't truncate iter during data fetch
afs: Move key to afs_read struct
afs: Print the operation debug_id when logging an unexpected data version
afs: Pass page into dirty region helpers to provide THP size
afs: Disable use of the fscache I/O routines
- Clean up list_sort prototypes (Sami Tolvanen)
- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmCHCR8ACgkQiXL039xt
wCZyFQ//fnUZaXR2K354zDyW6CJljMf+d94RF6rH+J6eMTH2/HXa5v0iJokwABLf
ussP6qF4k5wtmI22Gm9A5Zc3e4iiry5pC0jOdk0mk4gzWwFN9MdgNxJZIGA3xqhS
bsBK4AGrVKjtZl48G1/ZxJuNDeJhVp6GNK2n6/Gl4rZF6R7D/Upz0XelyJRdDpcM
HIGma7jZl6xfGU0mdWCzpOGK1zdMca1WVs7A4YuurSbLn5PZJrcNVWLouDqt/Si2
AduSri1gyPClicgvqWjMOzhUpuw/nJtBLRl1x1EsWk/KSZ1/uNVjlewfzdN4fZrr
zbtFr2gLubYLK6JOX7/LqoHlOTgE3tYLL+WIVN75DsOGZBKgHhmebTmWLyqzV0SL
oqcyM5d3ucC6msdtAK5Fv4MSp8rpjqlK1Ha4SGRT6kC2wut7AhZ3KD7eyRIz8mV9
Sa9mhignGFJnTEUp+LSbYdrAudgSKxB40WyXPmswAXX4VJFRD4ONrrcAON/SzkUT
Hw/JdFRCKkJjgwNQjIQoZcUNMTbFz2PlNIEnjJWm38YImQKQlCb2mXaZKCwBkf45
aheCZk17eKoxTCXFMd+KxlyNEtS2yBfq/PpZgvw7GW/pfFbWUg1+2O41LnihIe5v
zu0hN1wNCQqgfxiMZqX1OTb9C/2vybzGsXILt+9nppjZ8EBU7iU=
=wU6U
-----END PGP SIGNATURE-----
Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CFI on arm64 support from Kees Cook:
"This builds on last cycle's LTO work, and allows the arm64 kernels to
be built with Clang's Control Flow Integrity feature. This feature has
happily lived in Android kernels for almost 3 years[1], so I'm excited
to have it ready for upstream.
The wide diffstat is mainly due to the treewide fixing of mismatched
list_sort prototypes. Other things in core kernel are to address
various CFI corner cases. The largest code portion is the CFI runtime
implementation itself (which will be shared by all architectures
implementing support for CFI). The arm64 pieces are Acked by arm64
maintainers rather than coming through the arm64 tree since carrying
this tree over there was going to be awkward.
CFI support for x86 is still under development, but is pretty close.
There are a handful of corner cases on x86 that need some improvements
to Clang and objtool, but otherwise works well.
Summary:
- Clean up list_sort prototypes (Sami Tolvanen)
- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"
* tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: allow CONFIG_CFI_CLANG to be selected
KVM: arm64: Disable CFI for nVHE
arm64: ftrace: use function_nocfi for ftrace_call
arm64: add __nocfi to __apply_alternatives
arm64: add __nocfi to functions that jump to a physical address
arm64: use function_nocfi with __pa_symbol
arm64: implement function_nocfi
psci: use function_nocfi for cpu_resume
lkdtm: use function_nocfi
treewide: Change list_sort to use const pointers
bpf: disable CFI in dispatcher functions
kallsyms: strip ThinLTO hashes from static functions
kthread: use WARN_ON_FUNCTION_MISMATCH
workqueue: use WARN_ON_FUNCTION_MISMATCH
module: ensure __cfi_check alignment
mm: add generic function_nocfi macro
cfi: add __cficanonical
add support for Clang CFI
This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.
This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.
The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same of the set
element type to reuse the existing set element codebase.
The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.
The check for valid set element flags hava been updates to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When binding sets to rule, validate set element data according to
set definition. This patch adds a helper function to be reused by
the catch-all set element support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rename:
- nft_set_elem_activate() to nft_set_elem_data_activate().
- nft_set_elem_deactivate() to nft_set_elem_data_deactivate().
To prepare for updates in the set element infrastructure to add support
for the special catch-all element.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Harald Arnesen reported [1] a deadlock at reboot time, and after
he captured a stack trace a picture developed of what's going on:
The distribution he's using is using iwd (not wpa_supplicant) to
manage wireless. iwd will usually use the "socket owner" option
when it creates new interfaces, so that they're automatically
destroyed when it quits (unexpectedly or otherwise). This is also
done by wpa_supplicant, but it doesn't do it for the normal one,
only for additional ones, which is different with iwd.
Anyway, during shutdown, iwd quits while the netdev is still UP,
i.e. IFF_UP is set. This causes the stack trace that Linus so
nicely transcribed from the pictures:
cfg80211_destroy_iface_wk() takes wiphy_lock
-> cfg80211_destroy_ifaces()
->ieee80211_del_iface
->ieeee80211_if_remove
->cfg80211_unregister_wdev
->unregister_netdevice_queue
->dev_close_many
->__dev_close_many
->raw_notifier_call_chain
->cfg80211_netdev_notifier_call
and that last call tries to take wiphy_lock again.
In commit a05829a722 ("cfg80211: avoid holding the RTNL when
calling the driver") I had taken into account the possibility of
recursing from cfg80211 into cfg80211_netdev_notifier_call() via
the network stack, but only for NETDEV_UNREGISTER, not for what
happens here, NETDEV_GOING_DOWN and NETDEV_DOWN notifications.
Additionally, while this worked still back in commit 78f22b6a3a
("cfg80211: allow userspace to take ownership of interfaces"), it
missed another corner case: unregistering a netdev will cause
dev_close() to be called, and thus stop wireless operations (e.g.
disconnecting), but there are some types of virtual interfaces in
wifi that don't have a netdev - for that we need an additional
call to cfg80211_leave().
So, to fix this mess, change cfg80211_destroy_ifaces() to not
require the wiphy_lock(), but instead make it acquire it, but
only after it has actually closed all the netdevs on the list,
and then call cfg80211_leave() as well before removing them
from the driver, to fix the second issue. The locking change in
this requires modifying the nl80211 call to not get the wiphy
lock passed in, but acquire it by itself after flushing any
potentially pending destruction requests.
[1] https://lore.kernel.org/r/09464e67-f3de-ac09-28a3-e27b7914ee7d@skogtun.org
Cc: stable@vger.kernel.org # 5.12
Reported-by: Harald Arnesen <harald@skogtun.org>
Fixes: 776a39b819 ("cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held")
Fixes: 78f22b6a3a ("cfg80211: allow userspace to take ownership of interfaces")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Update NFSv2 and NFSv3 XDR encoding functions
- Add batch Receive posting to the server's RPC/RDMA transport (take 2)
- Reduce page allocator traffic in svcrdma
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmBzLCIACgkQM2qzM29m
f5evWA//fE2WlZDoTP8Iq1BGreGrGyzqOIkJakDGoZs4VOaUJN9WxWEcmBHI4t22
yom7aZ7S7VtMF6SoGMnohYoNwkloPJ1kfYBVZuUxDUCIHGVrLaAGZwjtojQftUS0
19ZdiSx8D8xWEqI/cpbHsj+CNCH1F5IDGjJzNhlz+rIFLrRPBMeHwbY8qf/zusm/
uZ4tGJtKWmFXvdT9duGkoNXRd0gBdcfDeFN//JYrLPS9sX4zs4/2KnDh25YJR8Jf
EQryc19l4ztlVT8PXJfI4I/fG0Sfv5AWuYYzaFIncp1PkmiunqGL1yai6eeW4LEN
8v3QEUNy9J7J20FrsL1ge1icbEyObfNFvkgYqNhmGIBdbEDdLWPppL6fQKUYl/zi
HSnAoJsJYyzYKbE7BMLy3wwZ775GsTrkU/7tfiu/M9KgvXJtDy0m7vsxt3/qULXn
Bg4KAKnmYIcuigdG6BATJ23jISRK7cHH/YYlORj3lsB/KQi+c2U/5zQFaROOwbtc
Ny+y92zQZVu9ZMkUfs2+b5qdg8Z/J0p8n9MfS7GcpCIyGZTnvfs8SfM8BXuHH2Yn
BdL3xDoPlbYQnAzp16oVfdoYm8RWzFBv+xHc360ielMOQbP4ntdC0oexIApsXzyz
Yv4OK9itsLagwcuAxPWnfnIgGxj9hPsp/peUsChsglhx+4NSGWA=
=BOO0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Highlights:
- Update NFSv2 and NFSv3 XDR encoding functions
- Add batch Receive posting to the server's RPC/RDMA transport (take 2)
- Reduce page allocator traffic in svcrdma"
* tag 'nfsd-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (70 commits)
NFSD: Use DEFINE_SPINLOCK() for spinlock
sunrpc: Remove unused function ip_map_lookup
NFSv4.2: fix copy stateid copying for the async copy
UAPI: nfsfh.h: Replace one-element array with flexible-array member
svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()
svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg
svcrdma: Remove sc_read_complete_q
svcrdma: Single-stage RDMA Read
SUNRPC: Move svc_xprt_received() call sites
SUNRPC: Export svc_xprt_received()
svcrdma: Retain the page backing rq_res.head[0].iov_base
svcrdma: Remove unused sc_pages field
svcrdma: Normalize Send page handling
svcrdma: Add a "deferred close" helper
svcrdma: Maintain a Receive water mark
svcrdma: Use svc_rdma_refresh_recvs() in wc_receive
svcrdma: Add a batch Receive posting mechanism
svcrdma: Remove stale comment for svc_rdma_wc_receive()
svcrdma: Provide an explanatory comment in CMA event handler
svcrdma: RPCDBG_FACILITY is no longer used
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmCGYWwTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqUAKB/9iZHHXxDt9Kt5aoY2BmcVwzPcFCfkn
b9240tREnTfIkVZPxLu55Zwe1fjR+hSmen7A4A4U5AxLXnpHf/UikWgGUD6f3Teb
KRALuRMIqBRGdX7vayqpO37sTMGJQT/HNxtwU8W83gUooKNWeZC1gEuR1kjd7g3I
01Yhd6/7BpTPYkbwh3HQ1H7yW4dNj7HHXlIoY8X2jl5baCgwxo/qiab3a6aCXmOz
KaUIrcyGe1698/jzdLa7TYoLW7fFiljRJzwZeFbc4+LB34TWg5xI6uicRuJ2s2S1
0oQmnXAsVB/gmL5UAPldHpclaFumxmnpRyFPH5Y+HhP8Rb3VotkIUL5j
=j5ru
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.13-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2021-04-26
this is a pull request of 4 patches for net-next/master.
the first two patches are from Colin Ian King and target the
etas_es58x driver, they add a missing NULL pointer check and fix some
typos.
The next two patches are by Erik Flodin. The first one updates the CAN
documentation regarding filtering, the other one fixes the header
alignment in CAN related proc output on 64 bit systems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) The various ip(6)table_foo incarnations are updated to expect
that the table is passed as 'void *priv' argument that netfilter core
passes to the hook functions. This reduces the struct net size by 2
cachelines on x86_64. From Florian Westphal.
2) Add cgroupsv2 support for nftables.
3) Fix bridge log family merge into nf_log_syslog: Missing
unregistration from netns exit path, from Phil Sutter.
4) Add nft_pernet() helper to access nftables pernet area.
5) Add struct nfnl_info to reduce nfnetlink callback footprint and
to facilite future updates. Consolidate nfnetlink callbacks.
6) Add CONFIG_NETFILTER_XTABLES_COMPAT Kconfig knob, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big set of tty and serial driver updates for 5.13-rc1.
Actually busy this release, with a number of cleanups happening:
- much needed core tty cleanups by Jiri Slaby
- removal of unused and orphaned old-style serial drivers. If
anyone shows up with this hardware, it is trivial to restore
these but we really do not think they are in use anymore.
- fixes and cleanups from Johan Hovold on a number of termios
setting corner cases that loads of drivers got wrong as well
as removing unneeded code due to tty core changes from long
ago that were never propagated out to the drivers
- loads of platform-specific serial port driver updates and
fixes
- coding style cleanups and other small fixes and updates all
over the tty/serial tree.
All of these have been in linux-next for a while now with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa3NQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykMXgCfX3FZgKveI4l94ChXSy4OyKwycHUAn00BzrMC
/7BwA1FnjQnC4zSzuHnm
=bAas
-----END PGP SIGNATURE-----
Merge tag 'tty-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial driver updates from Greg KH:
"Here is the big set of tty and serial driver updates for 5.13-rc1.
Actually busy this release, with a number of cleanups happening:
- much needed core tty cleanups by Jiri Slaby
- removal of unused and orphaned old-style serial drivers. If anyone
shows up with this hardware, it is trivial to restore these but we
really do not think they are in use anymore.
- fixes and cleanups from Johan Hovold on a number of termios setting
corner cases that loads of drivers got wrong as well as removing
unneeded code due to tty core changes from long ago that were never
propagated out to the drivers
- loads of platform-specific serial port driver updates and fixes
- coding style cleanups and other small fixes and updates all over
the tty/serial tree.
All of these have been in linux-next for a while now with no reported
issues"
* tag 'tty-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (186 commits)
serial: extend compile-test coverage
serial: stm32: add FIFO threshold configuration
dt-bindings: serial: 8250: update TX FIFO trigger level
dt-bindings: serial: stm32: override FIFO threshold properties
dt-bindings: serial: add RX and TX FIFO properties
serial: xilinx_uartps: drop low-latency workaround
serial: vt8500: drop low-latency workaround
serial: timbuart: drop low-latency workaround
serial: sunsu: drop low-latency workaround
serial: sifive: drop low-latency workaround
serial: txx9: drop low-latency workaround
serial: sa1100: drop low-latency workaround
serial: rp2: drop low-latency workaround
serial: rda: drop low-latency workaround
serial: owl: drop low-latency workaround
serial: msm_serial: drop low-latency workaround
serial: mpc52xx_uart: drop low-latency workaround
serial: meson: drop low-latency workaround
serial: mcf: drop low-latency workaround
serial: lpc32xx_hs: drop low-latency workaround
...
Here is the big set of various smaller driver subsystem updates for
5.13-rc1.
Major bits in here are:
- habanalabs driver updates
- hwtracing driver updates
- interconnect driver updates
- mhi driver updates
- extcon driver updates
- fpga driver updates
- new binder features added
- nvmem driver updates
- phy driver updates
- soundwire driver updates
- smaller misc and char driver fixes and updates.
- bluetooth driver bugfix that maintainer wanted to go through
this tree.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYIa0CQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylQ/QCgwLQleU5hH/iQwxbHgNL5GawNUroAmwZtxILF
1r6zjmGi0Ak4oFBf7A0T
=Rrl6
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of various smaller driver subsystem updates for
5.13-rc1.
Major bits in here are:
- habanalabs driver updates
- hwtracing driver updates
- interconnect driver updates
- mhi driver updates
- extcon driver updates
- fpga driver updates
- new binder features added
- nvmem driver updates
- phy driver updates
- soundwire driver updates
- smaller misc and char driver fixes and updates.
- bluetooth driver bugfix that maintainer wanted to go through this
tree.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits)
bluetooth: eliminate the potential race condition when removing the HCI controller
coresight: etm-perf: Fix define build issue when built as module
phy: Revert "phy: ti: j721e-wiz: add missing of_node_put"
phy: ti: j721e-wiz: Add missing include linux/slab.h
phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
stm class: Use correct UUID APIs
intel_th: pci: Add Alder Lake-M support
intel_th: pci: Add Rocket Lake CPU support
intel_th: Consistency and off-by-one fix
intel_th: Constify attribute_group structs
intel_th: Constify all drvdata references
stm class: Remove an unused function
habanalabs/gaudi: Fix uninitialized return code rc when read size is zero
greybus: es2: fix kernel-doc warnings
mei: me: add Alder Lake P device id.
dw-xdata-pcie: Update outdated info and improve text format
dw-xdata-pcie: Fix documentation build warns
fbdev: zero-fill colormap in fbcmap.c
firmware: qcom-scm: Fix QCOM_SCM configuration
speakup: i18n: Switch to kmemdup_nul() in spk_msg_set()
...
The compat layer needs to parse untrusted input (the ruleset)
to translate it to a 64bit compatible format.
We had a number of bugs in this department in the past, so allow users
to turn this feature off.
Add CONFIG_NETFILTER_XTABLES_COMPAT kconfig knob and make it default to y
to keep existing behaviour.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update batch callbacks to use the nfnl_info structure. Rename one
clashing info variable to expr_info.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull crypto updates from Herbert Xu:
"API:
- crypto_destroy_tfm now ignores errors as well as NULL pointers
Algorithms:
- Add explicit curve IDs in ECDH algorithm names
- Add NIST P384 curve parameters
- Add ECDSA
Drivers:
- Add support for Green Sardine in ccp
- Add ecdh/curve25519 to hisilicon/hpre
- Add support for AM64 in sa2ul"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (184 commits)
fsverity: relax build time dependency on CRYPTO_SHA256
fscrypt: relax Kconfig dependencies for crypto API algorithms
crypto: camellia - drop duplicate "depends on CRYPTO"
crypto: s5p-sss - consistently use local 'dev' variable in probe()
crypto: s5p-sss - remove unneeded local variable initialization
crypto: s5p-sss - simplify getting of_device_id match data
ccp: ccp - add support for Green Sardine
crypto: ccp - Make ccp_dev_suspend and ccp_dev_resume void functions
crypto: octeontx2 - add support for OcteonTX2 98xx CPT block.
crypto: chelsio/chcr - Remove useless MODULE_VERSION
crypto: ux500/cryp - Remove duplicate argument
crypto: chelsio - remove unused function
crypto: sa2ul - Add support for AM64
crypto: sa2ul - Support for per channel coherency
dt-bindings: crypto: ti,sa2ul: Add new compatible for AM64
crypto: hisilicon - enable new error types for QM
crypto: hisilicon - add new error type for SEC
crypto: hisilicon - support new error types for ZIP
crypto: hisilicon - dynamic configuration 'err_info'
crypto: doc - fix kernel-doc notation in chacha.c and af_alg.c
...
Add a new structure to reduce callback footprint and to facilite
extensions of the nfnetlink callback interface in the future.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>