tcf_gate_walker() and tcf_gate_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_gact_walker() and tcf_gact_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_ctinfo_walker() and tcf_ctinfo_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_ct_walker() and tcf_ct_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_csum_walker() and tcf_csum_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_connmark_walker() and tcf_connmark_search() do the same thing as
generic walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_bpf_walker() and tcf_bpf_search() do the same thing as generic
walk/search function, so remove them.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Being able to get tc_action_net by using net_id stored in tc_action_ops
and execute the generic walk/search function, add __tcf_generic_walker()
and __tcf_idr_search() helpers.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each tc action module has a corresponding net_id, so put net_id directly
into the structure tc_action_ops.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju says:
====================
net: lan743x: Fix to use multiqueue start/stop APIs
This patch series address the fix to use multiqueue start/stop APIs and Add Rx IP and TCP checksum offload
Changes:
========
V1 -> V2:
- Fix the sparse warnings
V0 -> V1:
- Remove chip SKU check conditionals
- Update the changes description
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Rx IP and TCP checksum offload
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix to use multiqueue start/stop APIs
- Change to return NETDEV_TX_BUSY instead of holding the TX skb when busy
- Increase Tx ring size to 128 to address performance issues in some platforms
- Use NAPI_POLL_WEIGHT for Tx Napi handler instead of ring dependent value
- Use multiqueue to register 4 Rx channels
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal says:
====================
The following set contains changes for your *net-next* tree:
- make conntrack ignore packets that are delayed (containing
data already acked). The current behaviour to flag them as INVALID
causes more harm than good, let them pass so peer can send an
immediate ACK for the most recent sequence number.
- make conntrack recognize when both peers have sent 'invalid' FINs:
This helps cleaning out stale connections faster for those cases where
conntrack is no longer in sync with the actual connection state.
- Now that DECNET is gone, we don't need to reserve space for DECNET
related information.
- compact common 'find a free port number for the new inbound
connection' code and move it to a helper, then cap number of tries
the new helper will make until it gives up.
- replace various instances of strlcpy with strscpy, from Wolfram Sang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/freescale/fec.h
7d650df99d ("net: fec: add pm_qos support on imx6q platform")
40c79ce13b ("net: fec: add stop mode support for imx8 platform")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Function returns error integer, not bool.
Does not have any impact on functionality.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220906065815.3856323-1-casper.casan@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
wireless and bluetooth subtrees
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmMZwOMSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkCVMP/jUOqbRTzPOqV7+HhDM65Ww9Prb1WYKS
4o9Vi9DAL/DH8tXSDDtacE0GbtN6Mlr0kEYJE+lwn1hFsfGY1mzH2pcVjJqtZ7fh
6o2joaQMJ5lJe4dsr2k0TRtYhCzdeCrvzoTs2qLSeb5KYFOsxMtBikSF3kOJ1TZq
3bOK3OomiT/XO4FCnyPJvg8VBghkp69oNnefOavM8x7FzrMh6MY7VQem/IjnlIU2
9sHvMPLai81B1NXu2eybThEYSqutZHzM1PLuqIMhSMwdiSrVRlLFq7/BNP0FxCTR
/Mby+fnU4xIu+TK1bRoUM0fsipNceVEkXln4pQrRmxu1Yw62RFn5p+MGCM+Sc1UB
8HZzjtNRtlD+ZfRyQKs2m1+WJtbuESyzAeJaKoQ2eO2StmLqlnZLEYGl6oVQj6Uj
l3Bn1aBIOYrSM0T/qTNmAv7BTJYvpomlwqoQ5A1+5b0jKhn39Un3Yj3eU2aEv8eX
mJ80KAWk26Cc7ZA8WbL70Ac8wV76+TcpWCezVvrcZqrFoYAvicLrwEc92Fq0IiG5
bkA84zhd1TOFSuYVH7f79lSUPYHIs8FxFXIV8SZZGLbhj6nk607cwgbpA9IXizZT
CUajNoJclS1tELPG7VOv6DJS+VnpdsCm0+q3dnYTYiLlrEX71Bgc/ofgGOBKTeS3
iRyS2V0GLYoW
=DeAB
-----END PGP SIGNATURE-----
Merge tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc, netfilter, wireless and bluetooth
subtrees.
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for
of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after
enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in
rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N"
* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
sch_sfb: Also store skb len before calling child enqueue
net: phy: lan87xx: change interrupt src of link_up to comm_ready
net/smc: Fix possible access to freed memory in link clear
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
net: usb: qmi_wwan: add Quectel RM520N
net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
stmmac: intel: Simplify intel_eth_pci_remove()
net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
ipv6: sr: fix out-of-bounds read when setting HMAC data.
bonding: accept unsolicited NA message
bonding: add all node mcast address when slave up
bonding: use unspecified address if no available link local address
wifi: use struct_group to copy addresses
wifi: mac80211_hwsim: check length for virtio packets
...
Commit d4252071b9 ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression
although you presumably need a fast disk and a lot of cores to actually
notice.
Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Fengwei Yin <fengwei.yin@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- prevent the randstruct plugin from re-ordering EFI protocol
definitions;
- fix a use-after-free in the capsule loader
- drop unused variable
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmMZw2gACgkQw08iOZLZ
jyQ+Rgv/XGT12sS/7sBldWWRdJcz7ix9G/nZQHAK68qahC9WSIIXbmeZDPspdP6C
7WZs9VQ1CizwHbBxX4voKJD2gV8PDF8hjcq+i5YsHPCuW7Rn607gsPBjBUuISuen
8M3Yrr3xj+uBmkTRqTV2/WgkWgmPrzdicXbP6TcWe9ZFPRO4eaKuqVgFzC0Z2W3h
3O4X8tjth25PAmbYcOJ8H9i9GlVBm/al0it+OM4HgyQduPMRPWBUtibitg5QC8P1
rOqeeAeZAt58qRN5Tf5t/sDSARnjyiqm4ruNpL15/cACaur+YFGtu3Y0i/+NCaw/
s8Aqh2z7ljxb8tYgAYtDofNvQe8aY3bOwnI3rJLL4cd5AvEJQF27A545IpPu78YE
Zvu7Gs+qk12669dePnggFEYQg9MCRqRN40QWnADXBoT8kn1x0z3Pj1OdyN5AhI8t
DqE9bxWqEoh3rDeGMkp5tL0PYJ6SYJ9ldfCnWxOPl1j2g7WP94fRUhMxa2IlqVjx
PRv4/gf8
=IHhL
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-for-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"A couple of low-priority EFI fixes:
- prevent the randstruct plugin from re-ordering EFI protocol
definitions
- fix a use-after-free in the capsule loader
- drop unused variable"
* tag 'efi-urgent-for-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: capsule-loader: Fix use-after-free in efi_capsule_write
efi/x86: libstub: remove unused variable
efi: libstub: Disable struct randomization
These chip versions are closely related and all of them have no
chip-specific MAC/PHY initialization. Therefore merge support
for the three chip versions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/469d27e0-1d06-9b15-6c96-6098b3a52e35@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cong Wang noticed that the previous fix for sch_sfb accessing the queued
skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
function was also calling qdisc_qstats_backlog_inc() after enqueue, which
reads the pkt len from the skb cb field. Fix this by also storing the skb
len, and using the stored value to increment the backlog after enqueueing.
Fixes: 9efd23297c ("sch_sfb: Don't assume the skb is still around after enqueueing to child")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently phy link up/down interrupt is enabled using the
LAN87xx_INTERRUPT_MASK register. In the lan87xx_read_status function,
phy link is determined using the T1_MODE_STAT_REG register comm_ready bit.
comm_ready bit is set using the loc_rcvr_status & rem_rcvr_status.
Whenever the phy link is up, LAN87xx_INTERRUPT_SOURCE link_up bit is set
first but comm_ready bit takes some time to set based on local and
remote receiver status.
As per the current implementation, interrupt is triggered using link_up
but the comm_ready bit is still cleared in the read_status function. So,
link is always down. Initially tested with the shared interrupt
mechanism with switch and internal phy which is working, but after
implementing interrupt controller it is not working.
It can fixed either by updating the read_status function to read from
LAN87XX_INTERRUPT_SOURCE register or enable the interrupt mask for
comm_ready bit. But the validation team recommends the use of comm_ready
for link detection.
This patch fixes by enabling the comm_ready bit for link_up in the
LAN87XX_INTERRUPT_MASK_2 register (MISC Bank) and link_down in
LAN87xx_INTERRUPT_MASK register.
Fixes: 8a1b415d70 ("net: phy: added ethtool master-slave configuration support")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220905152750.5079-1-arun.ramadoss@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The stmmac has the possibility to automatically strip the padding/FCS for IEEE
802.3 type frames. This feature is enabled conditionally. Therefore, the stmmac
receive path has to have a determination logic whether the FCS has to be
stripped in software or not.
In fact, for DSA this ACS feature is disabled and the determination logic
doesn't check for it properly. For instance, when using DSA in combination with
an older stmmac (pre version 4), the FCS is not stripped by hardware or software
which is problematic.
So either add another check for DSA to the fast path or simply disable ACS
feature completely. The latter approach has been chosen, because most of the
time the FCS is stripped in software anyway and it removes conditionals from the
receive fast path.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/87v8q8jjgh.fsf@kurt/
Link: https://lore.kernel.org/r/20220905130155.193640-1-kurt@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A race condition may occur if the user calls close() on another thread
during a write() operation on the device node of the efi capsule.
This is a race condition that occurs between the efi_capsule_write() and
efi_capsule_flush() functions of efi_capsule_fops, which ultimately
results in UAF.
So, the page freeing process is modified to be done in
efi_capsule_release() instead of efi_capsule_flush().
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Guangbin Huang says:
====================
hns3: add some new features
This series adds some new features for the HNS3 ethernet driver.
Patches #1~#3 support configuring dscp map to tc.
Patch 4# supports querying FEC statistics by command "ethtool -I --show-fec eth0".
Patch 5# supports querying and setting Serdes lane number.
Change logs:
V1 -> V2:
- fix build error of patch 1# reported by robot lkp@intel.com.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When serdes lane support setting 25Gb/s or 50Gb/s speed and user wants to
set port speed as 50Gb/s, it can be setted as one 50Gb/s serdes lane or
two 25Gb/s serdes lanes.
So, this patch adds support to query and set lane number by ethtool
to satisfy this scenario.
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FEC statistics can be used to check the transmission quality of links.
This patch implements the get_fec_stats callback of ethtool_ops to support
querying FEC statistics by command "ethtool -I --show-fec eth0".
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add dump the map relation for dscp, priority and TC, and
the current tc map mode.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support tx packets to select queue according to its dscp field after
setting dscp and tc map relationship, this patch implements
ndo_select_queue() to set skb->priority according to the user's setting
dscp and priority map relationship.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add support config dscp map to tc by implementing ieee_setapp
and ieee_delapp of struct dcbnl_rtnl_ops. Driver will convert mapping
relationship from dscp-prio to dscp-tc.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-06 (i40e, iavf)
This series contains updates to i40e and iavf drivers.
Stanislaw adds support for new device id for i40e.
Jaroslaw tidies up some code around MSI-X configuration by adding/
reworking comments and introducing a couple of macros for i40e.
Michal resolves some races around reset and close by deferring and deleting
some pending AdminQ operations and reworking filter additions and deletions
during these operations for iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-09-06 (ice)
This series contains updates to ice driver only.
Tony reduces device MSI-X request/usage when entire request can't be fulfilled.
Michal adds check for reset when waiting for PTP offsets.
Paul refactors firmware version checks to use a common helper.
Christophe Jaillet changes a couple of local memory allocation to not
use the devm variant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Almost all nat helpers reserve an expecation port the same way:
Try the port inidcated by the peer, then move to next port if that
port is already in use.
We can squash this into a helper.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Florian Westphal <fw@strlen.de>
In case the endpoints and conntrack go out-of-sync, i.e. there is
disagreement wrt. validy of sequence/ack numbers between conntracks
internal state and those of the endpoints, connections can hang for a
long time (until ESTABLISHED timeout).
This adds a check to detect a fin/fin exchange even if those are
invalid. The timeout is then lowered to UNACKED (default 300s).
Signed-off-by: Florian Westphal <fw@strlen.de>
After previous patch, the conditional branch is obsolete, reformat it.
gcc generates same code as before this change.
Signed-off-by: Florian Westphal <fw@strlen.de>
The variable long_max is replaced by bpf_jit_limit_max and no longer be
used. So remove it.
No functional change.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of mtk_foe_entry_timestamp routine since it is no longer used.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if max hash configured in hw in mtk_ppe_hash_entry is
MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in
mtk_ppe_check_skb routine
Fixes: c4f033d9e0 ("net: ethernet: mtk_eth_soc: rework hardware flow table management")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Eric reported, the 'reason' field is not presented when trace the
kfree_skb event by perf:
$ perf record -e skb:kfree_skb -a sleep 10
$ perf script
ip_defrag 14605 [021] 221.614303: skb:kfree_skb:
skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1
reason:
The cause seems to be passing kernel address directly to TP_printk(),
which is not right. As the enum 'skb_drop_reason' is not exported to
user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason
string from the 'reason' field, which is a number.
Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used
to define the trace enum by TRACE_DEFINE_ENUM(). With the help of
DEFINE_DROP_REASON(), now we can remove the auto-generate that we
introduced in the commit ec43908dd5
("net: skb: use auto-generation to convert skb drop reason to string"),
and define the string array 'drop_reasons'.
Hmmmm...now we come back to the situation that have to maintain drop
reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they
are both in dropreason.h, which makes it easier.
After this commit, now the format of kfree_skb is like this:
$ cat /tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 1524
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:void * location; offset:16; size:8; signed:0;
field:unsigned short protocol; offset:24; size:2; signed:0;
field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......
Fixes: ec43908dd5 ("net: skb: use auto-generation to convert skb drop reason to string")
Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine.
Fixes: 33fc42de33 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If 'nf_conntrack_tcp_loose' is off (the default), tcp packets that are
outside of the current window are marked as INVALID.
nf/iptables rulesets often drop such packets via 'ct state invalid' or
similar checks.
For overly delayed acks, this can be a nuisance if such 'invalid' packets
are also logged.
Since they are not invalid in a strict sense, just ignore them, i.e.
conntrack won't extend timeout or change state so that they do not match
invalid state rules anymore.
This also avoids unwantend connection stalls in case conntrack considers
retransmission (of data that did not reach the peer) as too old.
The else branch of the conditional becomes obsolete.
Next patch will reformant the now always-true if condition.
The existing workaround for data that exceeds the calculated receive
window is adjusted to use the 'ignore' state so that these packets do
not refresh the timeout or change state other than updating ->td_end.
Signed-off-by: Florian Westphal <fw@strlen.de>
tcp_in_window returns true if the packet is in window and false if it is
not.
If its outside of window, packet will be treated as INVALID.
There are corner cases where the packet should still be tracked, because
rulesets may drop or log such packets, even though they can occur during
normal operation, such as overly delayed acks.
In extreme cases, connection may hang forever because conntrack state
differs from real state.
There is no retransmission for ACKs.
In case of ACK loss after conntrack processing, its possible that a
connection can be stuck because the actual retransmits are considered
stale ("SEQ is under the lower bound (already ACKed data
retransmitted)".
The problem is made worse by carrier-grade-nat which can also result
in stale packets from old connections to get treated as 'recent' packets
in conntrack (it doesn't support tcp timestamps at this time).
Prepare tcp_in_window() to return an enum that tells the desired
action (in-window/accept, bogus/drop).
A third action (accept the packet as in-window, but do not change
state) is added in a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Saeed Mahameed says:
====================
Introduce MACsec skb_metadata_dst and mlx5 macsec offload
v1->v2:
- attach mlx5 implementation patches.
This patchset introduces MACsec skb_metadata_dst to lay the ground
for MACsec HW offload.
MACsec is an IEEE standard (IEEE 802.1AE) for MAC security.
It defines a way to establish a protocol independent connection
between two hosts with data confidentiality, authenticity and/or
integrity, using GCM-AES. MACsec operates on the Ethernet layer and
as such is a layer 2 protocol, which means it’s designed to secure
traffic within a layer 2 network, including DHCP or ARP requests.
Linux has a software implementation of the MACsec standard and
HW offloading support.
The offloading is re-using the logic, netlink API and data
structures of the existing MACsec software implementation.
For Tx:
In the current MACsec offload implementation, MACsec interfaces shares
the same MAC address by default.
Therefore, HW can't distinguish from which MACsec interface the traffic
originated from.
MACsec stack will use skb_metadata_dst to store the SCI value, which is
unique per MACsec interface, skb_metadat_dst will be used later by the
offloading device driver to associate the SKB with the corresponding
offloaded interface (SCI) to facilitate HW MACsec offload.
For Rx:
Like in the Tx changes, if there are more than one MACsec device with
the same MAC address as in the packet's destination MAC, the packet will
be forward only to one of the devices and not neccessarly to the desired one.
Offloading device driver sets the MACsec skb_metadata_dst sci
field with the appropriaate Rx SCI for each SKB so the MACsec rx handler
will know to which port to divert those skbs, instead of wrongly solely
relaying on dst MAC address comparison.
1) patch 1,2, Add support to skb_metadata_dst in MACsec code:
net/macsec: Add MACsec skb_metadata_dst Tx Data path support
net/macsec: Add MACsec skb_metadata_dst Rx Data path support
2) patch 3, Move some MACsec driver code for sharing with various
drivers that implements offload:
net/macsec: Move some code for sharing with various drivers that
implements offload
3) The rest of the patches introduce mlx5 implementation for macsec
offloads TX and RX via steering tables.
a) TX, intercept skbs with macsec offlad mark in skb_metadata_dst and mark
the descriptor for offload.
b) RX, intercept offloaded frames and prepare the proper
skb_metadata_dst to mark offloaded rx frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to add up to 16 MACsec offload interfaces
over the same physical interface
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following statistics:
RX successfully decrypted MACsec packets:
macsec_rx_pkts : Number of packets decrypted successfully
macsec_rx_bytes : Number of bytes decrypted successfully
Rx dropped MACsec packets:
macsec_rx_pkts_drop : Number of MACsec packets dropped
macsec_rx_bytes_drop : Number of MACsec bytes dropped
TX successfully encrypted MACsec packets:
macsec_tx_pkts : Number of packets encrypted/authenticated successfully
macsec_tx_bytes : Number of bytes encrypted/authenticated successfully
Tx dropped MACsec packets:
macsec_tx_pkts_drop : Number of MACsec packets dropped
macsec_tx_bytes_drop : Number of MACsec bytes dropped
The above can be seen using:
ethtool -S <ifc> |grep macsec
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add offload support for MACsec SecY callbacks - add/update/delete.
add_secy is called when need to create a new MACsec interface.
upd_secy is called when source MAC address or tx SC was changed.
del_secy is called when need to destroy the MACsec interface.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MACsec driver need to distinguish to which offload device the MACsec
is target to, in order to handle them correctly.
This can be done by attaching a metadata_dst to a SKB with a SCI,
when there is a match on MACsec rule.
To achieve that, there is a map between fs_id to SCI, so for each RX SC,
there is a unique fs_id allocated when creating RX SC.
fs_id passed to device driver as metadata for packets that passed Rx
MACsec offload to aid the driver to retrieve the matching SCI.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>