Commit Graph

1031197 Commits

Author SHA1 Message Date
Rikard Falkeborn ade448c7e5 watchdog: sl28cpld_wdt: Constify static struct watchdog_ops
The struct sl28cpld_wdt_ops is only assigned to the ops pointer in the
watchdog_device struct, which is a pointer to const struct watchdog_ops.
Make it const to allow the compiler to put it in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210727223042.48150-2-rikard.falkeborn@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:09 +02:00
Jan Kiszka aec42642d9 watchdog: iTCO_wdt: Fix detection of SMI-off case
Obviously, the test needs to run against the register content, not its
address.

Fixes: cb011044e3 ("watchdog: iTCO_wdt: Account for rebooting on second timeout")
Cc: stable@vger.kernel.org
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Mantas Mikulėnas <grawity@gmail.com>
Link: https://lore.kernel.org/r/d84f8e06-f646-8b43-d063-fb11f4827044@siemens.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:09 +02:00
Stefan Wahren a4f95810e3 watchdog: bcm2835_wdt: consider system-power-controller property
Until now all Raspberry Pi boards used the power off function of the SoC.
But the Raspberry Pi 400 uses gpio-poweroff for the whole board which
possibly cannot register the poweroff handler because the it's
already registered by this watchdog driver. So consider the
system-power-controller property for registering, which is already
defined in soc/bcm/brcm,bcm2835-pm.txt .

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenz@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1622981777-5023-3-git-send-email-stefan.wahren@i2se.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:08 +02:00
Grzegorz Jaszczyk 14244b7c04 watchdog: imx2_wdg: notify wdog core to stop ping worker on suspend
Suspend routine disables wdog clk. Nevertheless, the watchdog subsystem
is not aware of that and can still try to ping wdog through
watchdog_ping_work. In order to prevent such condition and therefore
prevent from system hang (caused by the wdog register access issued
while the wdog clock is disabled) notify watchdog core that the ping
worker should be canceled during watchdog core suspend and restored
during resume.

Signed-off-by: Michal Koziel <michal.koziel@emlogic.no>
Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210618195033.3209598-3-grzegorz.jaszczyk@linaro.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:08 +02:00
Grzegorz Jaszczyk 60bcd91aaf watchdog: introduce watchdog_dev_suspend/resume
The watchdog drivers often disable wdog clock during suspend and then
enable it again during resume. Nevertheless the ping worker is still
running and can issue low-level ping while the wdog clock is disabled
causing the system hang. To prevent such condition register pm notifier
in the watchdog core which will call watchdog_dev_suspend/resume and
actually cancel ping worker during suspend and restore it back, if
needed, during resume.

Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210618195033.3209598-2-grzegorz.jaszczyk@linaro.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:08 +02:00
Curtis Klein c7b178dae1 watchdog: Fix NULL pointer dereference when releasing cdev
watchdog_hrtimer_pretimeout_stop needs the watchdog device to have a
valid pointer to the watchdog core data to stop the pretimeout hrtimer.
Therefore it needs to be called before the pointers are cleared in
watchdog_cdev_unregister.

Fixes: 7b7d2fdc8c ("watchdog: Add hrtimer-based pretimeout feature")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Curtis Klein <curtis.klein@hpe.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1624429583-5720-1-git-send-email-curtis.klein@hpe.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:07 +02:00
Curtis Klein cf6ea95423 watchdog: only run driver set_pretimeout op if device supports it
Some watchdog devices might conditionally support pretimeouts (e.g. if
an interrupt is exposed for the device) but some watchdog drivers might
still define the set_pretimeout operation (e.g. the mtk_wdt driver) and
indicate support at runtime through the WDIOF_PRETIMEOUT flag. If the
kernel is compiled with CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT enabled,
watchdog_set_pretimeout would run the driver specific set_pretimeout
even if WDIOF_PRETIMEOUT is not set which might have unintended
consequences.

So this change checks that the device flags and only runs the driver
operation if pretimeouts are supported.

Signed-off-by: Curtis Klein <curtis.klein@hpe.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1624751265-24785-1-git-send-email-curtis.klein@hpe.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:07 +02:00
Matti Vaittinen 52a5502507 watchdog: bd70528 drop bd70528 support
The only known BD70528 use-cases are such that the PMIC is controlled
from separate MCU which is not running Linux. I am not aware of
any Linux driver users. Furthermore, it seems there is no demand for
this IC. Let's ease the maintenance burden and drop the driver. We can
always add it back if there is sudden need for it.

Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/994d2e374262c3f59f4465c03ef23d3116120778.1621937490.git.matti.vaittinen@fi.rohmeurope.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2021-08-22 10:28:06 +02:00
Linus Torvalds d992fe5318 ARM: SoC fixes for 5.14, part 3
Not much to see here. Half the fixes this time are for Qualcomm dts files,
 fixing small mistakes on certain machines. The other fixes are:
 
  - A 5.13 regression fix for freescale QE interrupt controller\
 
  - A fix for TI OMAP gpt12 timer error handling
 
  - A randconfig build regression fix for ixp4xx
 
  - Another defconfig fix following the CONFIG_FB dependency rework
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmEe0MsACgkQmmx57+YA
 GNkQaw/5Ae6pR1VN2r727xHfYPOG9tgHuU0eAUsDvZjVxQSWlC/TowMZ/+eMPoyc
 6iwUz2W1BZrkPbc/3GfO/xQQq9HbdQbB0+hma9jkNVueSgURpaDsLHm1Qt8vXKw1
 rSa/ITHIOuHbYE63RGU48/8qw/Xyr6JJRJpjZKuRXQRAJhuJisw13w0IJAFStvPC
 GhgFkvkKruls1zsaeV5BeU1EZRnFCz9dZL519SPzol/dZW2allu9yiCFUopcdMJ+
 G/XyBwL+JVkQuLGy/Y8n4CifbFsyHPOv/dj4SxGDFwXDYPb1l+4+CwkdjuBoSeYE
 glbzJQJYZ7/QVyvUDIz5h5eulo03xrsx+80SQPCXjfmut+mWcLL3uSOcXb169F4S
 VB0rHgusXLL6Z7NbqWigo5YF58DqpDKa19rLCpW+/QqDuhyusm91RbMIs0oLJm0B
 n6HjYganyJM5VWgN5WvTpPGW/yJnt1uJoOwtgxKZSP95lmL7JRhUKzTI2AiZjo+8
 6zvy6QFlgMrjoG8mfw7Ns+sS9sAXTxE3YwL8AyFtkn2JiAYH+sP2J4Wn6P+E8/kh
 F5WtypSQaE2oQYgF8D06jq2Jd89dZdP+6ZABHlYZyflbbezJ/em1sxhohbRRLU10
 5C/Mqwo9/yVg2tOKKDkFqAkb6eq9QKvSDz9L7jrfRDQm5RMYnb4=
 =Ohvh
 -----END PGP SIGNATURE-----

Merge tag 'soc-fixes-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Not much to see here. Half the fixes this time are for Qualcomm dts
  files, fixing small mistakes on certain machines. The other fixes are:

   - A 5.13 regression fix for freescale QE interrupt controller\

   - A fix for TI OMAP gpt12 timer error handling

   - A randconfig build regression fix for ixp4xx

   - Another defconfig fix following the CONFIG_FB dependency rework"

* tag 'soc-fixes-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  soc: fsl: qe: fix static checker warning
  ARM: ixp4xx: fix building both pci drivers
  ARM: configs: Update the nhk8815_defconfig
  bus: ti-sysc: Fix error handling for sysc_check_active_timer()
  soc: fsl: qe: convert QE interrupt controller to platform_device
  arm64: dts: qcom: sdm845-oneplus: fix reserved-mem
  arm64: dts: qcom: msm8994-angler: Disable cont_splash_mem
  arm64: dts: qcom: sc7280: Fixup cpufreq domain info for cpu7
  arm64: dts: qcom: msm8992-bullhead: Fix cont_splash_mem mapping
  arm64: dts: qcom: msm8992-bullhead: Remove PSCI
  arm64: dts: qcom: c630: fix correct powerdown pin for WSA881x
2021-08-19 15:32:58 -07:00
Linus Torvalds f87d64319e Networking fixes for 5.14-rc7, including fixes from bpf, wireless and
mac80211 trees.
 
 Current release - regressions:
 
  - tipc: call tipc_wait_for_connect only when dlen is not 0
 
  - mac80211: fix locking in ieee80211_restart_work()
 
 Current release - new code bugs:
 
  - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()
 
  - ethernet: ice: fix perout start time rounding
 
  - wwan: iosm: prevent underflow in ipc_chnl_cfg_get()
 
 Previous releases - regressions:
 
  - bpf: clear zext_dst of dead insns
 
  - sch_cake: fix srchost/dsthost hashing mode
 
  - vrf: reset skb conntrack connection on VRF rcv
 
  - net/rds: dma_map_sg is entitled to merge entries
 
 Previous releases - always broken:
 
  - ethernet: bnxt: fix Tx path locking and races, add Rx path barriers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEepAYACgkQMUZtbf5S
 IrugdhAAo6/T9A8MmxU711gwq9BKcubOr2BQC0GYGjhM6Pnhign44kDGdBbl1fmL
 lt4LTgOwIKZ4bpLYJuwXnq0KpwWG8YKpgdgjvXsc0jXYtDHJvFjzL42vjGGGvAc4
 rwIe784dNt3LYppEsADAad2ZdQO25tB+jhQO8MttipNGq8Qvkjib2ttvmLri/hEl
 hKRHnyQMEZCXbMV4zn+ILjlqLxZ5a2ZPk97qGL9fAafi1O3cjfv5ZDwvOjM3TGE3
 DPLdEvYFRDTECGJO3QOK7SzC1NoQA49Bj1hqwbWRUi9tm8A0lPXEUjNleCbC2i7n
 qi6JBexmme6ZiimJJPeKacn8BcgorPDWm6friL9jiWpndvrWycBmqw5LurgdLVHC
 nFHjnbji885P6DBZLx8tDbTSXGSpcLoHuv7M3aQbD2DZqQKk/9irKaJMUZ6XwOYm
 EXM6oqQ13vaSjFH1GOsK0wx/XjcL5t42uRAtG9INlYusaU+ZvFu12TzEt1+PcWpO
 nF7VUQWQatcidSYdYolciw383siKRiSV1F8d6IEon907GHVrod9ty+yii33CamBf
 /aUyFULMEf/vaXjmJzhzWABV9vu3QVWd0IxL9oqbbRqNTwFr86hnnRhntqUCje/c
 vY0VFWQ4CuaTvohKl0I+IhyvUPLUp1iTD51qSbOSKybG6IZ77+U=
 =ofv+
 -----END PGP SIGNATURE-----

Merge tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from bpf, wireless and mac80211
  trees.

  Current release - regressions:

   - tipc: call tipc_wait_for_connect only when dlen is not 0

   - mac80211: fix locking in ieee80211_restart_work()

  Current release - new code bugs:

   - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()

   - ethernet: ice: fix perout start time rounding

   - wwan: iosm: prevent underflow in ipc_chnl_cfg_get()

  Previous releases - regressions:

   - bpf: clear zext_dst of dead insns

   - sch_cake: fix srchost/dsthost hashing mode

   - vrf: reset skb conntrack connection on VRF rcv

   - net/rds: dma_map_sg is entitled to merge entries

  Previous releases - always broken:

   - ethernet: bnxt: fix Tx path locking and races, add Rx path
     barriers"

* tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
  net: dpaa2-switch: disable the control interface on error path
  Revert "flow_offload: action should not be NULL when it is referenced"
  iavf: Fix ping is lost after untrusted VF had tried to change MAC
  i40e: Fix ATR queue selection
  r8152: fix the maximum number of PLA bp for RTL8153C
  r8152: fix writing USB_BP2_EN
  mptcp: full fully established support after ADD_ADDR
  mptcp: fix memory leak on address flush
  net/rds: dma_map_sg is entitled to merge entries
  net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
  net: asix: fix uninit value bugs
  ovs: clear skb->tstamp in forwarding path
  net: mdio-mux: Handle -EPROBE_DEFER correctly
  net: mdio-mux: Don't ignore memory allocation errors
  net: mdio-mux: Delete unnecessary devm_kfree
  net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
  sch_cake: fix srchost/dsthost hashing mode
  ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
  net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
  mac80211: fix locking in ieee80211_restart_work()
  ...
2021-08-19 12:33:43 -07:00
Linus Torvalds e649e4c806 platform-drivers-x86 for v5.14-4
A small set of pdx86 fixes for 5.14:
 - asus-nb-wmi: Enable SW_TABLET_MODE support for the TP200s (DMI quirk)
 - gigabyte-wmi: Enable on 2 more Gigabyte motherboards (2 DMI quirks)
 
 The following is an automated git shortlog grouped by driver:
 
 asus-nb-wmi:
  -  Add tablet_mode_sw=lid-flip quirk for the TP200s
  -  Allow configuring SW_TABLET_MODE method with a module option
 
 gigabyte-wmi:
  -  add support for B450M S2H V2
  -  add support for X570 GAMING X
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmEeCR4UHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yciAf7BkMzY4Lc9BAmxaEGQXKBBup9CDgl
 WXN1lJLt8pGiAQIXONmbymyi3sq5rZQGMJU585pmNH40xScAfUOv0LtHW50dQsPZ
 a3DAHwmoVGMDIXXLeufagHTmRovcv2uAsO/AbBN52kJg9TfT4jiA6XvOCL8Ayx30
 9VzJK8P1ADqiA34AFNifbB+HbsTP/hCngr1HJL1f24SPY9L7R+MH13uW5uaXXKTl
 Ltz/QIl+/N+pRwpgzchCT6snLLcX1e/kQQ5pIlb/hUWrW7NkplgzPe9yQ9bEOScy
 Co7GH07j4KLLHbkq23bg3GztkezHDr4R+Cc3rdiO8OfnkmU93u5srdcUvA==
 =tOIH
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - Enable SW_TABLET_MODE support for the TP200s

 - Enable WMI on two more Gigabyte motherboards

* tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: gigabyte-wmi: add support for B450M S2H V2
  platform/x86: gigabyte-wmi: add support for X570 GAMING X
  platform/x86: asus-nb-wmi: Add tablet_mode_sw=lid-flip quirk for the TP200s
  platform/x86: asus-nb-wmi: Allow configuring SW_TABLET_MODE method with a module option
2021-08-19 12:19:58 -07:00
Vladimir Oltean cd0a719fbd net: dpaa2-switch: disable the control interface on error path
Currently dpaa2_switch_takedown has a funny name and does not do the
opposite of dpaa2_switch_init, which makes probing fail when we need to
handle an -EPROBE_DEFER.

A sketch of what dpaa2_switch_init does:

	dpsw_open

	dpaa2_switch_detect_features

	dpsw_reset

	for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
		dpsw_if_disable

		dpsw_if_set_stp

		dpsw_vlan_remove_if_untagged

		dpsw_if_set_tci

		dpsw_vlan_remove_if
	}

	dpsw_vlan_remove

	alloc_ordered_workqueue

	dpsw_fdb_remove

	dpaa2_switch_ctrl_if_setup

When dpaa2_switch_takedown is called from the error path of
dpaa2_switch_probe(), the control interface, enabled by
dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
because dpaa2_switch_takedown does not call
dpaa2_switch_ctrl_if_teardown.

Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
means that a second probe of the driver will happen with the control
interface directly enabled.

This will trigger a second error:

[   93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
[   93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
[   93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13

Which if we investigate the /dev/dpaa2_mc_console log, we find out is
caused by:

[E, ctrl_if_set_pools:2211, DPMNG]  ctrl_if must be disabled

So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
reasonable limits, no reason to change STP state, re-add VLANs etc), and
rename it to something more conventional, like dpaa2_switch_teardown.

Fixes: 613c0a5810 ("staging: dpaa2-switch: enable the control interface")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 10:00:59 -07:00
Ido Schimmel fa05bdb89b Revert "flow_offload: action should not be NULL when it is referenced"
This reverts commit 9ea3e52c5b.

Cited commit added a check to make sure 'action' is not NULL, but
'action' is already dereferenced before the check, when calling
flow_offload_has_one_action().

Therefore, the check does not make any sense and results in a smatch
warning:

include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn:
variable dereferenced before check 'action' (see line 319)

Fix by reverting this commit.

Cc: gushengxian <gushengxian@yulong.com>
Fixes: 9ea3e52c5b ("flow_offload: action should not be NULL when it is referenced")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 10:00:38 -07:00
Jakub Kicinski d584566c4b Merge branch 'intel-wired-lan-driver-updates-2021-08-18'
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-08-18

This series contains updates to i40e and iavf drivers.

Arkadiusz fixes Flow Director not using the correct queue due to calling
the wrong pick Tx function for i40e.

Sylwester resolves traffic loss for iavf when it attempts to change its
MAC address when it does not have permissions to do so.
====================

Link: https://lore.kernel.org/r/20210818174217.4138922-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 09:56:42 -07:00
Sylwester Dziedziuch 8da80c9d50 iavf: Fix ping is lost after untrusted VF had tried to change MAC
Make changes to MAC address dependent on the response of PF.
Disallow changes to HW MAC address and MAC filter from untrusted
VF, thanks to that ping is not lost if VF tries to change MAC.
Add a new field in iavf_mac_filter, to indicate whether there
was response from PF for given filter. Based on this field pass
or discard the filter.
If untrusted VF tried to change it's address, it's not changed.
Still filter was changed, because of that ping couldn't go through.

Fixes: c5c922b3e0 ("iavf: fix MAC address setting for VFs when filter is rejected")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <Gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 09:56:15 -07:00
Arkadiusz Kubalewski a222be597e i40e: Fix ATR queue selection
Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.

If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.

Reproduction steps:
1. Load i40e driver
2. Map each MSI interrupt of i40e port for each CPU
3. Disable ntuple, enable ATR i.e.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run application that is generating traffic and is bound to a
single CPU, i.e.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
Application's traffic should be restricted to the CPU provided in
taskset.

Fixes: 89ec1f0886 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 09:55:23 -07:00
Jakub Kicinski 316749009f Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-08-19

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 3 files changed, 29 insertions(+), 6 deletions(-).

The main changes are:

1) Fix to clear zext_dst for dead instructions which was causing invalid program
   rejections on JITs with bpf_jit_needs_zext such as s390x, from Ilya Leoshkevich.

2) Fix RCU splat in bpf_get_current_{ancestor_,}cgroup_id() helpers when they are
   invoked from sleepable programs, from Yonghong Song.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests, bpf: Test that dead ldx_w insns are accepted
  bpf: Clear zext_dst of dead insns
  bpf: Add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
====================

Link: https://lore.kernel.org/r/20210819144904.20069-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 08:58:17 -07:00
Arnd Bergmann 1e16a40211 Fix for omap gpt12 timer error handling
Two of the recent fixes for ti-sysc driver had bad interaction for a
 function return value that caused one of the fixes to not work so we
 need to change the return value handling. Otherwise early beagleboard
 variants still have a boot issue.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAmEUxlsRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXObbxAA1BVrUF7L+yT7JPY/Jw6hYSHDpLMMclm4
 SuFNhqYQyyazD10ysOtfWr9LICPkSpPb03EayDQKpGMJthde0Jxuw2zphuW1eP6b
 Ntr4hL+gITaqL6bIjqAcUw5CX94xbkIgARtA8NQIgJ/oDIUfWQs1uH2h4SFzyjv8
 QLbobSV89aVRGN0x/aCqv7Ch46yA1YmL774WxNEY4Qdv2A1N4PLMXjXzHbuHvuIe
 Pb7sP4l22gCdQZei9CAVZifPbktg9AcsURnHHm6QsqWGldjiF8/8eS9jQB5eoRAq
 0vFFpNDMbOCFqm1h2jQ5vXty6eu7UmrpKpzz/zIgXhVyroQYU+oCuSeRaaSM5LC4
 gclUwPvOPQGXYB2lhZp1G+qbdrBxlaUcjjTnPlhCtTbaCXNg/uK4aiTuBwRApsCy
 zO7m9stWy6jIhT6hxCk9C5eS0pLTHLGA4UOH5GtCLG2BzZJMA+ybiVHVYMQlELq+
 E8B3YSdSONQJT3hoHEL96c5MoL/WMn/C5IPxRRVSrrR8hWSCOkeSkGPJiAQF/RQb
 G8m/FHH9PCfRpStWuVriaDUzWt+c204IX/dwkqc9zJO/t/cmNz+ZTVg6S3+K1HWZ
 OHu7+vUlAuoTj5bQSeyck9eC8lfDF88PjyWCnjURyDLE/Mg1MrTHGV0J15NPLu1d
 7Tz0yAJL5oo=
 =MPWH
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmEed0cACgkQmmx57+YA
 GNleSw/+N5tis7eo64yEcbMNTRhm/iF9s3IcclWH2lXwWJPgQnhxof38ccgTiHzY
 Th7nA+fx6X3lSyNKC3yaq/R1fBzHw6ZtkZDQFrJX5a4vPWHnJBT4mzBQJkBVnTKU
 7gTvPEoDnvyTyshhuGFOeViaDnowkeNpyn8Q6roXtDmXxJfZ6318jsFMZ+FO+uMS
 Ra2LW0gals9dvyVvwhcGysOALTVC7iiU/WhUPDAxPcDlIPSh6qfH8hFIfI4l9lA3
 HJKgZXHnM5E8kTC2qVp54s8E5Rv2IQgxpaKSs92uh3PWtTBio4kqIZOeiIUqkqbv
 WW4Pca277L9JEiPxw9jZSLbTSXxzvIEkn0kYZ5V3pvaY+wZIGSQy4G0HUdGPCMgp
 KIcK8dEce8nT9jj4MkjnCKvCrsdydZ4d0WidcqhmpUPHGlyTyC/HGn1oBagY05rO
 Kih1L/TAem4Xz9WBrz3+gR5ct5JyFE7KIGuOj0kgXGrKid7+VSORUK4x7uAJWWYO
 G0UVCvquFU1Hd8dBYEuD3lDNaxSvOKilk6TZm6mU7Gi9eAJo0qqZFNPJvOy9BG91
 bIC9eeXoqNHSN4VRIUTArqCYaZZ97wm/0RKdqpTB5e0NCsOraNmkSTAwVtHijow6
 5gd6QVL53mJgycWwYoBMOJ9BD6Q3zKQCxFyIxpzbDg6OGxnlYbg=
 =NOnI
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v5.14/gpt12-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Fix for omap gpt12 timer error handling

Two of the recent fixes for ti-sysc driver had bad interaction for a
function return value that caused one of the fixes to not work so we
need to change the return value handling. Otherwise early beagleboard
variants still have a boot issue.

* tag 'omap-for-v5.14/gpt12-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  bus: ti-sysc: Fix error handling for sysc_check_active_timer()

Link: https://lore.kernel.org/r/pull-1629354796-830948@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-19 17:22:47 +02:00
David S. Miller c15128c97b Merge branch 'r8152-bp-settings'
Hayes Wang says:

====================
r8152: fix bp settings

Fix the wrong bp settings of the firmware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:19:30 +01:00
Hayes Wang 6633fb83f1 r8152: fix the maximum number of PLA bp for RTL8153C
The maximum PLA bp number of RTL8153C is 16, not 8. That is, the
bp 0 ~ 15 are at 0xfc28 ~ 0xfc46, and the bp_en is at 0xfc48.

Fixes: 195aae321c ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:19:30 +01:00
Hayes Wang a876a33d2a r8152: fix writing USB_BP2_EN
The register of USB_BP2_EN is 16 bits, so we should use
ocp_write_word(), not ocp_write_byte().

Fixes: 9370f2d05a ("support request_firmware for RTL8153")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:19:30 +01:00
David S. Miller d98c821067 Merge branch 'mptcp-fixes'
Mat Martineau says:

====================
mptcp: Bug fixes

Here are two bug fixes for the net tree:

Patch 1 fixes a memory leak that could be encountered when clearing the
list of advertised MPTCP addresses.

Patch 2 fixes a protocol issue early in an MPTCP connection, to ensure
both peers correctly understand that the full MPTCP connection handshake
has completed even when the server side quickly sends an ADD_ADDR
option.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:17:05 +01:00
Matthieu Baerts 67b12f792d mptcp: full fully established support after ADD_ADDR
If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR
with HMAC from the server, it is enough to switch to a "fully
established" mode because it has received more MPTCP options.

It was then OK to enable the "fully_established" flag on the MPTCP
socket. Still, best to check if the ADD_ADDR looks valid by looking if
it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received
while we are not in "fully established" mode, it is strange and then
we should not switch to this mode now.

But that is not enough. On one hand, the path-manager has be notified
the state has changed. On the other hand, the "fully_established" flag
on the subflow socket should be turned on as well not to re-send the
MP_CAPABLE 3rd ACK content with the next ACK.

Fixes: 84dfe3677a ("mptcp: send out dedicated ADD_ADDR packet")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:16:54 +01:00
Paolo Abeni a0eea5f10e mptcp: fix memory leak on address flush
The endpoint cleanup path is prone to a memory leak, as reported
by syzkaller:

 BUG: memory leak
 unreferenced object 0xffff88810680ea00 (size 64):
   comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s)
   hex dump (first 32 bytes):
     58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de  Xu}<....".......
     01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00  ................
   backtrace:
     [<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline]
     [<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170
     [<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
     [<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
     [<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
     [<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
     [<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
     [<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
     [<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
     [<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
     [<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline]
     [<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724
     [<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403
     [<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457
     [<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
     [<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     [<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
     [<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We should not require an allocation to cleanup stuff.

Rework the code a bit so that the additional RCU work is no more needed.

Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:16:54 +01:00
Gerd Rausch fb4b1373dc net/rds: dma_map_sg is entitled to merge entries
Function "dma_map_sg" is entitled to merge adjacent entries
and return a value smaller than what was passed as "nents".

Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len")
rather than the original "nents" parameter ("sg_len").

This old RDS bug was exposed and reliably causes kernel panics
(using RDMA operations "rds-stress -D") on x86_64 starting with:
commit c588072bba ("iommu/vt-d: Convert intel iommu driver to the iommu ops")

Simply put: Linux 5.11 and later.

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:35:50 -07:00
Vladimir Oltean c1930148a3 net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
Currently we are unable to ping a bridge on top of a felix switch which
uses the ocelot-8021q tagger. The packets are dropped on the ingress of
the user port and the 'drop_local' counter increments (the counter which
denotes drops due to no valid destinations).

Dumping the PGID tables, it becomes clear that the PGID_SRC of the user
port is zero, so it has no valid destinations.

But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q
ports) is clearly missing from the forwarding mask of ports that are
under a bridge. So this has always been broken.

Looking at the version history of the patch, in v7
https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/
the code looked like this:

	/* Standalone ports forward only to DSA tag_8021q CPU ports */
	unsigned long mask = cpu_fwd_mask;

(...)
	} else if (ocelot->bridge_fwd_mask & BIT(port)) {
		mask |= ocelot->bridge_fwd_mask & ~BIT(port);

while in v8 (the merged version)
https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/
it looked like this:

	unsigned long mask;

(...)
	} else if (ocelot->bridge_fwd_mask & BIT(port)) {
		mask = ocelot->bridge_fwd_mask & ~BIT(port);

So the breakage was introduced between v7 and v8 of the patch.

Fixes: e21268efbe ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:34:52 -07:00
Linus Torvalds d6d09a6942 for-5.14-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmEdNDEACgkQxWXV+ddt
 WDvfDw//cDnR8HZEtrHwHX9qHitcYs6pubdwwGAsFlSZ/wh0iX05TxUjho4gGYMZ
 Kp9PXipMOEdxNLJ8oaPkI+i8vIXxTWWqAm5ZePkV0cjg+vTgqqKf9NLcMtS34kP4
 /GeQJgul9oreTMbXCx219J0B6lKpl6Iv0sCaSFyN09GIPNI8F6nyDbJTA3JKTRWb
 ElP8mGvdUFFcOKsG6Wh6BU/WVU/My7d+HumApsRXB2lDwmMambAkX0iGpRElGrbD
 ub+5ya0WeO8DB6KsVa4W8cMO5sWV9L9FcXMtGlwLbIkOxFdHvP7CT1pvH3TZe9Wy
 mr8oAL01IktuNjZgQ5sUn+yISf+LuHnWjhpu+QBRuylZiwfpMwSPb0geLcrXcYGj
 i8ERlmJvwbm6dAQlQDbA3yZKH6+FzePyTR99std2LK9JtbqBaFeSS6WM05SpRUDJ
 FNHCLOzsswzBUE54nkqsb+A8tBXpcxnvQkrU+nJeDNUYM9w6S5mbCGeZjxe/n+ov
 TGprz1ar2Ppm9YMH0zj6wOM690nJZYNrAvtUmeCl1xlLERYIoV2jXS1SzkaMdcQu
 u3UVVsOCghPN5krEac4jgiGBdVHvjVnJb/qGBNTpj3aDX29PADHUJr7TmzGJBa8F
 ePWqDngDYi+cVrm9JMls9UJhaCVmhzLAXtN5X3+fKfe5bNQE4gU=
 =aR16
 -----END PGP SIGNATURE-----

Merge tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix for cross-rename, adding a missing check for directory
  and subvolume, this could lead to a crash"

* tag 'for-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: prevent rename2 from exchanging a subvol with a directory from different parents
2021-08-18 12:06:42 -07:00
Linus Torvalds 01f15f3773 sound fixes for 5.14-rc7
Hopefully the last PR for 5.14: here includes only a few regression
 fixes and trivial device quirks.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmEc96sOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8U+xAAhOr64zl3oUAOMIU9znZeF9sRlEDqZC6UJdnX
 H6+a8nZnaW6HhvakYvvwoHIa8hwTRzhpG3PF7p1+He1AGMaOIpAAOZlUTkJ1Ja9x
 TeXnsfEVhWKnIIBtMSkp7hWtmTsaH033AvJfstKNg6lw+ott7uYmeHS+IhOY/1W9
 ASduIOmWe8I7ZbhaLLnZL47CS8o5V1BjDqpig2/hrH0gcVPRda10n9b0VTHUJWrE
 RaFtMzqLuYzxi7TJzGqSJP0XjcHvGalvDSlJ11ny5s/vQJEqjg+8RmPJuTwKcp1j
 gTkG4OLY/kSFXihVGLfSfzAVn4OPK55npaex0uiynGwzYGvAoT65FFAMXalLSKjw
 r37GsUt8kzw9OrMl0LGJC7fYP8aOYCE+KGuSeQpcR3zg+16rG2H5T2wM9rqWOFnS
 N3DllB2GVNtW1T2jFgEyN388F/zPGG3J/Ep1/RU+Nwydzk1UHvePV4aCRc3eaXTF
 y6ewrGOLIl+LQIN8EiurWDDQPFQuknkQ+77dQqn28aJzqdD5CbpPJgJ5DmkdbviU
 dOMzN36DdHBr88EL+kFSu+LR9B0b6ODVadKm08tW44OLrKC0CvBbvRg6DBsNUIal
 E3ZgL9ePzvV4HGA6P8riyfwTMHmVpRt6NGMeP28nw0Q4fimrgdJgrA4367wGSRfb
 EL640Ww=
 =ZKV7
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Only a few regression fixes and trivial device quirks"

* tag 'sound-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
  ALSA: hda: Fix hang during shutdown due to link reset
  ALSA: hda/realtek: Enable 4-speaker output for Dell XPS 15 9510 laptop
  ALSA: oxfw: fix functioal regression for silence in Apogee Duet FireWire
  ALSA: hda - fix the 'Capture Switch' value change notifications
2021-08-18 12:00:27 -07:00
Linus Torvalds a83955bdad cfi fix for v5.14-rc7
- Use rcu_read_{un}lock_sched_notrace to avoid recursion (Elliot Berman)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmEcxNAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJk8iD/9AS/wTtgllEeGMuogRUBSZXcxW
 AlFW2nBFhMGwj21EMYhdN3Oor+y87IQZ1hJuU4mNYS6SWTi1Z2ptjmJwwX5qX0wa
 s2q/I8cfz2LcQMkArsAVKxMTY6GlUmQO02Wrk5HRX83qaIkeCR8fbgOvqILEHwjf
 D3b+G/k5yRi3gqT9Dj0zYb5d5fAdUmlGilNBZWG2gBmvtVGtH6rKFy7Dxr/ytTOZ
 lhoYw3ApGefcWYFuYaWlXMO6cI1DjVF9sViKOJwNd5YWziT13ty2eLFyOEquN7UY
 ewaBPMD821BXvF1cKofO/fWP147b2RzKW5lzwmVlpWvdhSHQcAdIzTmnEuj6s9dl
 GPKJSsAdNTERrHaXqZ/Svq6eZsoJhGhUFG5JUfVo/7kBNbI28QCQu9XBP0sBs6o+
 P4Az/QgLGR8okK5NztzQMTe+YEeyzeJ1/W09Gzk040JUxwu84T2Bir+v089t3jHe
 qMyU8lpOa7OV8WcqKeLV9UakVq/NA6iWNRW8MF/jhr/ZmoSPJN2NJ3ixw19qtxPb
 EmDd5+O6z1zWQKkz5a+Vl7Bh3ZuBSuOwirlQCS9zoJoBgKjHZRDHxJAOMkhO7DfM
 409B72NJ/wKJN+4JOasus6x+6wxFNmO4u7/+xeCH2kQiamPiOyoeRGHyLdPDqziQ
 /4CjeToRidKX0HpvoQ==
 =Dtt5
 -----END PGP SIGNATURE-----

Merge tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull clang cfi fix from Kees Cook:

 - Use rcu_read_{un}lock_sched_notrace to avoid recursion (Elliot Berman)

* tag 'cfi-v5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  cfi: Use rcu_read_{un}lock_sched_notrace
2021-08-18 11:55:50 -07:00
Linus Torvalds 3b844826b6 pipe: avoid unnecessary EPOLLET wakeups under normal loads
I had forgotten just how sensitive hackbench is to extra pipe wakeups,
and commit 3a34b13a88 ("pipe: make pipe writes always wake up
readers") ended up causing a quite noticeable regression on larger
machines.

Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
not clear that this matters in real life all that much, but as Mel
points out, it's used often enough when comparing kernels and so the
performance regression shows up like a sore thumb.

It's easy enough to fix at least for the common cases where pipes are
used purely for data transfer, and you never have any exciting poll
usage at all.  So set a special 'poll_usage' flag when there is polling
activity, and make the ugly "EPOLLET has crazy legacy expectations"
semantics explicit to only that case.

I would love to limit it to just the broken EPOLLET case, but the pipe
code can't see the difference between epoll and regular select/poll, so
any non-read/write waiting will trigger the extra wakeup behavior.  That
is sufficient for at least the hackbench case.

Apart from making the odd extra wakeup cases more explicitly about
EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
write, not at the first write chunk.  That is actually much saner
semantics (as much as you can call any of the legacy edge-triggered
expectations for EPOLLET "sane") since it means that you know the wakeup
will happen once the write is done, rather than possibly in the middle
of one.

[ For stable people: I'm putting a "Fixes" tag on this, but I leave it
  up to you to decide whether you actually want to backport it or not.
  It likely has no impact outside of synthetic benchmarks  - Linus ]

Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
Fixes: 3a34b13a88 ("pipe: make pipe writes always wake up readers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Sandeep Patil <sspatil@android.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-18 11:39:46 -07:00
Thomas Weißschuh 1e35b8a778 platform/x86: gigabyte-wmi: add support for B450M S2H V2
Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-901207693

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20210818164435.99821-1-linux@weissschuh.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-18 19:39:31 +02:00
Pavel Skripkin a786e3195d net: asix: fix uninit value bugs
Syzbot reported uninit-value in asix_mdio_read(). The problem was in
missing error handling. asix_read_cmd() should initialize passed stack
variable smsr, but it can fail in some cases. Then while condidition
checks possibly uninit smsr variable.

Since smsr is uninitialized stack variable, driver can misbehave,
because smsr will be random in case of asix_read_cmd() failure.
Fix it by adding error handling and just continue the loop instead of
checking uninit value.

Added helper function for checking Host_En bit, since wrong loop was used
in 4 functions and there is no need in copy-pasting code parts.

Cc: Robert Foss <robert.foss@collabora.com>
Fixes: d9fe64e511 ("net: asix: Add in_pm parameter")
Reported-by: syzbot+a631ec9e717fb0423053@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 11:46:52 +01:00
kaixi.fan 01634047bf ovs: clear skb->tstamp in forwarding path
fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
doesn't clear skb->tstamp. We encountered a problem with linux
version 5.4.56 and ovs version 2.14.1, and packets failed to
dequeue from qdisc when fq qdisc was attached to ovs port.

Fixes: fb420d5d91 ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com>
Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 11:31:13 +01:00
David S. Miller 97712f8f91 Merge branch 'mdio-fixes'
Saravana Kannan says:

====================
Clean up and fix error handling in mdio_mux_init()

This patch series was started due to -EPROBE_DEFER not being handled
correctly in mdio_mux_init() and causing issues [1]. While at it, I also
did some more error handling fixes and clean ups. The -EPROBE_DEFER fix is
the last patch.

Ideally, in the last patch we'd treat any error similar to -EPROBE_DEFER
but I'm not sure if it'll break any board/platforms where some child
mdiobus never successfully registers. If we treated all errors similar to
-EPROBE_DEFER, then none of the child mdiobus will work and that might be a
regression. If people are sure this is not a real case, then I can fix up
the last patch to always fail the entire mdio-mux init if any of the child
mdiobus registration fails.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Saravana Kannan 7bd0cef5da net: mdio-mux: Handle -EPROBE_DEFER correctly
When registering mdiobus children, if we get an -EPROBE_DEFER, we shouldn't
ignore it and continue registering the rest of the mdiobus children. This
would permanently prevent the deferring child mdiobus from working instead
of reattempting it in the future. So, if a child mdiobus needs to be
reattempted in the future, defer the entire mdio-mux initialization.

This fixes the issue where PHYs sitting under the mdio-mux aren't
initialized correctly if the PHY's interrupt controller is not yet ready
when the mdio-mux is being probed. Additional context in the link below.

Fixes: 0ca2997d14 ("netdev/of/phy: Add MDIO bus multiplexer support.")
Link: https://lore.kernel.org/lkml/CAGETcx95kHrv8wA-O+-JtfH7H9biJEGJtijuPVN0V5dUKUAB3A@mail.gmail.com/#t
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Saravana Kannan 99d81e9424 net: mdio-mux: Don't ignore memory allocation errors
If we are seeing memory allocation errors, don't try to continue
registering child mdiobus devices. It's unlikely they'll succeed.

Fixes: 342fa19644 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Saravana Kannan 663d946af5 net: mdio-mux: Delete unnecessary devm_kfree
The whole point of devm_* APIs is that you don't have to undo them if you
are returning an error that's going to get propagated out of a probe()
function. So delete unnecessary devm_kfree() call in the error return path.

Fixes: b601616681 ("mdio: mux: Correct mdio_mux_init error path issues")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Vladimir Oltean ed5d2937a6 net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
It seems that of_find_compatible_node has a weird calling convention in
which it calls of_node_put() on the "from" node argument, instead of
leaving that up to the caller. This comes from the fact that
of_find_compatible_node with a non-NULL "from" argument it only supposed
to be used as the iterator function of for_each_compatible_node(). OF
iterator functions call of_node_get on the next OF node and of_node_put()
on the previous one.

When of_find_compatible_node calls of_node_put, it actually never
expects the refcount to drop to zero, because the call is done under the
atomic devtree_lock context, and when the refcount drops to zero it
triggers a kobject and a sysfs file deletion, which assume blocking
context.

So any driver call to of_find_compatible_node is probably buggy because
an unexpected of_node_put() takes place.

What should be done is to use the of_get_compatible_child() function.

Fixes: 5a8f09748e ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/
Suggested-by: Frank Rowand <frowand.list@gmail.com>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:21:01 +01:00
Toke Høiland-Jørgensen 86b9bbd332 sch_cake: fix srchost/dsthost hashing mode
When adding support for using the skb->hash value as the flow hash in CAKE,
I accidentally introduced a logic error that broke the host-only isolation
modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash
variable should stay initialised to 0 in cake_hash() in pure host-based
hashing mode. Add a check for this before using the skb->hash value as
flow_hash.

Fixes: b0c19ed608 ("sch_cake: Take advantage of skb->hash where appropriate")
Reported-by: Pete Heist <pete@heistp.net>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:14:00 +01:00
Thomas Weißschuh b9570f5c92 platform/x86: gigabyte-wmi: add support for X570 GAMING X
Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-900263115

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20210817154628.84992-1-linux@weissschuh.net
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-18 09:43:51 +02:00
Wang Hai 1b80fec7b0 ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
In ixgbe_xsk_pool_enable(), if ixgbe_xsk_wakeup() fails,
We should restore the previous state and clean up the
resources. Add the missing clear af_xdp_zc_qps and unmap dma
to fix this bug.

Fixes: d49e286d35 ("ixgbe: add tracking of AF_XDP zero-copy state for each queue pair")
Fixes: 4a9b32f30f ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210817203736.3529939-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 17:47:52 -07:00
Jakub Kicinski e5e487a2ec wireless-drivers fixes for v5.14
First set of fixes for v5.14 and nothing major this time. New devices
 for iwlwifi and one fix for a compiler warning.
 
 iwlwifi
 
 * support for new devices
 
 mt76
 
 * fix compiler warning about MT_CIPHER_NONE
 -----BEGIN PGP SIGNATURE-----
 
 iQFJBAABCgAzFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmEb7SAVHGt2YWxvQGNv
 ZGVhdXJvcmEub3JnAAoJEG4XJFUm622bDaEH/1TH5IlBXq4dNPGNwgZ0L4sXFlVk
 U7mSbuv2BdH+5edozGwdGl4d6o/vqzcoYSF/dh6d4rpX//zacSsB4LpACW1knlh5
 aJxjP5PdxLty/90JXtpBxL79WfQ9kVQ72ldKAg1Gk9XFk1UOqXSaOaLccNBtFk78
 n97hNwEeKKX1bw//fNLgyxUAlMoIVCaNjtcY9xJpoC5xLHQxM7ixhxqZF7XSeujQ
 z63CRUnT/7gFr4DbOLsZSZVYhCX9v+rz4imIsNbly3e6vLH9Mp2pkyHFfaHKFk6X
 tV/Kkd1Bq6OjQAGSq7mbddi7XqXSd3/1rUZChUy0ZSiyKlly52iqxWhWBBI=
 =YqJK
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-2021-08-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.14

First set of fixes for v5.14 and nothing major this time. New devices
for iwlwifi and one fix for a compiler warning.

iwlwifi
 * support for new devices

mt76
 * fix compiler warning about MT_CIPHER_NONE

* tag 'wireless-drivers-2021-08-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
  mt76: fix enum type mismatch
  iwlwifi: add new so-jf devices
  iwlwifi: add new SoF with JF devices
  iwlwifi: pnvm: accept multiple HW-type TLVs
====================

Link: https://lore.kernel.org/r/20210817171027.EC1E6C43460@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 15:08:14 -07:00
Linus Torvalds 614cb2751d tracing: Limit the shooting in the foot of tp_printk
The "tp_printk" option redirects the trace event output to printk at boot
 up. This is useful when a machine crashes before boot where the trace events
 can not be retrieved by the in kernel ring buffer. But it can be "dangerous"
 because trace events can be located in high frequency locations such as
 interrupts and the scheduler, where a printk can slow it down that it live
 locks the machine (because by the time the printk finishes, the next event
 is triggered). Thus tp_printk must be used with care.
 
 It was discovered that the filter logic to trace events does not apply to
 the tp_printk events. This can cause a surprise and live lock when the user
 expects it to be filtered to limit the amount of events printed to the
 console when in fact it still prints everything.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYRwL+RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkRHAP9gvTYH1es9l4V5SNFEQ7+GEwknsaq7
 B5q4znVKQKgajQD/cd5Cm/alTIbxXdrQ9nxJ7lfffrvk46iqAb9PRX9vhAQ=
 =8QxT
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Limit the shooting in the foot of tp_printk

  The "tp_printk" option redirects the trace event output to printk at
  boot up. This is useful when a machine crashes before boot where the
  trace events can not be retrieved by the in kernel ring buffer. But it
  can be "dangerous" because trace events can be located in high
  frequency locations such as interrupts and the scheduler, where a
  printk can slow it down that it live locks the machine (because by the
  time the printk finishes, the next event is triggered). Thus tp_printk
  must be used with care.

  It was discovered that the filter logic to trace events does not apply
  to the tp_printk events. This can cause a surprise and live lock when
  the user expects it to be filtered to limit the amount of events
  printed to the console when in fact it still prints everything"

* tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Apply trace filters on all output channels
2021-08-17 09:47:18 -10:00
Dinghao Liu 0a298d1338 net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
qlcnic_83xx_unlock_flash() is called on all paths after we call
qlcnic_83xx_lock_flash(), except for one error path on failure
of QLCRD32(), which may cause a deadlock. This bug is suggested
by a static analysis tool, please advise.

Fixes: 81d0aeb0a4 ("qlcnic: flash template based firmware reset recovery")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 08:27:31 -07:00
Johannes Berg 276e189f8e mac80211: fix locking in ieee80211_restart_work()
Ilan's change to move locking around accidentally lost the
wiphy_lock() during some porting, add it back.

Fixes: 45daaa1318 ("mac80211: Properly WARN on HW scan before restart")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20210817121210.47bdb177064f.Ib1ef79440cd27f318c028ddfc0c642406917f512@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 06:51:43 -07:00
Jason Wang dbcf24d153 virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
Commit a02e8964ea ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.

This leads to several issues:

- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc slowing
  down the traffic.

Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964ea
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964ea ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Ivan <ivan@prestigetransportation.com>
Tested-by: Ivan <ivan@prestigetransportation.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:45:09 +01:00
Takashi Iwai 4bf61ad5f0 ALSA: hda/via: Apply runtime PM workaround for ASUS B23E
ASUS B23E requires the same workaround like other machines with
VT1802, otherwise it looses the codec power on a few nodes and the
sound kept silence.

Fixes: a0645daf16 ("ALSA: HDA: Early Forbid of runtime PM")
Link: https://lore.kernel.org/r/ac2232f142efcd67fe6ac38897f704f7176bd200.camel@gmail.com
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210817052432.14751-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-17 08:02:44 +02:00
Imre Deak 0165c4e19f ALSA: hda: Fix hang during shutdown due to link reset
During system shutdown codecs may be still active, and resetting the
controller->codec HW link in this state - based on the bug reporter's
tests - leads to the shutdown sequence to get stuck. This happens at
least on the reporter's KBL system with an ALC662 codec.

For now fix the issue by skipping the link reset step.

Fixes: 472e18f63c ("ALSA: hda: Release controller display power during shutdown/reboot")
References: https://bugzilla.kernel.org/show_bug.cgi?id=214045
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3618#note_1024665
Reported-and-tested-by: youling257@gmail.com
Cc: youling257@gmail.com
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20210816174259.2759103-1-imre.deak@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-17 07:14:30 +02:00
Linus Torvalds 794c7931a2 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This contains a fix for a potential boot failure due to a missing
  Kconfig dependency for people upgrading with the DRBG enabled"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: drbg - select SHA512
2021-08-16 15:42:09 -10:00
Lahav Schlesinger 09e856d54b vrf: Reset skb conntrack connection on VRF rcv
To fix the "reverse-NAT" for replies.

When a packet is sent over a VRF, the POST_ROUTING hooks are called
twice: Once from the VRF interface, and once from the "actual"
interface the packet will be sent from:
1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
     This causes the POST_ROUTING hooks to run.
2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.

Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
second vrf_l3_rcv() calls them again.

As an example, consider the following SNAT rule:
> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1

In this case sending over a VRF will create 2 conntrack entries.
The first is from the VRF interface, which performs the IP SNAT.
The second will run the SNAT, but since the "expected reply" will remain
the same, conntrack randomizes the source port of the packet:
e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
rules are:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
SNAT-ed from 10000 --> 61033.

But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
conntrack entry is matched:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

And a "port 61033 unreachable" ICMP packet is sent back.

The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
the skb already has a conntrack flow attached to it, which means
nf_conntrack_in() will not resolve the flow again.

This means only the dest port is "reverse-NATed" (61033 -> 10000) but
the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
not received.
This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.

The fix is then to reset the flow when skb is received on a VRF, to let
conntrack resolve the flow again (which now will hit the earlier flow).

To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
  running 'bash ./repro'):
  $ cat run_in_A1.py
  import logging
  logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
  from scapy.all import *
  import argparse

  def get_packet_to_send(udp_dst_port, msg_name):
      return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
          IP(src='3.3.3.3', dst='2.2.2.2')/ \
          UDP(sport=53, dport=udp_dst_port)/ \
          Raw(f'{msg_name}\x0012345678901234567890')

  parser = argparse.ArgumentParser()
  parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
                      help="From run_in_A3.py")
  parser.add_argument('-socket_port', dest="socket_port", type=str,
                      required=True, help="From run_in_A3.py")
  parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
                      help="From script")

  args, _ = parser.parse_known_args()
  iface_mac = args.iface_mac
  socket_port = int(args.socket_port)
  v1_mac = args.v1_mac

  print(f'Source port before NAT: {socket_port}')

  while True:
      pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
      if 0 == len(pkts):
          print('Something failed, rerun the script :(', flush=True)
          break
      pkt = pkts[0]
      if not pkt.haslayer('UDP'):
          continue

      pkt_sport = pkt.getlayer('UDP').sport
      print(f'Source port after NAT: {pkt_sport}', flush=True)

      pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
      sendp(pkt_to_send, '_v0', verbose=False) # Will not be received

      pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
      sendp(pkt_to_send, '_v0', verbose=False)
      break

  $ cat run_in_A2.py
  import socket
  import netifaces

  print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
        flush=True)
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
               str('vrf_1' + '\0').encode('utf-8'))
  s.connect(('3.3.3.3', 53))
  print(f'{s. getsockname()[1]}', flush=True)
  s.settimeout(5)

  while True:
      try:
          # Periodically send in order to keep the conntrack entry alive.
          s.send(b'a'*40)
          resp = s.recvfrom(1024)
          msg_name = resp[0].decode('utf-8').split('\0')[0]
          print(f"Got {msg_name}", flush=True)
      except Exception as e:
          pass

  $ cat repro.sh
  ip netns del A1 2> /dev/null
  ip netns del A2 2> /dev/null
  ip netns add A1
  ip netns add A2

  ip -n A1 link add _v0 type veth peer name _v1 netns A2
  ip -n A1 link set _v0 up

  ip -n A2 link add e00000 type bond
  ip -n A2 link add lo0 type dummy
  ip -n A2 link add vrf_1 type vrf table 10001
  ip -n A2 link set vrf_1 up
  ip -n A2 link set e00000 master vrf_1

  ip -n A2 addr add 1.1.1.1/24 dev e00000
  ip -n A2 link set e00000 up
  ip -n A2 link set _v1 master e00000
  ip -n A2 link set _v1 up
  ip -n A2 link set lo0 up
  ip -n A2 addr add 2.2.2.2/32 dev lo0

  ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
  ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001

  ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
	SNAT --to-source 2.2.2.2 -o vrf_1

  sleep 5
  ip netns exec A2 python3 run_in_A2.py > x &
  XPID=$!
  sleep 5

  IFACE_MAC=`sed -n 1p x`
  SOCKET_PORT=`sed -n 2p x`
  V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
  ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
          ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
  sleep 5

  kill -9 $XPID
  wait $XPID 2> /dev/null
  ip netns del A1
  ip netns del A2
  tail x -n 2
  rm x
  set +x

Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16 16:37:01 -07:00