Both multipath and bonding events are changing the HW LAG state
independently.
Handling one of the features events while the other is already
enabled can cause unwanted behavior, for example handling
bonding event while multipath enabled will disable the lag and
cause multipath to stop working.
Fix it by ignoring bonding event while in multipath and ignoring FIB
events while in bonding mode.
Fixes: 544fe7c2e6 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Again it became bigger than wished, unfortunately, as this contains
quite a few ASoC fixes that came up a bit late. It also includes
yet more HD- and USB-audio quirks: I decided to merge them now, as
those are for stable, and we'll need them sooner or later.
Although the volumes are a bit high, all changes are device-specific
(and reasonably small) fixes, so it should be safe for the late rc.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmFu1wAOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE+dtA//RhmR89BafZJrkDVuXbJK5jeAwOpT/C5GzPER
o1sYfk3WYK/1fhQRm4welmsvkNEnjo+1Kkaud9hhDFD/2HEGupvn2E6uTnbonETf
OT5H7/PgohE5NP92j4z4HowxnBnshSYHjCK6+9SFS+pq881aZItcvJttaEzZf7AW
RWOjZHS/6OvRWubNv0MBPCKkkviAV7WWDuB7tOTHhZxls9Ab8LomUPTNeYwMBCCe
/02SDrpmCALD65TZJM17tWvcL6hVPIFyZunlwBZOQ6JnH5pPeiBQ4Z2+PCv+zcyB
MbaMvY6Bcpda2mtOUDr6sljsGWRcjQuvNwcPIy20qumwyzzS+tg7dMwx70wvXwBg
5hYDqGaRNbx8WtvBbwJkSJ3jUfih8R0zgoOFo2Dx3NeE2tZjuXwfxbp7AxO9A68D
YZZqYNPqX7faa9c+FCtSVpWjUybofTjzglx+HqbZgshMKr8237ARzjBXTDA/jxzi
xX8D8Izw+uGlFs0EAuQ82lmOm32xEq0XWDHt6QPRk3yP5+NFVzjseYquGIGpv2r3
l4p2M/jN2v4Ls2YcuoJOj+CmLR1sepnqd7TsHsLAL9m+eoSA3CnZ/Vp2ltMnxd/Q
0AJV63EwDwEOpVryUdsU4xhSjDMueDlXrkhiTWWFN03QUt+PlPVPEVh4rXJRsV8c
eEGc534=
=zrEF
-----END PGP SIGNATURE-----
Merge tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again it became bigger than wished, unfortunately, as this contains
quite a few ASoC fixes that came up a bit late. It also includes yet
more HD- and USB-audio quirks: I decided to merge them now, as those
are for stable, and we'll need them sooner or later.
Although the volumes are a bit high, all changes are device-specific
(and reasonably small) fixes, so it should be safe for the late rc"
* tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix microphone sound on Jieli webcam.
ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers
ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
ALSA: hda/realtek: Add quirk for Clevo PC50HS
ALSA: usb-audio: add Schiit Hel device to quirk table
ASoC: wm8960: Fix clock configuration on slave mode
ASoC: cs42l42: Ensure 0dB full scale volume is used for headsets
ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked()
ASoC: codec: wcd938x: Add irq config support
ASoC: DAPM: Fix missing kctl change notifications
ASoC: Intel: bytcht_es8316: Utilize dev_err_probe() to avoid log saturation
ASoC: Intel: bytcht_es8316: Switch to use gpiod_get_optional()
ASoC: Intel: bytcht_es8316: Use temporary variable for struct device
ASoC: Intel: bytcht_es8316: Get platform data via dev_get_platdata()
ASoC: wcd938x: Fix jack detection issue
ASoC: nau8824: Fix headphone vs headset, button-press detection no longer working
ASoC: cs4341: Add SPI device ID table
ASoC: pcm179x: Add missing entries SPI to device ID table
ASoC: fsl_xcvr: Fix channel swap issue with ARC
ASoC: pcm512x: Mend accesses to the I2S_1 and I2S_2 registers
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmFvbsAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOMsBAAkf8ZrL5UHq+0F60g+dEJHq7kwet7
QKWjDS1UOQK9Cvtt7T6Ggwu+kUMYG/HNlWMBkiv8+6SSy9o4KaftjpEoZDkZIO0F
lujGhgPsZdfadRUZgvhC0NrEmXwYQxdGqxWiE00tRYRBMviP2vb2Bf1Z/iXQZGeI
JZOENUCT6fridU9gNkUKP+qcV7/eZLaOTUItPyd8spYGtl4k85kKsHDHlF1C4MHM
ByRVAjuTbubGU8m5RF4tjju+f7CWBiiAXQFev/qouBzRXp+bk//WCxmc8e414Uwh
/QScy4wbRplIpq+iWTIcii8jwo0uJke7rPMetDik3VtqLCBu9hhh4Np0umfwjnOt
Fwis0H/2VoikJE8G7lC/0qd2ya3DgGtBbr+QMePh3QK8iUTkTlDTKiAf4b8JYm3x
MNXV/XSYIdlSoYUsSZXx9IciSCKnEa5TayY/N60CLFeyKOgmyxdtRA/Mql30bLc5
a141pVF+hNnovdpgcoIfCvA/oXhxsPYAL/Rh1OLPhwhTG+fKPrJfM8qsIZsvNUAV
Kg0UJRWxr5mkmYFv7vlPJSK+ZrJ/LlbNGskr2RuAPQ9QOsAQjf/3z/WJe2nfOQgH
oMLij2M0/sq1YuEP1yfQP4k2Du/Vqy4z+Ls1kKovlxldZEwk3TAEKND5voPUXVVv
h6rtxUWwGl+j0m4=
=tBGu
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"One small audit patch to add a pointer NULL check"
* tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix possible null-pointer dereference in audit_filter_rules
* various bugfixes
* endian fixes
* mt7921 aspm support
* cleanup
* mt7921 testmode support
* rate handling fixes
* tx status fixes/improvements
* mt7921 power management improvements
* mt7915 LED support
* DBDC fixes
* mt7921 6GHz support
* support for eeprom data in DT
* mt7915 TWT support
* mt7915 txbf + MU-MIMO improvements
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
iEYEABECAAYFAmFv4F4ACgkQ130UHQKnbvWiCQCbBaVvAjHfJdW8eJrqk7GCh64v
qo8An3OrUINp5qzewe+F/vlLoJGNfCM0
=ZkbS
-----END PGP SIGNATURE-----
Merge tag 'mt76-for-kvalo-2021-10-20' of https://github.com/nbd168/wireless
mt76 patches for 5.16
* various bugfixes
* endian fixes
* mt7921 aspm support
* cleanup
* mt7921 testmode support
* rate handling fixes
* tx status fixes/improvements
* mt7921 power management improvements
* mt7915 LED support
* DBDC fixes
* mt7921 6GHz support
* support for eeprom data in DT
* mt7915 TWT support
* mt7915 txbf + MU-MIMO improvements
# gpg: Signature made Wed 20 Oct 2021 12:24:46 PM EEST
# gpg: using DSA key D77D141D02A76EF5
# gpg: Good signature from "Felix Fietkau <nbd@nbd.name>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 75D1 1A7D 91A7 710F 4900 42EF D77D 141D 02A7 6EF5
As part of support for E810 XXV devices, some device ids were
inadvertently left out. Add those missing ids.
Fixes: 195fb97766 ("ice: add additional E810 device id")
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
The device ID for I226_K was incorrectly assigned, update the device
ID to the correct one.
Fixes: bfa5e98c9d ("igc: Add new device ID")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Update the HW MAC initialization flow. Do not gate DMA clock from
the modPHY block. Keeping this clock will prevent dropped packets
sent in burst mode on the Kumeran interface.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377
Fixes: fb776f5d57 ("e1000e: Add support for Tiger Lake")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
- While cleaning up some of the tracing recursion protection logic,
I discovered a scenario that the current design would miss, and
would allow an infinite recursion. Removing an optimization trick
that opened the hole, fixes the issue and cleans up the code as well.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYW7CqRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qmVWAQDnboKAUVmB/3D/L1T9XdWEq4AzCS6W
51QpzWff0pBVkwEAuc1af2gqDZ6/N9sQjN9kGxikY6luVs3CSQ1yHkcanQw=
=1zGP
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Recursion fix for tracing.
While cleaning up some of the tracing recursion protection logic, I
discovered a scenario that the current design would miss, and would
allow an infinite recursion. Removing an optimization trick that
opened the hole fixes the issue and cleans up the code as well"
* tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have all levels of checks prevent recursion
- Renamed CTL_STATUS to CTL_FSTATUS to fix a redefined warning
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEoHhMeiyk5VmwVMwNGZQEC4GjKPQFAmFt0A0UHGRpbmd1eWVu
QGtlcm5lbC5vcmcACgkQGZQEC4GjKPSNyRAArdyukcjrH4Ir+z2yfHFlQS+K0dnw
pnuJ6IJgTdaaz+GqdnsVCyCzOf0wd0jNR38qxILzOGzFYTHYt1aP8pLwQernaM4c
YKViWX+8EIMVU8bYcR0pEszf83NO/1tPRHLKgP7KJgQORrHPEQ0cRYG7ZYnCHmJD
d6QHjroCpGefMYVzRglUW9471Y9l02LsbGjMiQMatQ9uT6M/b9PVn+5iAfflCDNK
jsCUTdbUYH/DVjwUHxxDBvWfQdLjNE4H+q6KMh4Zkg5VJOt8WxvLKj2EI1yLJkkg
3NbeK8MAvnWQ5qUvs8k4vrmBBumXN6dRicYzvQXPvOpHQ66ryc/y0SxxfsUk3Gza
a1NhOnY4UfrTq6GKYI9z/FdY2GIOZi2Fa3RiwglQR3W8y+zvJW5dRMlLnlMyRJ8E
pu1CMy0nH9cOdLnzDl9PVx3fhHYJTt/rzKeyf9q9fbN54XgeGIpP39HXGiBmhNlc
7QZvEcUx5/HtOaDlTk3aYZvLLDUPmi1XyThLhdz+Z/BKQDExSd11hHnYs/Z+XLFW
6eIAa4kW2Ryv9HqothQO5tKMebGmdOjxMGl8Nlmb1mmHyr2xoOi3fmmiPZ+gG1Jw
SXGEkwqMUKtJwumKs0GoMKo0Ee4QJ+lv/VuHlfxfhRbwVvuoM+NoNv2q3pHKvj6Z
6k3KL+ft3GFlFP4=
=D96y
-----END PGP SIGNATURE-----
Merge tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fix from Dinh Nguyen:
- Renamed CTL_STATUS to CTL_FSTATUS to fix a redefined warning
* tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
NIOS2: irqflags: rename a redefined register name
* kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic
x86:
* clean ups and fixes for bus lock vmexit and lazy allocation of rmaps
* two fixes for SEV-ES (one more coming as soon as I get reviews)
* fix for static_key underflow
ARM:
* Properly refcount pages used as a concatenated stage-2 PGD
* Fix missing unlock when detecting the use of MTE+VM_SHARED
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmFtuqYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNGbAf9Ha4mlieY7lDQLk96GydPwlMofi1B
dteRaWizokT0Xk7HovPr8G1zwwE9DrqO1FuHiZrkckzf7cloaPDvncLag3D3Vakr
dWIqa7MaavSWBKDpcEIKOEo2SfIBU38xXQSEpegz2f2fhZK0Ud2xUNtGQMNrYatX
Lz6FXHRvHDmv4+9EjASoGBd0/C/NxMaumYa1VOxMt8JPyn+zho0z5rUDKDF4pg70
KAgxVZuksy15XFRTgaSaU0BqVn9uCHwZVqRFKBm+ocPXIFjhdMkgrxJ7NSYB1T+N
VFqcUBTFTjhg9e5eZnQ6GMf9FXpLzK912VhCRd0uU5PGeBwUDJTSnyu5OQ==
=GZqR
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Tools:
- kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic
x86:
- clean ups and fixes for bus lock vmexit and lazy allocation of rmaps
- two fixes for SEV-ES (one more coming as soon as I get reviews)
- fix for static_key underflow
ARM:
- Properly refcount pages used as a concatenated stage-2 PGD
- Fix missing unlock when detecting the use of MTE+VM_SHARED"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: reduce ghcb_sa_len to 32 bits
KVM: VMX: Remove redundant handling of bus lock vmexit
KVM: kvm_stat: do not show halt_wait_ns
KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload
Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET"
KVM: SEV-ES: Set guest_state_protected after VMSA update
KVM: X86: fix lazy allocation of rmaps
KVM: SEV-ES: fix length of string I/O
KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
KVM: arm64: Report corrupted refcount at EL2
KVM: arm64: Fix host stage-2 PGD refcount
KVM: s390: Function documentation fixes
We have the same LAN controller on different PCHs. Separate TGP board
type from SPT which will allow for specific fixes to be applied for
TGP platforms.
Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Mark Pearson <markpearson@lenovo.com>
Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Setting cred->ucounts in cred_alloc_blank does not make sense. The
uid and user_ns are deliberately not set in cred_alloc_blank but
instead the setting is delayed until key_change_session_keyring.
So move dealing with ucounts into key_change_session_keyring as well.
Unfortunately that movement of get_ucounts adds a new failure mode to
key_change_session_keyring. I do not see anything stopping the parent
process from calling setuid and changing the relevant part of it's
cred while keyctl_session_to_parent is running making it fundamentally
necessary to call get_ucounts in key_change_session_keyring. Which
means that the new failure mode cannot be avoided.
A failure of key_change_session_keyring results in a single threaded
parent keeping it's existing credentials. Which results in the parent
process not being able to access the session keyring and whichever
keys are in the new keyring.
Further get_ucounts is only expected to fail if the number of bits in
the refernece count for the structure is too few.
Since the code has no other way to report the failure of get_ucounts
and because such failures are not expected to be common add a WARN_ONCE
to report this problem to userspace.
Between the WARN_ONCE and the parent process not having access to
the keys in the new session keyring I expect any failure of get_ucounts
will be noticed and reported and we can find another way to handle this
condition. (Possibly by just making ucounts->count an atomic_long_t).
Cc: stable@vger.kernel.org
Fixes: 905ae01c4a ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I got memory leak as follows when doing fault injection test:
unreferenced object 0xffff88800906c618 (size 8):
comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
hex dump (first 8 bytes):
70 74 70 30 00 00 00 00 ptp0....
backtrace:
[<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
[<0000000079f6e2ff>] kvasprintf+0xb5/0x150
[<0000000026aae54f>] kvasprintf_const+0x60/0x190
[<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
[<000000004e35abdd>] dev_set_name+0xc0/0x100
[<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
[<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]
When posix_clock_register() returns an error, the name allocated
in dev_set_name() will be leaked, the put_device() should be used
to give up the device reference, then the name will be freed in
kobject_cleanup() and other memory will be freed in ptp_clock_release().
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: a33121e548 ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When utilizing End to End delay mechanism, the following error messages show up:
|root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
|ptp4l[950.573]: selected /dev/ptp3 as PTP clock
|ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
|ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
|ptp4l[956.879]: port 1: assuming the grand master role
|ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
|ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
|ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
|ptp4l[963.090]: port 1: received DELAY_REQ without timestamp
Commit f2fb6b6275 ("net: stmmac: enable timestamp snapshot for required PTP
packets in dwmac v5.10a") already addresses this problem for the dwmac
v5.10. However, same holds true for all dwmacs above version v4.10. Correct the
check accordingly. Afterwards everything works as expected.
Tested on Intel Atom(R) x6414RE Processor.
Fixes: 14f347334b ("net: stmmac: Correctly take timestamp for PTPv2")
Fixes: f2fb6b6275 ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If something goes wrong in the remove callback, returning an error code
just results in an error message. The device still disappears.
So don't skip disabling the regulator in st95hf_remove() if resetting
the controller via spi fails. Also don't return an error code which just
results in two error messages.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
the power down mode bit (0.11). If the PHY is taken out of power down
mode in a certain temperature range, the PHY enters a weird state which
leads to continuously reporting RX errors. In that state, the MAC is not
able to receive or send any Ethernet frames and the activity LED is
constantly blinking. Since Linux is using the suspend callback when the
interface is taken down, ending up in that state can easily happen
during a normal startup.
Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
clock recovery when using power down mode. Even the latest revision (A4,
Revision ID 0x1513) seems to suffer that problem, and according to the
errata is not going to be fixed.
Remove the suspend/resume callback to avoid using the power down mode
completely.
[*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf
Fixes: 1a5465f5d6 ("phy/micrel: Add suspend/resume support to Micrel PHYs")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coverity complains of a possible dereference of a null return value.
5. returned_null: kzalloc returns NULL. [show details]
6. var_assigned: Assigning: si_data = NULL return value from kzalloc.
488 si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
489 cbd.length = cpu_to_le16(data_size);
490
491 dma = dma_map_single(&priv->si->pdev->dev, si_data,
492 data_size, DMA_FROM_DEVICE);
While this kzalloc() is unlikely to fail, I did notice that the function
returned without unmapping si_data.
Fix this by refactoring the error paths and checking for kzalloc()
failure.
Fixes: 888ae5a395 ("net: enetc: add tc flower psfp offload driver")
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While loading a driver and changing the number of queues, I noticed this
message in the kernel log:
"[253489.070080] Number of in use tx queues changed invalidating tc
mappings. Priority traffic classification disabled!"
But I had no idea what interface was being talked about because this
message used pr_warn().
After investigating, it appears we can use the netdev_* helpers already
defined to create predictably formatted messages, and that already handle
<unknown netdev> cases, in more of the messages in dev.c.
After this change, this message (and others) will look like this:
"[ 170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues
changed invalidating tc mappings. Priority traffic classification
disabled!"
One goal here was not to change the message significantly from the
original format so as to not break user's expectations, so I just
changed messages that used pr_* and generally started with %s ==
dev->name.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Convert batman from ether_addr_copy() to eth_hw_addr_set():
@@
expression dev, np;
@@
- ether_addr_copy(dev->dev_addr, np)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev->dev_addr will be constant soon, make sure
the qualifier is propagated thru batman-adv.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Coverity complains of unsigned compare against 0. There are 2 cases in
this function:
1821 itp = (irq_holdoff * 1000) / p->desc->qman_256_cycles_per_ns;
CID 121131 (#1 of 1): Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. itp < 0U.
1822 if (itp < 0 || itp > 4096) {
1823 max_holdoff = (p->desc->qman_256_cycles_per_ns * 4096) / 1000;
1824 pr_err("irq_holdoff must be between 0..%dus\n", max_holdoff);
1825 return -EINVAL;
1826 }
1827
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. irq_threshold < 0U.
1828 if (irq_threshold >= p->dqrr.dqrr_size || irq_threshold < 0) {
1829 pr_err("irq_threshold must be between 0..%d\n",
1830 p->dqrr.dqrr_size - 1);
1831 return -EINVAL;
1832 }
Fix this by removing the comparisons altogether as they are incorrect. Zero is
a possible value in either case. Also fix a minor comment typo and update the
2 pr_err() calls to use %u formatting as well as be more precise regarding
the exact error.
Fixes: ed1d2143fe ("soc: fsl: dpio: add support for irq coalescing per software portal")
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Roy Pledge <Roy.Pledge@nxp.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tidy and organize qca8k setup function from multiple for loop.
Change for loop in bridge leave/join to scan all port and skip cpu port.
No functional change intended.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-10-19
This series contains updates to ice driver only.
Brett implements support for ndo_set_vf_rate allowing for min_tx_rate
and max_tx_rate to be set for a VF.
Jesse updates DIM moderation to improve latency and resolves problems
with reported rate limit and extra software generated interrupts.
Wojciech moves a check for trusted VFs to the correct function,
disables lb_en for switchdev offloads, and refactors ethtool ops due
to differences in support for PF and port representor support.
Cai Huoqing utilizes the helper function devm_add_action_or_reset().
Gustavo A. R. Silva replaces uses of allocation to devm_kcalloc() as
applicable.
Dan Carpenter propagates an error instead of returning success.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
ethernet: manual netdev->dev_addr conversions (part 3)
Manual conversions of Ethernet drivers writing directly
to netdev->dev_addr (part 3 out of 3).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, do the swapping, then
call eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Break the address up into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Read the address into an array on the stack, then call
eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko.
hns3.ko includes network stack and pci_driver, hclge.ko includes
HW device action, algo_ops and timer task, hnae3.ko includes some
register function.
When SRIOV is enable and hclge.ko is removed, HW device is unloaded
but VF still exists, PF will not reply VF mbx messages, and cause
errors.
This patch fix it by disable SRIOV before remove hclge.ko.
Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The task of VF reset is performed through the workqueue. It checks the
value of hdev->reset_pending to determine whether to exit the loop.
However, the value of hdev->reset_pending may also be assigned by
the interrupt function hclgevf_misc_irq_handle(), which may cause the
loop fail to exit and keep occupying the workqueue. This loop is not
necessary, so remove it and the workqueue will be rescheduled if the
reset needs to be retried or a new reset occurs.
Fixes: 1cc9bc6e58 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when there is a rx page allocation failure, it is
possible that polling may be stopped if there is no more packet
to be reveiced, which may cause queue stall problem under memory
pressure.
This patch makes sure polling is scheduled again when there is
any rx page allocation failure, and polling will try to allocate
receive buffers until it succeeds.
Now the allocation retry is added, it is unnecessary to do the rx
page allocation at the end of rx cleaning, so remove it. And reset
the unused_count to zero after calling hns3_nic_alloc_rx_buffers()
to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
memory pressure.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rx unused desc is the desc that need attatching new buffer
before refilling to hw to receive new packet, the number of
desc need attatching new buffer is calculated using next_to_use
and next_to_clean. when next_to_use == next_to_clean, currently
hns3 driver assumes that all the desc has the buffer attatched,
but 'next_to_use == next_to_clean' also means all the desc need
attatching new buffer if hw has comsumed all the desc and the
driver has not attatched any buffer to the desc yet.
This patch adds 'refill' in desc_cb to indicate whether a new
buffer has been refilled to a desc.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the max tx size supported by the hw is calculated by
using the max BD num supported by the hw. According to the hw
user manual, the max tx size is fixed value for both non-TSO and
TSO skb.
This patch updates the max tx size according to the manual.
Fixes: 8ae10cfb5089("net: hns3: support tx-scatter-gather-fraglist feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ets dwrr bandwidth of tc is set to 0, the hardware will switch to SP
mode. In this case, this tc may occupy all the tx bandwidth if it has
huge traffic, so it violates the purpose of the user setting.
To fix this problem, limit the ets dwrr bandwidth must greater than 0.
Fixes: cacde272dd ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, DWRR of tc will be initialized to a fixed value when this tc
is enabled, but it is not been reset to 0 when this tc is disabled. It
cause a problem that the DWRR of unused tc is not 0 after using tc tool
to add and delete multi-tc parameters.
For examples, after enabling 4 TCs and restoring to 1 TC by follow
tc commands:
$ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
8@0 8@8 8@16 8@24 hw 1 mode channel
$ tc qdisc del dev eth0 root
Now there is just one TC is enabled for eth0, but the tc info querying by
debugfs is shown as follow:
$ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
enabled tc number: 1
weight_offset: 14
TC MODE WEIGHT
0 dwrr 100
1 dwrr 100
2 dwrr 100
3 dwrr 100
4 dwrr 0
5 dwrr 0
6 dwrr 0
7 dwrr 0
This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add configuration of interrupt type and fifo interrupt enable of TM QCN
error event if enabled, otherwise this event will not be reported when
there is error.
Fixes: d914971df0 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:
BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
dump_stack_lvl+0xac/0x108
__schedule_bug+0xac/0xe0
__schedule+0xcf8/0x10d0
schedule_idle+0x3c/0x70
do_idle+0x2d8/0x4a0
cpu_startup_entry+0x38/0x40
start_secondary+0x2ec/0x3a0
start_secondary_prolog+0x10/0x14
This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8279 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."
However, since commit 2c669ef697 ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a376ca ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.
The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.
Tested with pseries and powernv in qemu, and pseries on PowerVM.
Fixes: 2c669ef697 ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
In isa206_idle_insn_mayloss() we store various registers into the stack
red zone, which is allowed.
However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
to 0(r1), which corrupts the stack back chain.
We used to do the same in isa206_idle_insn_mayloss() itself, but we
fixed that in 73287caa92 ("powerpc64/idle: Fix SP offsets when saving
GPRs"), however we missed that the macro also corrupts the back chain.
Corrupting the back chain is bad for debuggability but doesn't
necessarily cause a bug.
However we recently changed the stack handling in some KVM code, and it
now relies on the stack back chain being valid when it returns. The
corruption causes that code to return with r1 pointing somewhere in
kernel data, at some point LR is restored from the stack and we branch
to NULL or somewhere else invalid.
Only affects Power8 hosts running KVM guests, with dynamic_mt_modes
enabled (which it is by default).
The fixes tag below points to the commit that changed the KVM stack
handling, exposing this bug. The actual corruption of the back chain has
always existed since 948cf67c47 ("powerpc: Add NAP mode support on
Power7 in HV mode").
Fixes: 9b4416c509 ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au
Vladimir Oltean says:
====================
New RGMII delay DT bindings for the SJA1105 DSA driver
During recent reviews I've been telling people that new MAC drivers
should adopt a certain DT binding format for RGMII delays in order to
avoid conflicting interpretations. Some suggestions were better received
than others, and it appears we are still far from a consensus.
Part of the problem seems to be that there are still drivers that apply
RGMII delays based on an incorrect interpretation of the device tree,
and these serve as a bad example for others.
I happen to maintain one of those drivers and I am able to test it, so I
figure that one of the ways in which I can make a change is to stop
providing a bad example.
Therefore, this series adds support for the "rx-internal-delay-ps" and
"tx-internal-delay-ps" properties inside sja1105 switch port DT nodes,
and if these are present, they will decide what RGMII delays will the
driver apply.
The in-tree device trees are also updated to follow the new format, as
well as the schema validator.
I assume it's okay to get all changes merged in through the same tree
(net-next). Although the DTS changes could be split, if needed - the
driver works with or without them. There is one more DTS which should be
changed, which is in Shawn's tree but not in net-next:
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git/tree/arch/arm64/boot/dts/freescale/fsl-lx2160a-bluebox3.dts?h=for-next
For that, I'd have to send a separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).
The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.
When the {rx,tx}-internal-delay-ps properties are present, use them,
otherwise fall back to the old behavior and warn.
One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5MHz at 10M). I assume that is fine, we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a schema validator to nxp,sja1105.yaml and to dsa.yaml for explicit
MAC-level RGMII delays. These properties must be per port and must be
present only for a phy-mode that represents RGMII.
We tell dsa.yaml that these port properties might be present, we also
define their valid values for SJA1105. We create a common definition for
the RX and TX valid range, since it's quite a mouthful.
We also modify the example to include the explicit RGMII delay properties.
On the fixed-link ports (in the example, port 4), having these explicit
delays is actually mandatory, since with the new behavior, the driver
shouts that it is interpreting what delays to apply based on phy-mode.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since a switch is basically a bunch of Ethernet controllers, just
inherit the common schema for one to get stronger type validation of the
properties of a port.
For example, before this change it was valid to have a phy-mode = "xfi"
even if "xfi" is not part of ethernet-controller.yaml, now it is not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All ports require either a phy-handle or a fixed-link, and port 3 in the
example didn't have one. Add it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 09e856d54b.
When an interface is enslaved in a VRF, prerouting conntrack hook is
called twice: once in the context of the original input interface, and
once in the context of the VRF interface. If no special precausions are
taken, this leads to creation of two conntrack entries instead of one,
and breaks SNAT.
Commit above was intended to avoid creation of extra conntrack entries
when input interface is enslaved in a VRF. It did so by resetting
conntrack related data associated with the skb when it enters VRF context.
However it breaks netfilter operation. Imagine a use case when conntrack
zone must be assigned based on the original input interface, rather than
VRF interface (that would make original interfaces indistinguishable). One
could create netfilter rules similar to these:
chain rawprerouting {
type filter hook prerouting priority raw;
iif realiface1 ct zone set 1 return
iif realiface2 ct zone set 2 return
}
This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent
on the conntrack zone does not work.
Here is a reproducer script that demonstrates the difference in behaviour.
==========
#!/bin/sh
# This script demonstrates unexpected change of nftables behaviour
# caused by commit 09e856d54b ""vrf: Reset skb conntrack
# connection on VRF rcv"
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
#
# Before the commit, it was possible to assign conntrack zone to a
# packet (or mark it for `notracking`) in the prerouting chanin, raw
# priority, based on the `iif` (interface from which the packet
# arrived).
# After the change, # if the interface is enslaved in a VRF, such
# assignment is lost. Instead, assignment based on the `iif` matching
# the VRF master interface is honored. Thus it is impossible to
# distinguish packets based on the original interface.
#
# This script demonstrates this change of behaviour: conntrack zone 1
# or 2 is assigned depending on the match with the original interface
# or the vrf master interface. It can be observed that conntrack entry
# appears in different zone in the kernel versions before and after
# the commit.
IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__
table testct {
chain rawpre {
type filter hook prerouting priority raw;
iif { veout, tvrf } meta nftrace set 1
iif veout ct zone set 1 return
iif tvrf ct zone set 2 return
notrack
}
chain rawout {
type filter hook output priority raw;
notrack
}
}
__END__
uname -rv
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
Signed-off-by: Eugene Crosser <crosser@average.org>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211018235021.1279697-16-kuba@kernel.org