Commit Graph

1061089 Commits

Author SHA1 Message Date
Paul Moore f4b3ee3c85 audit: improve robustness of the audit queue handling
If the audit daemon were ever to get stuck in a stopped state the
kernel's kauditd_thread() could get blocked attempting to send audit
records to the userspace audit daemon.  With the kernel thread
blocked it is possible that the audit queue could grow unbounded as
certain audit record generating events must be exempt from the queue
limits, else the system would enter a deadlock state.

This patch resolves this problem by lowering the kernel thread's
socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks
the kauditd_send_queue() function to better manage the various audit
queues when connection problems occur between the kernel and the
audit daemon.  With this patch, the backlog may temporarily grow
beyond the defined limits when the audit daemon is stopped and the
system is under heavy audit pressure, but kauditd_thread() will
continue to make progress and drain the queues as it would for other
connection problems.  For example, with the audit daemon put into a
stopped state and the system configured to audit every syscall it
was still possible to shut down the system without a kernel panic,
deadlock, etc.; granted, the system was slow to shut down, but that is
to be expected given the extreme pressure of recording every syscall.

The timeout value of HZ/10 was chosen primarily through
experimentation and this developer's "gut feeling".  There is likely
no one perfect value, but as this scenario is limited in scope (root
privileges would be needed to send SIGSTOP to the audit daemon), it
is likely not worth exposing this as a tunable at present.  This can
always be done at a later date if it proves necessary.
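The same idea can be tried from user space. The sketch below is an analogy, not the kernel change: it uses the standard SO_SNDTIMEO socket option so that send() to a peer that has stopped reading fails with EAGAIN after a bounded wait instead of blocking forever. The 100 ms value mirrors HZ/10, which is 100 ms of wall-clock time whatever HZ is configured to. The helper names set_send_timeout() and fill_until_timeout() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Bound every blocking send() on fd to timeout_ms milliseconds. */
static int set_send_timeout(int fd, long timeout_ms)
{
	struct timeval tv;

	assert(timeout_ms > 0);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/*
 * Keep sending until a send() times out. Returns the number of bytes the
 * peer's buffer accepted, or -1 on an unexpected error. On timeout the
 * caller regains control and could keep the record queued for a retry.
 */
static long fill_until_timeout(int fd)
{
	char buf[4096];
	long total = 0;

	memset(buf, 'a', sizeof(buf));
	for (;;) {
		ssize_t n = send(fd, buf, sizeof(buf), 0);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;	/* timed out instead of hanging */
		return -1;
	}
}
```

For instance, with sv from socketpair(AF_UNIX, SOCK_STREAM, 0, sv), calling set_send_timeout(sv[0], 100) and never reading sv[1] (standing in for a stopped daemon) makes fill_until_timeout(sv[0]) return shortly after the socket buffer fills, rather than hanging.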

Cc: stable@vger.kernel.org
Fixes: 5b52330bbf ("audit: fix auditd/kernel connection state tracking")
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-15 13:16:39 -05:00
Jesse Brandeburg 9c99d099f7 ice: use modern kernel API for kick
The kernel gained a new interface that drivers can use to combine the
tail bump (doorbell) and BQL updates. Convert the ice driver to use it.
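The interface is most likely __netdev_tx_sent_queue(), which accounts the bytes for BQL and returns whether the driver should write the tail register. The model below is a simplified, hypothetical sketch of that decision, not kernel code: the struct, the fixed byte limit (real BQL limits are dynamic) and the function name are stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a TX queue with a fixed BQL byte budget. */
struct model_txq {
	unsigned long inflight;	/* bytes queued but not yet completed */
	unsigned long limit;	/* fixed here; dynamic in real BQL */
	bool stopped;		/* queue stopped for lack of budget */
};

/*
 * Account bytes for one packet and return true when the driver should
 * write the tail (doorbell) register.
 */
static bool model_sent_queue(struct model_txq *q, unsigned int bytes,
			     bool xmit_more)
{
	assert(q);
	q->inflight += bytes;		/* BQL accounting happens either way */
	if (xmit_more)
		return q->stopped;	/* more packets coming: defer the kick */
	if (q->inflight > q->limit)
		q->stopped = true;	/* out of budget: stop the queue */
	return true;			/* end of the batch: ring the doorbell */
}
```

The benefit is that a batch of packets marked with xmit_more costs a single doorbell write, unless the queue has run out of budget and the batch must be flushed early.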

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:49:25 -08:00
Jesse Brandeburg 21c6e36b1e ice: tighter control over VSI_DOWN state
The driver had comments to the effect of: This flag should be set before
calling this function. While reviewing code it was found that there were
several violations of this policy, which could introduce hard-to-find
bugs or races.

Fix the violations of the "VSI DOWN state must be set before calling
ice_down" policy, and enforce the check in code with a WARN_ON.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:48:26 -08:00
Jesse Brandeburg cc14db11c8 ice: use prefetch methods
The kernel provides some prefetch mechanisms to speed up accesses to
cache lines that are commonly cold during receive processing. Since
these are software structures, it helps to have these strategically
placed prefetches.

Be careful to call BQL prefetch complete only for non-XDP queues.
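As a user-space illustration of the technique (not the ice code), the sketch below walks a ring of receive descriptors and issues a prefetch for the next descriptor's buffer while the current one is processed. __builtin_prefetch() is a GCC and Clang builtin standing in for the kernel's prefetch helpers; the descriptor layout and process_ring() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive descriptor pointing at a packet buffer. */
struct rx_desc {
	const unsigned char *buf;
	size_t len;
};

/*
 * Sum the first byte of every non-empty buffer in the ring, prefetching
 * the next descriptor's buffer while the current one is processed. The
 * prefetch is only a hint: the result is the same with or without it.
 */
static unsigned long process_ring(const struct rx_desc *ring, size_t count)
{
	unsigned long sum = 0;

	assert(ring || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (i + 1 < count)
			__builtin_prefetch(ring[i + 1].buf, 0 /* read */,
					   3 /* keep in cache */);
		if (ring[i].len > 0)
			sum += ring[i].buf[0];
	}
	return sum;
}
```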

Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:46:28 -08:00
Jesse Brandeburg 1c96c16858 ice: update to newer kernel API
Use the netif_tx_* API from netdevice.h which has simpler parameters.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:45:28 -08:00
Jacob Keller 399e27dbbd ice: support immediate firmware activation via devlink reload
The ice hardware contains an embedded chip with firmware which can be
updated using devlink flash. The firmware which runs on this chip is
referred to as the Embedded Management Processor firmware (EMP
firmware).

Activating the new firmware image currently requires that the system be
rebooted. This is not ideal as rebooting the system can cause unwanted
downtime.

In practical terms, activating the firmware does not always require a
full system reboot. In many cases it is possible to activate the EMP
firmware immediately. There are a couple of different scenarios to
cover.

 * The EMP firmware itself can be reloaded by issuing a special update
   to the device called an Embedded Management Processor reset (EMP
   reset). This reset causes the device to reset and reload the EMP
   firmware.

 * PCI configuration changes are only reloaded after a cold PCIe reset.
   Unfortunately there is no generic way to trigger this for a PCIe
   device without a system reboot.

When performing a flash update, firmware is capable of responding with
some information about the specific update requirements.

The driver updates the flash by programming a secondary inactive bank
with the contents of the new image, and then issuing a command to
request to switch the active bank starting from the next load.

The response to the final command for updating the inactive NVM flash
bank includes an indication of the minimum reset required to fully
update the device. This can be one of the following:

 * A full power on is required
 * A cold PCIe reset is required
 * An EMP reset is required

The response to the command to switch flash banks includes an indication
of whether or not the firmware will allow an EMP reset request.

For most updates, an EMP reset is sufficient to load the new EMP
firmware without issues. In some cases, this reset is not sufficient
because the PCI configuration space has changed. When this could cause
incompatibility with the new EMP image, the firmware is capable of
rejecting the EMP reset request.

Add logic to ice_fw_update.c to handle the response data from the flash
update AdminQ commands.

For the reset level, issue a devlink status notification informing the
user of how to complete the update with a simple suggestion like
"Activate new firmware by rebooting the system".

Cache the status of whether or not firmware will restrict the EMP reset
for use in implementing devlink reload.

Implement support for devlink reload with the "fw_activate" flag. This
allows user space to request the firmware be activated immediately.

For the .reload_down handler, we will issue a request for the EMP reset
using the appropriate firmware AdminQ command. If we know that the
firmware will not allow an EMP reset, simply exit with a suitable
netlink extended ACK message indicating that the EMP reset is not
available.

For the .reload_up handler, simply wait until the driver has finished
resetting. Logic to handle processing of an EMP reset already exists in
the driver as part of its reset and rebuild flows.

Implement support for the devlink reload interface with the
"fw_activate" action. This allows userspace to request activation of
firmware without a reboot.

Note that indicating the required reset and the EMP reset restriction
is not supported on old versions of firmware. The driver can
determine if the two features are supported by checking the device
capabilities report. I confirmed support has existed since at least
version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
EMP reset request has existed in all versions of the EMP firmware for the
ice hardware.

Check the device capabilities report to determine whether or not the
indications are reported by the running firmware. If the reset
requirement indication is not supported, always assume a full power on
is necessary. If the reset restriction capability is not supported,
always assume the EMP reset is available.
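The defaults described above can be summarized as a small decision function. This is a hypothetical sketch, not the ice driver's code: the enum, the struct fields and the function names are invented for illustration. Only the reboot message is taken from the text above; the devlink reload message is an assumed wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum reset reported by firmware after updating the inactive bank. */
enum reset_level {
	RESET_POWER_ON,		/* full power on */
	RESET_PCIE_COLD,	/* cold PCIe reset */
	RESET_EMP,		/* EMP reset */
};

/* Hypothetical view of what the running firmware reported. */
struct fw_update_status {
	bool has_reset_level_cap;	/* firmware reports the required reset */
	bool has_restrict_cap;		/* firmware reports EMP restriction */
	enum reset_level level;		/* valid only with has_reset_level_cap */
	bool emp_reset_restricted;	/* valid only with has_restrict_cap */
};

/* Without the capability, always assume a full power on is needed. */
static enum reset_level required_reset(const struct fw_update_status *s)
{
	assert(s);
	return s->has_reset_level_cap ? s->level : RESET_POWER_ON;
}

/* Without the capability, always assume the EMP reset is available. */
static bool emp_reset_available(const struct fw_update_status *s)
{
	assert(s);
	return s->has_restrict_cap ? !s->emp_reset_restricted : true;
}

/* Suggestion reported to the user once the flash update completes. */
static const char *activation_hint(const struct fw_update_status *s)
{
	switch (required_reset(s)) {
	case RESET_EMP:
		return "Activate new firmware by devlink reload";
	case RESET_PCIE_COLD:
	case RESET_POWER_ON:
	default:
		return "Activate new firmware by rebooting the system";
	}
}
```

In this model, a devlink reload with the fw_activate action would first check emp_reset_available() and fail with an extended ACK message when it returns false.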

Users can verify if the EMP reset has activated the firmware by using
the devlink info report to check that the 'running' firmware version has
updated. For example, a user might do the following:

 # Check current version
 $ devlink dev info

 # Update the device
 $ devlink dev flash pci/0000:af:00.0 file firmware.bin

 # Confirm stored version updated
 $ devlink dev info

 # Reload to activate new firmware
 $ devlink dev reload pci/0000:af:00.0 action fw_activate

 # Confirm running version updated
 $ devlink dev info

Finally, this change does *not* implement basic driver-only reload
support. I did look into trying to do this. However, it requires
a significant refactor of how the ice driver probes and loads everything.
The ice driver probe and allocation flows were not designed with such
a reload in mind. Refactoring the flow to support this is beyond the
scope of this change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:40:38 -08:00
Jacob Keller af18d8866c ice: reduce time to read Option ROM CIVD data
During probe and device reset, the ice driver reads some data from the
NVM image as part of ice_init_nvm. Part of this data includes a section
of the Option ROM which contains version information.

The function ice_get_orom_civd_data is used to locate the '$CIV' data
section of the Option ROM.

Timing measurements of ice_probe and ice_rebuild indicate that the
ice_get_orom_civd_data function takes about 10 seconds to finish
executing.

The function locates the section by scanning the Option ROM every 512
bytes. This requires a significant number of NVM read accesses, since
the Option ROM bank is 500KB. In the worst case it would take about 1000
reads. Worse, all PFs serialize this operation during reload because
each must acquire the NVM semaphore.

The CIVD section is located at the end of the Option ROM image data.
Unfortunately, the driver has no easy method to determine the offset
manually. Practical experiments have shown that the data could be at
a variety of locations, so simply reversing the scanning order is not
sufficient to reduce the overall read time.

Instead, copy the entire contents of the Option ROM into memory. This
allows reading the data using 4KB pages instead of 512 bytes at a time.
This reduces the total number of firmware commands by a factor of 8. In
addition, reading the whole section at once gives firmware a clearer
indication of when we're "done".

Re-write ice_get_orom_civd_data to allocate virtual memory to store the
Option ROM data. Copy the entire Option ROM contents at once using
ice_read_flash_module. Finally, use this memory copy to scan for the
'$CIV' section.
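A minimal sketch of scanning the in-memory copy, assuming the '$CIV' signature starts on a 512-byte boundary as the scanning description suggests. It only matches the signature and omits any validation of the section contents the driver may perform; find_civd() and reads_needed() are hypothetical helpers. The second helper reproduces the arithmetic above: a 500KB bank needs 1000 reads at 512 bytes each but only 125 reads at 4KB each, the factor of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CIVD_SIG	"$CIV"
#define CIVD_SIG_LEN	4
#define CIVD_STEP	512	/* signature searched at 512-byte boundaries */

/*
 * Scan an in-memory copy of the Option ROM for the "$CIV" signature,
 * checking every CIVD_STEP bytes. Returns the offset, or -1 if absent.
 */
static long find_civd(const unsigned char *rom, size_t size)
{
	assert(rom || size == 0);
	for (size_t off = 0; off + CIVD_SIG_LEN <= size; off += CIVD_STEP) {
		if (memcmp(rom + off, CIVD_SIG, CIVD_SIG_LEN) == 0)
			return (long)off;
	}
	return -1;
}

/* Flash read commands needed to cover size bytes in chunk-byte reads. */
static size_t reads_needed(size_t size, size_t chunk)
{
	return (size + chunk - 1) / chunk;
}
```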

This change significantly reduces the time to read the Option ROM CIVD
section from ~10 seconds down to ~1 second. This has a significant
impact on the total time to complete a driver rebuild or probe.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:38:14 -08:00
Jacob Keller c9f7a483e4 ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image
The ice_devlink_flash_update function performs a few upfront checks and
then calls ice_flash_pldm_image.

Most of these checks make more sense in the context of code within
ice_flash_pldm_image. Merge ice_devlink_flash_update and
ice_flash_pldm_image into one function, placing it in ice_fw_update.c.

Since this is still the entry point for devlink, call the function
ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a
single function which handles the devlink parameters and then initiates
a PLDM update.

With this change, the ice_devlink_flash_update function in
ice_fw_update.c becomes the main entry point for flash update. It
eliminates some unnecessary boilerplate code between the two previous
functions. The ultimate motivation for this is that it eases supporting
a dry run with the PLDM library in a future change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:37:14 -08:00
Jacob Keller c356eaa824 ice: move and rename ice_check_for_pending_update
The ice_devlink_flash_update function performs a few checks and then
calls ice_flash_pldm_image. One of these checks is to call
ice_check_for_pending_update. This function checks if the device has
a pending update, and cancels it if so. This is necessary to allow
a new flash update to proceed.

We want to refactor the ice code to eliminate ice_devlink_flash_update,
moving its checks into ice_flash_pldm_image.

To do this, ice_check_for_pending_update will become static, and only
called by ice_flash_pldm_image. To make this change easier to review,
first just move the function up within the ice_fw_update.c file.

While at it, note that the function has a misleading name. Its primary
action is to cancel a pending update. Using the verb "check" does not
imply this. Rename it to ice_cancel_pending_update.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:36:10 -08:00
Jacob Keller 78ad87da99 ice: devlink: add shadow-ram region to snapshot Shadow RAM
We have a region for reading the contents of the NVM flash as
a snapshot. This region does not allow reading the Shadow RAM, as it
always passes the FLASH_ONLY bit to the low level firmware interface.

Add a separate shadow-ram region which will allow snapshot of the
current contents of the Shadow RAM. This data is built from the NVM
contents but is distinct as the device builds up the Shadow RAM during
initialization, so being able to snapshot its contents can be useful
when attempting to debug flash-related issues.

Fix the comment description of the nvm-flash region which incorrectly
stated that it filled the shadow-ram region, and add a comment
explaining that the nvm-flash region does not actually read the Shadow
RAM.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:34:54 -08:00
Nathan Chancellor a708376361 soc/tegra: fuse: Fix bitwise vs. logical OR warning
A new warning in clang points out two instances where boolean
expressions are being used with a bitwise OR instead of logical OR:

drivers/soc/tegra/fuse/speedo-tegra20.c:72:9: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
                reg = tegra_fuse_read_spare(i) |
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~
                                               ||
drivers/soc/tegra/fuse/speedo-tegra20.c:72:9: note: cast one or both operands to int to silence this warning
drivers/soc/tegra/fuse/speedo-tegra20.c:87:9: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
                reg = tegra_fuse_read_spare(i) |
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~
                                               ||
drivers/soc/tegra/fuse/speedo-tegra20.c:87:9: note: cast one or both operands to int to silence this warning
2 warnings generated.

The motivation for the warning is that logical operations short circuit
while bitwise operations do not.

In this instance, tegra_fuse_read_spare() is not semantically returning
a boolean, it is returning a bit value. Use u32 for its return type so
that it can be used with either bitwise or boolean operators without any
warnings.
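The difference the warning is about can be seen in a small stand-alone program. In this sketch, read_spare_bit() is a hypothetical stand-in for tegra_fuse_read_spare() with the u32 return type the patch adopts, and a counter records how many times it runs: | always evaluates both operands, while || skips the right-hand one when the left-hand one is non-zero. Because the operands are no longer bool, clang's -Wbitwise-instead-of-logical does not fire for the | form.

```c
#include <assert.h>
#include <stdint.h>

static int spare_reads;	/* counts calls to show evaluation order */

/* Hypothetical stand-in for tegra_fuse_read_spare(): returns a bit value. */
static uint32_t read_spare_bit(const uint32_t *fuses, unsigned int i)
{
	spare_reads++;
	return fuses[i] & 1u;
}

/* Bitwise OR: both operands are always evaluated. */
static uint32_t combine_bitwise(const uint32_t *fuses, unsigned int i)
{
	return read_spare_bit(fuses, i) | read_spare_bit(fuses, i + 1);
}

/* Logical OR: the right operand is skipped if the left one is non-zero. */
static uint32_t combine_logical(const uint32_t *fuses, unsigned int i)
{
	assert(fuses);
	return read_spare_bit(fuses, i) || read_spare_bit(fuses, i + 1);
}
```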

Fixes: 25cd5a3914 ("ARM: tegra: Add speedo-based process identification")
Link: https://github.com/ClangBuiltLinux/linux/issues/1488
Suggested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-15 17:19:06 +01:00
Jakub Kicinski 3bc14ea0d1 ethtool: always write dev in ethnl_parse_header_dev_get
Commit 0976b888a1 ("ethtool: fix null-ptr-deref on ref tracker")
made the write to req_info.dev conditional, but, as Eric points out
in a different follow-up, the structure is often allocated on the
stack rather than kzalloc()'d, so it seems safer to always write the
dev in case it holds garbage on input.
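The pattern is to write the output field on every path, so a caller's on-stack struct never carries stale data out of the lookup. The sketch below is a hypothetical user-space illustration; the stub types and parse_header_dev_get() only mimic the shape of the ethtool code and are not its API.

```c
#include <assert.h>
#include <stddef.h>

struct net_device_stub { int ifindex; };	/* hypothetical stand-in */

struct req_info_stub {
	struct net_device_stub *dev;
	unsigned int flags;
};

/*
 * Look up a device by index. req->dev is written on every path,
 * including "no device requested" and "not found", so a caller that
 * keeps req on the stack never sees stale garbage in req->dev.
 */
static int parse_header_dev_get(struct req_info_stub *req, int ifindex,
				struct net_device_stub *devs, size_t ndevs)
{
	assert(req);
	req->dev = NULL;		/* unconditional write */
	if (ifindex == 0)
		return 0;		/* no device requested */
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].ifindex == ifindex) {
			req->dev = &devs[i];
			return 0;
		}
	}
	return -1;			/* -ENODEV in kernel code */
}
```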

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:09:24 +00:00
Eric Dumazet f1d9268e06 net: add net device refcount tracker to struct packet_type
The most notable changes are in af_packet; the tipc ones are trivial.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:07:04 +00:00
David S. Miller ab8c83cf87 Merge branch 'mlxsw-ipv6-underlay'
Ido Schimmel says:

====================
mlxsw: Add support for VxLAN with IPv6 underlay

So far, mlxsw only supported VxLAN with IPv4 underlay. This patchset
extends mlxsw to also support VxLAN with IPv6 underlay. The main
difference is related to the way IPv6 addresses are handled by the
device. See patch #1 for a detailed explanation.

Patch #1 creates a common hash table to store the mapping from IPv6
addresses to KVDL indexes. This table is useful for both IP-in-IP and
VxLAN tunnels with an IPv6 underlay.

Patch #2 converts the IP-in-IP code to use the new hash table.

Patches #3-#6 are preparations.

Patch #7 finally adds support for VxLAN with IPv6 underlay.

Patch #8 removes a test case that checked that VxLAN configurations with
IPv6 underlay are vetoed by the driver.

A follow-up patchset will add forwarding selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen fb488be8c2 selftests: mlxsw: vxlan: Remove IPv6 test case
Currently, there is a test case to verify that VxLAN with IPv6 underlay
is forbidden.

Remove this test case as support for VxLAN with IPv6 underlay was added
by the previous patch.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen 06c08f869c mlxsw: Add support for VxLAN with IPv6 underlay
Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
Add support for IPv6 underlay.

The main differences are:

* Learning is not supported for IPv6 FDB entries, use static entries and
  do not allow 'learning' flag for IPv6 VxLAN.

* IPv6 addresses for FDB entries should be saved as part of KVDL.
  Use the new API to allocate and release entries for IPv6 addresses.

* Spectrum ASICs do not fill in the UDP checksum, whereas in software,
  IPv6 UDP packets with a zero checksum are dropped.
  Force the relevant flags which allow the VxLAN device to generate UDP
  packets with zero checksum and also receive them.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen 0860c76416 mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entries
FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold
a reference on a resource. Namely, the KVDL entry where the IPv6
underlay destination IP is stored. When such an FDB entry is deleted, it
needs to drop the reference from the corresponding KVDL entry.

To that end, maintain a hash table that maps an FDB entry (i.e., {MAC,
FID}) to the IPv6 address used by it.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen 4b08c3e676 mlxsw: reg: Add a function to fill IPv6 unicast FDB entries
Add a function to fill IPv6 unicast FDB entries. Use the common function
for common fields.

Unlike IPv4 entries, the underlay IP address is not filled in the
register payload, but instead a pointer to KVDL is used.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen 1fd85416e3 mlxsw: Split handling of FDB tunnel entries between address families
Currently, the function which adds/removes unicast tunnel FDB entries is
shared between IPv4 and IPv6, while for IPv6 it warns because there is
no support for it.

The code for IPv6 will be more complicated because it needs to
allocate/release a KVDL pointer for the underlay IPv6 address.

As a preparation for IPv6 underlay support, split the code according to
address family.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen 720d683cbe mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address family
As part of 'can_offload' checks, there is a check of VxLAN flags.

The supported flags for IPv6 VxLAN will be different from the existing
flags because of some limitations.

As preparation for IPv6 underlay support, make this check per address
family.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:44 +00:00
Amit Cohen cf42911523 mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping
Use the common hash table introduced by the previous patch instead of
the IP-in-IP specific implementation.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:43 +00:00
Amit Cohen e846efe273 mlxsw: spectrum: Add hash table for IPv6 address mapping
The device supports forwarding entries such as routes and FDBs that
perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation.
When the underlay is IPv6, these entries do not encode the 128 bit IPv6
address used for encapsulation / decapsulation. Instead, these entries
encode a 24 bit pointer to an array called KVDL where the IPv6 address
is stored.

Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent
patches will add support for VxLAN with IPv6 underlay. To avoid
duplicating the logic required to store and retrieve these IPv6
addresses, introduce a hash table that will store the mapping between
IPv6 addresses and their KVDL index.
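A minimal sketch of such a mapping, with reference counts so that several tunnels or FDB entries can share one KVDL entry: the first user of an address allocates an index, later users reuse it, and the last put removes the entry (where a driver would also free the KVDL slot). Everything here is hypothetical: a fixed-size chained hash with FNV-1a stands in for the driver's hash table, and a sequential counter capped at 24 bits stands in for the real KVDL allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64
#define KVDL_INDEX_MASK 0xffffffu	/* entries encode a 24-bit pointer */

struct ip6_node {
	uint8_t addr[16];		/* IPv6 underlay address */
	uint32_t kvdl_index;		/* where the address is stored */
	unsigned int refcount;		/* number of users of this entry */
	struct ip6_node *next;		/* bucket chain */
};

struct ip6_map {
	struct ip6_node *buckets[NBUCKETS];
	uint32_t next_index;		/* toy allocator, never reuses indexes */
};

/* FNV-1a over the 16 address bytes, reduced to a bucket number. */
static unsigned int hash_addr(const uint8_t addr[16])
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 16; i++) {
		h ^= addr[i];
		h *= 16777619u;
	}
	return h % NBUCKETS;
}

/* Take a reference on addr, allocating a KVDL index on first use. */
static int ip6_map_get(struct ip6_map *m, const uint8_t addr[16],
		       uint32_t *index)
{
	unsigned int b = hash_addr(addr);
	struct ip6_node *n;

	for (n = m->buckets[b]; n; n = n->next) {
		if (memcmp(n->addr, addr, 16) == 0) {
			n->refcount++;
			*index = n->kvdl_index;
			return 0;
		}
	}
	if (m->next_index > KVDL_INDEX_MASK)
		return -1;		/* 24-bit index space exhausted */
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	memcpy(n->addr, addr, 16);
	n->kvdl_index = m->next_index++;
	n->refcount = 1;
	n->next = m->buckets[b];
	m->buckets[b] = n;
	*index = n->kvdl_index;
	return 0;
}

/* Drop a reference; the last put removes the entry. */
static void ip6_map_put(struct ip6_map *m, const uint8_t addr[16])
{
	struct ip6_node **pp = &m->buckets[hash_addr(addr)];

	for (; *pp; pp = &(*pp)->next) {
		struct ip6_node *n = *pp;

		if (memcmp(n->addr, addr, 16) == 0) {
			if (--n->refcount == 0) {
				/* A driver would also free the KVDL entry here. */
				*pp = n->next;
				free(n);
			}
			return;
		}
	}
}

/* Number of distinct addresses currently stored. */
static size_t ip6_map_count(const struct ip6_map *m)
{
	size_t count = 0;

	for (int i = 0; i < NBUCKETS; i++)
		for (const struct ip6_node *n = m->buckets[i]; n; n = n->next)
			count++;
	return count;
}
```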

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:05:43 +00:00
David S. Miller f71f1bcbd8 mlx5-updates-2021-12-14

Merge tag 'mlx5-updates-2021-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-12-14

Parsing Infrastructure for TC actions:

This series introduces a TC action infrastructure to help parse TC
actions in a generic way for both FDB and NIC rules.

To help maintain the TC action parsing code, we split it into one
parser per TC action type in separate files, instead of having one big
switch-case loop duplicated between the FDB and NIC parsers as before
this patchset.

Each TC flow_action->id is represented by a dedicated mlx5e_tc_act handler
which has callbacks to check whether the specific action can be offloaded
and to parse the specific action.

We move the handling of each case (TC action) into its specific handler,
which is responsible for parsing and determining if the action is supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 14:46:33 +00:00
David S. Miller 1d1c950faa wireless-drivers fixes for v5.16

Merge tag 'wireless-drivers-2021-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.16

Second set of fixes for v5.16, hopefully also the last one. I changed
my email in MAINTAINERS, one crash fix in iwlwifi and some build
problems fixed.

iwlwifi

* fix crash caused by a warning

* fix LED linking problem

brcmsmac

* rework LED dependencies to be consistent with other drivers

mt76

* mt7921: fix build regression
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 14:43:07 +00:00
zhangyue 4d375c2e51 rsi: fix array out of bound
Limit the maximum value of 'ii'. If 'ii' is greater than or equal to
'RSI_MAX_VIFS', the access to the array 'adapter->vifs' may go out of
bounds.
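The fix follows a common bounded-search pattern: test the index bound before dereferencing the array, and only use the element if the search stopped inside the array. The sketch below is hypothetical; MAX_VIFS and struct vif_stub stand in for RSI_MAX_VIFS and the driver's vif type, since the commit does not show the surrounding code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIFS 3	/* hypothetical stand-in for RSI_MAX_VIFS */

struct vif_stub { int id; };

/*
 * Return the first non-NULL entry of vifs[MAX_VIFS], or NULL if none is
 * set. The bound is checked before vifs[ii] in the loop condition, so
 * vifs[MAX_VIFS] is never read.
 */
static struct vif_stub *first_vif(struct vif_stub **vifs)
{
	size_t ii;

	assert(vifs);
	for (ii = 0; ii < MAX_VIFS && !vifs[ii]; ii++)
		;
	return ii < MAX_VIFS ? vifs[ii] : NULL;
}
```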

Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20211208095341.47777-1-zhangyue1@kylinos.cn
2021-12-15 16:28:26 +02:00
David S. Miller 7c8089f980 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-12-14

This series contains updates to ice driver only.

Karol corrects division that was causing incorrect calculations and
adds a check to ensure stale timestamps are not being used.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 11:01:13 +00:00
David S. Miller 5a21bf5bb4 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-12-14

This series contains updates to ice driver only.

Haiyue adds support to query hardware for supported PTYPEs.

Jeff changes PTYPE validation to utilize the capabilities queried from
the hardware instead of maintaining a per DDP support list.

Brett refactors promiscuous functions to provide common and clear
interfaces to call for configuration.

Wojciech modifies DDP package load to simplify determining the final
state of the load.

Tony removes the use of ice_status from the driver. This involves
removing string conversion functions, converting variables and values to
standard errors, and cleaning up. He also removes an unused define.

Dan Carpenter removes unneeded casts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:59:57 +00:00
Joakim Zhang 0b6f65c707 net: fec: fix system hang during suspend/resume
1. During a normal suspend (WoL not enabled), the system may hang. The
root cause is a TXF interrupt arriving after the clocks have been
disabled: the system hangs when the interrupt handler accesses
registers. To fix this issue, disable all interrupts on system suspend.

2. The system may also hang during suspend with WoL enabled. After
entering stop mode, if a magic pattern arrives after the clocks have
been disabled, the system is woken up and the interrupt handler is
called, which hangs when it accesses registers. To fix this issue,
disable the wakeup IRQ in .suspend() and enable it in .resume().

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:30:25 +00:00
Clément Léger 8438699512 net: ocelot: add support to get port mac from device-tree
Add support for getting the port MAC address from the device tree using
of_get_ethdev_address().

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:29:41 +00:00
Conley Lee 3899c928bc sun4i-emac.c: remove unnecessary branch
According to the current implementation of emac_rx, every arriving
packet is processed in the while loop, so no packet is left over from
the previous call. The skb_last field and the branch that handles it
are therefore unnecessary.

Signed-off-by: Conley Lee <conleylee@foxmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:29:03 +00:00
Eric Dumazet 34ac17ecbf ethtool: use ethnl_parse_header_dev_put()
It seems I missed that most ethnl_parse_header_dev_get() callers
declare an on-stack struct ethnl_req_info, and that they simply call
dev_put(req_info.dev) when about to return.

Add an ethnl_parse_header_dev_put() helper to properly untrack the
reference taken by ethnl_parse_header_dev_get().

Fixes: e4b8954074 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 10:27:47 +00:00
Roi Dayan 35bb524214 net/mlx5e: Move goto action checks into tc_action goto post parse op
Move goto action checks from parse nic/fdb funcs into the tc action
infra goto post parse op.
While moving this part also use NL_SET_ERR_MSG_MOD() instead of
NL_SET_ERR_MSG().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:46 -08:00
Roi Dayan c22080352e net/mlx5e: Move vlan action chunk into tc action vlan post parse op
Move vlan prio tag rewrite handling into tc action infra vlan post parse op.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:46 -08:00
Roi Dayan dd5ab6d115 net/mlx5e: Add post_parse() op to tc action infrastructure
The post_parse() op should be called after the parse op has been called
for all actions, since an action's state may depend on other actions.
In the new op, an action can fail the parse if its state is no longer
valid.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:45 -08:00
Roi Dayan 6bcba1bded net/mlx5e: Move sample attr allocation to tc_action sample parse op
There is no reason to defer the kmalloc until after all other actions
have been parsed, since a failure can still occur later, before the
rule is offloaded. So allocate the memory when parsing.
The memory is released in mlx5e_flow_put(), which is also called
on the error flow.
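A minimal sketch of the allocation lifetime described above. The toy_flow and toy_sample_attr types are hypothetical, toy_flow_put() stands in for mlx5e_flow_put(), and the field names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified sample attributes. */
struct toy_sample_attr {
	unsigned int rate;
	unsigned int group_num;
};

/* Hypothetical flow; owns the sample attr once parsing has allocated it. */
struct toy_flow {
	struct toy_sample_attr *sample_attr;
};

/* Sample parse op: allocate and fill the attr right away instead of
 * deferring the allocation until the remaining actions are parsed. */
static int toy_sample_parse(struct toy_flow *flow, unsigned int rate,
			    unsigned int group)
{
	assert(!flow->sample_attr);
	flow->sample_attr = calloc(1, sizeof(*flow->sample_attr));
	if (!flow->sample_attr)
		return -1; /* would be -ENOMEM in kernel code */
	flow->sample_attr->rate = rate;
	flow->sample_attr->group_num = group;
	return 0;
}

/* Analogue of mlx5e_flow_put(): runs on both the success and error paths,
 * so memory allocated during parsing is always released. */
static void toy_flow_put(struct toy_flow *flow)
{
	free(flow->sample_attr);
	flow->sample_attr = NULL;
}
```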

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:45 -08:00
Roi Dayan 8333d53e3f net/mlx5e: TC action parsing loop
Introduce a common function to implement the generic parsing loop.
The same function can be used for parsing NIC and FDB (Switchdev mode) flows.
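A minimal sketch of a single parsing loop shared by NIC and FDB flows. It assumes, hypothetically, per-namespace parser tables indexed by action id; the action set and which namespace supports which action are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action ids and flow namespaces (NIC vs FDB/switchdev). */
enum toy_act_id { TOY_ACT_DROP, TOY_ACT_MIRRED, TOY_ACT_NUM };
enum toy_ns { TOY_NS_NIC, TOY_NS_FDB };

struct toy_act_parser {
	int (*parse)(int *flags);
};

static int drop_parse(int *flags)
{
	*flags |= 1;
	return 0;
}

static int mirred_parse(int *flags)
{
	*flags |= 2;
	return 0;
}

static const struct toy_act_parser toy_drop = { drop_parse };
static const struct toy_act_parser toy_mirred = { mirred_parse };

/* Per-namespace parser tables: a NULL entry means "not supported here".
 * In this made-up example, mirred is only parsed for FDB flows. */
static const struct toy_act_parser *const toy_nic_acts[TOY_ACT_NUM] = {
	[TOY_ACT_DROP] = &toy_drop,
};
static const struct toy_act_parser *const toy_fdb_acts[TOY_ACT_NUM] = {
	[TOY_ACT_DROP] = &toy_drop,
	[TOY_ACT_MIRRED] = &toy_mirred,
};

/* One generic loop serves both NIC and FDB flows; only the table differs. */
static int toy_parse_loop(const enum toy_act_id *ids, int n, enum toy_ns ns,
			  int *flags)
{
	const struct toy_act_parser *const *table =
		ns == TOY_NS_FDB ? toy_fdb_acts : toy_nic_acts;
	int i, err;

	for (i = 0; i < n; i++) {
		assert(ids[i] < TOY_ACT_NUM);
		if (!table[ids[i]])
			return -1; /* unsupported action in this namespace */
		err = table[ids[i]]->parse(flags);
		if (err)
			return err;
	}
	return 0;
}
```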

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:45 -08:00
Roi Dayan 922d69ed96 net/mlx5e: Add redirect ingress to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:44 -08:00
Roi Dayan 3929ff583d net/mlx5e: Add sample and ptype to tc_action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:44 -08:00
Roi Dayan 758bc13422 net/mlx5e: Add ct to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:44 -08:00
Roi Dayan ab3f3d5eff net/mlx5e: Add mirred/redirect to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:43 -08:00
Roi Dayan 163b766f56 net/mlx5e: Add mpls push/pop to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:43 -08:00
Roi Dayan 8ee7263834 net/mlx5e: Add vlan push/pop/mangle to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:43 -08:00
Roi Dayan e36db1ee7a net/mlx5e: Add pedit to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:42 -08:00
Roi Dayan 9ca1bb2cf6 net/mlx5e: Add csum to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:42 -08:00
Roi Dayan c65686d79c net/mlx5e: Add tunnel encap/decap to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:42 -08:00
Roi Dayan 67d62ee7f4 net/mlx5e: Add goto to tc action infra
Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:41 -08:00
Roi Dayan fad5479069 net/mlx5e: Add tc action infrastructure
Add an infrastructure to help parse tc actions in a generic way.

Supporting an action parser means implementing struct mlx5e_tc_act
for that action.

The infrastructure makes it possible to parse tc actions generically in
both parse_tc_nic_actions() and parse_tc_fdb_actions().
To parse tc actions, a user needs to allocate a parse_state instance
and pass it when iterating over the tc action parsers.
If a parser doesn't exist, the user can treat the action as unsupported.

To add an action parser, a user needs to implement two callbacks.
The can_offload() callback quickly checks whether an action can be offloaded.
The parse_action() callback does the actual parsing and prepares for offload.

This commit adds parsers for the drop, trap, mark and accept actions, which
serve as examples and make those actions use the new infrastructure.
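A minimal sketch of the infrastructure described above. All names (toy_tc_act, toy_parse_state, the action ids and flag bits) are hypothetical simplifications, not the mlx5e definitions, and the 16-bit mark limit is a made-up example of a can_offload() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of tc action ids; pedit deliberately has no parser. */
enum toy_act_id {
	TOY_ACT_ACCEPT, TOY_ACT_DROP, TOY_ACT_TRAP, TOY_ACT_MARK,
	TOY_ACT_PEDIT, TOY_ACT_NUM
};

/* Hypothetical tc action as handed to the driver. */
struct toy_act {
	enum toy_act_id id;
	unsigned int mark;
};

#define TOY_FWD  (1u << 0)
#define TOY_DROP (1u << 1)
#define TOY_TRAP (1u << 2)

/* Caller-allocated state passed to every parser while iterating. */
struct toy_parse_state {
	unsigned int flags;
	unsigned int mark;
};

/* Implementing an action parser means providing these two callbacks. */
struct toy_tc_act {
	bool (*can_offload)(const struct toy_act *act,
			    const struct toy_parse_state *st);
	int (*parse_action)(const struct toy_act *act,
			    struct toy_parse_state *st);
};

static bool toy_always_offloadable(const struct toy_act *act,
				   const struct toy_parse_state *st)
{
	(void)act;
	(void)st;
	return true;
}

/* Made-up limit for illustration: only 16-bit marks can be offloaded. */
static bool toy_mark_can_offload(const struct toy_act *act,
				 const struct toy_parse_state *st)
{
	(void)st;
	return act->mark <= 0xffff;
}

/* accept, drop and trap only set a flag in this sketch. */
static int toy_parse_flag(const struct toy_act *act, struct toy_parse_state *st)
{
	static const unsigned int flag[TOY_ACT_NUM] = {
		[TOY_ACT_ACCEPT] = TOY_FWD,
		[TOY_ACT_DROP] = TOY_DROP,
		[TOY_ACT_TRAP] = TOY_TRAP,
	};

	st->flags |= flag[act->id];
	return 0;
}

static int toy_parse_mark(const struct toy_act *act, struct toy_parse_state *st)
{
	st->mark = act->mark;
	return 0;
}

/* Registry: a zeroed entry means "no parser", i.e. unsupported. */
static const struct toy_tc_act toy_acts[TOY_ACT_NUM] = {
	[TOY_ACT_ACCEPT] = { toy_always_offloadable, toy_parse_flag },
	[TOY_ACT_DROP]   = { toy_always_offloadable, toy_parse_flag },
	[TOY_ACT_TRAP]   = { toy_always_offloadable, toy_parse_flag },
	[TOY_ACT_MARK]   = { toy_mark_can_offload, toy_parse_mark },
};

/* Generic iteration: look up the parser, run the cheap check, then parse. */
static int toy_parse_tc_actions(const struct toy_act *acts, int n,
				struct toy_parse_state *st)
{
	int i, err;

	for (i = 0; i < n; i++) {
		const struct toy_tc_act *p;

		assert(acts[i].id < TOY_ACT_NUM);
		p = &toy_acts[acts[i].id];
		if (!p->parse_action)
			return -1; /* no parser: treat as unsupported */
		if (!p->can_offload(&acts[i], st))
			return -1; /* quick check failed: don't offload */
		err = p->parse_action(&acts[i], st);
		if (err)
			return err;
	}
	return 0;
}
```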

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14 21:29:41 -08:00
Daniel Borkmann e523102cb7 bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
Fix up the unprivileged test case results for the 'Dest pointer in r0' verifier
tests, given they now need to reject R0 containing a pointer value, and add a
couple of new related ones with 32-bit cmpxchg as well.

  root@foo:~/bpf/tools/testing/selftests/bpf# ./test_verifier
  #0/u invalid and of negative number OK
  #0/p invalid and of negative number OK
  [...]
  #1268/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
  #1269/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
  #1270/p XDP pkt read, pkt_data <= pkt_meta', good access OK
  #1271/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #1272/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1900 PASSED, 0 SKIPPED, 0 FAILED

Acked-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14 19:33:06 -08:00
Daniel Borkmann a82fe085f3 bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg
The implementation of BPF_CMPXCHG on a high level has the following parameters:

  .-[old-val]                                          .-[new-val]
  BPF_R0 = cmpxchg{32,64}(DST_REG + insn->off, BPF_R0, SRC_REG)
                          `-[mem-loc]          `-[old-val]

Given a BPF insn can only encode two registers (dst, src), R0 is fixed and
used as an auxiliary register for input (old value) as well as output (returning
the old value from the memory location). While the verifier performs a number of
safety checks, it fails to reject unprivileged programs where R0 contains a
pointer as the old value.
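The cmpxchg semantics in the diagram can be modeled in user space with C11 atomics. This is a sketch assuming a 64-bit operand; toy_bpf_cmpxchg64() is a hypothetical name, and it only mirrors how R0 serves as both input and output, not how the interpreter or JITs implement the instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Model of BPF_R0 = cmpxchg64(DST_REG + off, BPF_R0, SRC_REG).
 * *r0 is both the input (expected old value) and the output (the value
 * actually found in memory), which is why R0 acts as an implicit third
 * operand alongside dst and src. */
static uint64_t toy_bpf_cmpxchg64(_Atomic uint64_t *mem_loc, uint64_t *r0,
				  uint64_t src)
{
	uint64_t expected;

	assert(mem_loc && r0);
	expected = *r0;
	/* On a match, *mem_loc = src. Either way, 'expected' ends up holding
	 * the value that was in memory before the operation. */
	atomic_compare_exchange_strong(mem_loc, &expected, src);
	*r0 = expected;
	return expected;
}
```

Because the memory slot is overwritten only when it equals R0, the outcome reveals whether R0 matched the stored value. This is why an unchecked pointer in R0 can be turned into an address oracle.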

Through brute-forcing, it takes about 16 seconds on my machine to leak a kernel
pointer with BPF_CMPXCHG. The PoC essentially probes for kernel addresses by storing
the guessed address into the map slot as a scalar, and using the map value pointer
as R0 while SRC_REG holds a canary value to detect a matching address.

Fix it by checking R0 for pointers, and rejecting unprivileged programs if
that's the case.
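A toy model of the added check. It assumes a hypothetical, heavily simplified register state (scalar vs. pointer) and an allow_ptr_leaks flag standing in for the privilege check. The real verifier tracks much richer register types, and this is not its code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified register types. */
enum toy_reg_type { TOY_SCALAR, TOY_PTR };

struct toy_verifier_env {
	bool allow_ptr_leaks;       /* true for privileged loaders */
	enum toy_reg_type regs[11]; /* R0..R10 */
};

/* For BPF_CMPXCHG, SRC_REG was already checked for pointer leaks; the fix
 * applies the same check to R0, the implicit "old value" operand. */
static int toy_check_cmpxchg(const struct toy_verifier_env *env, int src_reg)
{
	assert(src_reg >= 0 && src_reg <= 10);
	if (env->allow_ptr_leaks)
		return 0;
	if (env->regs[src_reg] == TOY_PTR)
		return -EACCES; /* "R<src> leaks addr into mem" */
	if (env->regs[0] == TOY_PTR)
		return -EACCES; /* the previously missing R0 check */
	return 0;
}
```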

Fixes: 5ffa25502b ("bpf: Add instructions for atomic_[cmp]xchg")
Reported-by: Ryota Shiga (Flatt Security)
Acked-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14 19:33:06 -08:00
Daniel Borkmann 180486b430 bpf, selftests: Add test case for atomic fetch on spilled pointer
Test whether unprivileged programs would be able to leak the spilled pointer,
either by exporting the returned value from the atomic{32,64} operation or by
reading and exporting the value from the stack after the atomic operation.

Note that for unprivileged programs, the atomic cmpxchg test case below named
"Dest pointer in r0 - succeed" is failing. The reason is that the dst memory
location (r10 -8) contains the spilled register r10:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (bf) r0 = r10
  1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (7b) *(u64 *)(r10 -8) = r0
  2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
  2: (b7) r1 = 0
  3: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=fp
  3: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r1)
  4: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
  4: (79) r1 = *(u64 *)(r0 -8)
  5: R0_w=fp0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
  5: (b7) r0 = 0
  6: R0_w=invP0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
  6: (95) exit

However, allowing this case for unprivileged programs is of little use, since
an update with a new pointer will fail anyway:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (bf) r0 = r10
  1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (7b) *(u64 *)(r10 -8) = r0
  2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
  2: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r10)
  R10 leaks addr into mem

Acked-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14 19:33:06 -08:00