841533 Commits

Author SHA1 Message Date
Julian Wiedmann 2066e1db9e s390/qeth: convert RCD code to common IO infrastructure
The RCD code is the last remaining IO path that doesn't use the
qeth_send_control_data() infrastructure. Doing so allows us to remove
all sorts of custom state machinery and logic in the IRQ handler.

Instead of introducing statically allocated cmd buffers for this single
IO on the data channel, use the new qeth_alloc_cmd() helper.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 405548959c s390/qeth: add support for dynamically allocated cmds
qeth currently uses a fixed set of statically allocated cmd buffers for
the read and write IO channels. This (1) doesn't play well with the single
RCD cmd we need to issue on the data channel, (2) doesn't provide the
necessary flexibility for certain IDX improvements, and (3) is also rather
wasteful since the buffers are idle most of the time.

Add a new type of cmd buffer that is dynamically allocated, and keeps
its ccw chain in the DMA data area. Since this touches most callers of
qeth_setup_ccw(), also add a new CCW flags parameter for future usage.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann fcda7f73b6 s390/qeth: remove 'channel' parameter from callbacks
Each cmd buffer maintains a pointer to the IO channel that it was/will
be issued on. So when dealing with cmd buffers, we don't need to pass
around a separate channel pointer.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 57a688aa22 s390/qeth: convert device-specific trace entries
The vast majority of SETUP-classified trace entries can be moved to
their device-specific trace file. This reduces pollution of the global
SETUP file, and provides a consistent trace view of all activity on the
device.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 0ce37ec297 s390/qeth: remove OSN-specific IO code
OSN currently provides a custom code path to submit IPA cmds, without
waiting for the cmd response. Replace it with qeth_send_ipa_cmd(), which
uses the common qeth_send_control_data() IO infrastructure.

By setting a custom iob->callback, we can now provide feedback to the
caller about whether the cmd has been successfully submitted to HW.
Since the callback then immediately wakes up the reply-waiter object, we
maintain the old behaviour of returning early without waiting for the
response.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 1273a80014 s390/qeth: remove qeth_wait_for_buffer()
The basic MPC initialization sequence is strictly sequential, and
waiting for an available cmd buffer should never be necessary.
So this change only affects the OSN path, where dangling waiters on an
unbounded wait_event() are not desirable. Switch to qeth_get_buffers(),
and let OSN callers deal with -ENOMEM.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 09ac887f03 s390/qeth: clean up setting of BLKT defaults
When called from qeth_core_probe_device(), qeth_determine_capabilities()
initializes the device's BLKT defaults. From all other callers, the
ccw_device has already been set online and the BLKT setting is skipped.

Clean this up by extracting the BLKT setting into a separate helper that
gets called from the right place.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 7cbc9e8fe6 s390/qeth: restart pending READ cmd from callback
The completion of a pending READ cmd is processed via
qeth_issue_next_read_cb(). Let this callback also start the next READ
cmd, instead of hardcoding that step into the IRQ handler.

While at it remove the check of the channel state,
__qeth_issue_next_read() already does this.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann ad16087222 s390/qeth: simplify DOWN state handling
When the tear down sequence in qeth_l?_stop_card() has finished, the
card is guaranteed to be in DOWN state and we don't have to check for
it again.
With this insight we can also remove the redundant setting of
card->state in qeth_l?_set_online()'s error path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann 4e2fe4edca s390/qeth: use mm helpers
Slightly reduce the complexity of the core xmit path, by replacing some
open-coded logic with the corresponding helpers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
Julian Wiedmann e8b1116118 s390/qeth: don't mask TX errors on IQD devices
Current code suppresses debug entries when a TX buffer completes in
ERROR state with no error indication set in SBALF15.
This was introduced back with
commit 58490f1807 ("qeth: HiperSockets SIGA retry support on CC=2.").
But qeth no longer retries after CC=2, and this sort of suppression
makes no sense anymore. Remove it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:39:31 -07:00
David S. Miller 11817aa69b Merge branch 'mlxsw-Add-support-for-physical-hardware-clock'
Ido Schimmel says:

====================
mlxsw: Add support for physical hardware clock

Shalom says:

This patchset adds support for physical hardware clock for Spectrum-1
ASIC only.

Patches #1, #2 and #3 add the ability to query the free running clock
PCI address.

Patches #4 and #5 add two new registers, the Management UTC Register and
the Management Pulse Per Second Register.

Patch #6 publishes scaled_ppm_to_ppb() to allow drivers to use it.

Patch #7 adds the physical hardware clock operations.

Patch #8 initializes the physical hardware clock.

Patch #9 adds a selftest for testing the PTP physical hardware clock.

v2 (Richard):
* s/ptp_clock_scaled_ppm_to_ppb/scaled_ppm_to_ppb/
* imply PTP_1588_CLOCK in mlxsw Kconfig
* s/mlxsw_sp1_ptp_update_phc_settime/mlxsw_sp1_ptp_phc_settime/
* s/mlxsw_sp1_ptp_update_phc_adjfreq/mlxsw_sp1_ptp_phc_adjfreq/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 9366211f32 selftests: ptp: Add Physical Hardware Clock test
Test the PTP Physical Hardware Clock functionality using "phc_ctl" (part
of "linuxptp").

The test contains three sub-tests:
  * "settime" test
  * "adjtime" test
  * "adjfreq" test

"settime" test:
  * set the PHC time to 0 seconds.
  * wait for 120.5 seconds.
  * check if PHC time is equal to 120.XX seconds.

"adjtime" test:
  * set the PHC time to 0 seconds.
  * adjust the time by 10 seconds.
  * check if PHC time is equal to 10.XX seconds.

"adjfreq" test:
  * adjust the PHC frequency to be 1% faster.
  * set the PHC time to 0 seconds.
  * wait for 100.5 seconds.
  * check if PHC time is equal to 101.XX seconds.
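
The expected readings in the three sub-tests follow from simple arithmetic.
The sketch below is only an illustration of that arithmetic, not part of
phc.sh; the function name and its parameters are hypothetical.

```python
def expected_phc_seconds(wall_wait, step=0.0, freq_adj_ppb=0):
    """Expected PHC reading after the clock is set to 0 seconds.

    wall_wait    -- seconds of real time that pass before the check
    step         -- one-off offset applied via "adjtime", in seconds
    freq_adj_ppb -- frequency adjustment in parts per billion
    """
    # A clock running freq_adj_ppb faster advances by that fraction more
    # than the real time that elapsed.
    return step + wall_wait * (1 + freq_adj_ppb / 1e9)


# "settime": wait 120.5 s on an unadjusted clock.
settime = expected_phc_seconds(120.5)
# "adjtime": step the clock by 10 s, no deliberate wait.
adjtime = expected_phc_seconds(0.0, step=10.0)
# "adjfreq": 1% faster is 10,000,000 ppb; wait 100.5 s.
adjfreq = expected_phc_seconds(100.5, freq_adj_ppb=10_000_000)

print(int(settime), int(adjtime), int(adjfreq))
```

Only the whole-second part is compared, which is why the expected values
are written as 120.XX, 10.XX and 101.XX.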

Usage:
  $ ./phc.sh /dev/ptp<X>

  It is possible to run a subset of the tests, for example:
    * To run only the "settime" test:
      $ TESTS="settime" ./phc.sh /dev/ptp<X>

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 412cd2ad18 mlxsw: spectrum: PTP physical hardware clock initialization
Initialize the PTP physical hardware clock.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 992aa864dc mlxsw: spectrum_ptp: Add implementation for physical hardware clock operations
Implement physical hardware clock operations.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 4368dada5b ptp: ptp_clock: Publish scaled_ppm_to_ppb
Publish scaled_ppm_to_ppb to allow drivers to use it.
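
For readers unfamiliar with the unit: scaled_ppm is parts per million with
a 16-bit binary fractional part, so converting to parts per billion means
multiplying by 1000 / 65536 = 125 / 8192. The Python sketch below mirrors
that fixed-point conversion as I understand the kernel helper to do it;
treat the exact rounding as an assumption rather than a specification.

```python
def scaled_ppm_to_ppb(scaled_ppm):
    """Convert ppm with a 16-bit fractional part to parts per billion.

    Assumed to match the kernel helper: (1 + scaled_ppm) * 125 >> 13.
    """
    # 1000 / 2**16 reduces to 125 / 2**13, so a multiply and a shift are
    # enough and no division is needed.
    return ((1 + scaled_ppm) * 125) >> 13


# 1 ppm is 65536 in scaled units and should come out as 1000 ppb.
one_ppm = scaled_ppm_to_ppb(1 << 16)
# 1% faster (10,000 ppm) is the adjustment used by an "adjfreq"-style test.
one_percent = scaled_ppm_to_ppb(10_000 << 16)
print(one_ppm, one_percent)
```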

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 1078645af6 mlxsw: reg: Add Management Pulse Per Second Register
The MTPPS register provides the device PPS capabilities, configures the PPS
in and out modules and holds the PPS in time stamp.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 55a8b00157 mlxsw: reg: Add Management UTC Register
The MTUTC register configures the HW UTC counter.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 8289169dd2 mlxsw: pci: Query free running clock PCI BAR and offsets
Query free running clock PCI BAR and offsets during the pci_init.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 34dacb4d51 mlxsw: core: Add a new interface for reading the hardware free running clock
Add two new bus operations for reading the hardware free running clock.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Shalom Toledo 4b6b8c02f6 mlxsw: cmd: Free running clock PCI BAR and offsets via query firmware
Add free running clock PCI BAR and offset to query firmware command.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:34:55 -07:00
Roman Mashak 514fcaac37 tc-tests: updated fw with bind actions by reference use cases
Extended fw TDC tests with use cases where actions are pre-created and
attached to a filter by reference, i.e. by action index.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 22:32:15 -07:00
David S. Miller 425b0fad9c Merge branch 'net-stmmac-Convert-to-phylink'
Jose Abreu says:

====================
net: stmmac: Convert to phylink

This converts stmmac to use phylink. Besides the code reduction, this will
allow us to gain more flexibility.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 14:02:09 -07:00
Jose Abreu 74371272f9 net: stmmac: Convert to phylink and remove phylib logic
Convert everything to phylink.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 14:02:09 -07:00
Jose Abreu eeef2f6b9f net: stmmac: Start adding phylink support
Start adding the phylink callbacks.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 14:02:09 -07:00
Jose Abreu 9ad372fc5a net: stmmac: Prepare to convert to phylink
In preparation for the conversion, split the adjust_link function into
mac_config and add the mac_link_up and mac_link_down functions.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 14:02:09 -07:00
YueHaibing 5948d11766 qede: Make two functions static
Fix sparse warning:

drivers/net/ethernet/qlogic/qede/qede_main.c:963:6:
 warning: symbol 'qede_lock' was not declared. Should it be static?
drivers/net/ethernet/qlogic/qede/qede_main.c:969:6:
 warning: symbol 'qede_unlock' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 13:59:33 -07:00
YueHaibing 1dbb98699c net: dsa: sja1105: Make two functions static
Fix sparse warnings:

drivers/net/dsa/sja1105/sja1105_main.c:1848:6:
 warning: symbol 'sja1105_port_rxtstamp' was not declared. Should it be static?
drivers/net/dsa/sja1105/sja1105_main.c:1869:6:
 warning: symbol 'sja1105_port_txtstamp' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-13 13:58:32 -07:00
Moshe Shemesh 06efeb5555 Documentation: net: mlx5: Devlink health documentation
Documentation for devlink health reporters supported by mlx5.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:19 -07:00
Moshe Shemesh b3bd076f75 net/mlx5: Report devlink health on FW fatal issues
Report devlink health on FW fatal issues via fw_fatal_reporter. The
driver recover flow for FW fatal error is now being handled by the
devlink health.

Having the recovery controlled by devlink health, the user has the
ability to cancel the auto-recovery for debug session and run it
manually.

Call mlx5_enter_error_state() before calling devlink_health_report() to
ensure entering device error state even if auto-recovery is off.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:19 -07:00
Moshe Shemesh 9b1f298236 net/mlx5: Add support for FW fatal reporter dump
Add support of dump callback for mlx5 FW fatal reporter.
The FW fatal dump uses cr-dump functionality to gather cr-space data for
debug. The cr-dump uses vsc interface which is valid even if the FW
command interface is not functional, which is the case in most FW fatal
errors.

Command example and output:
$ devlink health dump show pci/0000:82:00.0 reporter fw_fatal
 crdump_data:
  00 20 00 01 00 00 00 00 03 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 ba 82 00 00
  0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 00
  a4 0e 00 00 00 00 00 00 80 c7 fe ff 50 0a 00 00
...
...

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:19 -07:00
Moshe Shemesh 96c82cdfe7 net/mlx5: Add fw fatal devlink_health_reporter
Create mlx5_devlink_health_reporter for fw fatal reporter.
The fw fatal reporter is added in addition to the fw reporter and
implements the recover callback.
The point of having two reporters for FW issues, is that we
don't want to run FW recover on any issue, but only fatal ones.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:19 -07:00
Moshe Shemesh d1bf0e2cc4 net/mlx5: Report devlink health on FW issues
Use devlink_health_report() to report any symptom of an FW issue, such as
an FW counter miss or a new health syndrome.
FW issues are detected in mlx5 during poll_health, which is called in
timer atomic context, so the health work queue is used to schedule the
reports.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:19 -07:00
Moshe Shemesh fd1483fe1f net/mlx5: Add support for FW reporter dump
Add support of dump callback for mlx5 FW reporter. Once we trigger FW
dump, the FW will write the core dump to its raw data buffer. The tracer
translates the raw data to traces and saves them to a cyclic array. Once
the dump is done, the saved trace data is filled into the dump buffer. In
case the syndrome is not zero, the health buffer content will be printed as
well.

FW dump example:
$ devlink health dump show pci/0000:82:00.0 reporter fw
 dump fw traces:
   timestamp: 509006640427 lost: false event_id: 185 msg: dump general
info GVMI=0x0000
   timestamp: 509006645474 lost: false event_id: 185 msg: GVMI
management info, gvmi_management context:
   timestamp: 509006654463 lost: false event_id: 185 msg: [000]:
00000000  00000000  00000000  00000000
   timestamp: 509006656127 lost: false event_id: 185 msg: [010]:
00000000  00000000  00000000  00000000
   timestamp: 509006656255 lost: false event_id: 185 msg: [020]:
00000000  00000000  00000000  00000000
   timestamp: 509006656511 lost: false event_id: 185 msg: [030]:
00000000  00000000  00000000  00000000
   timestamp: 509006656639 lost: false event_id: 185 msg: [040]:
00000000  00000000  00000000  00000000
   timestamp: 509006656895 lost: false event_id: 185 msg: [050]:
00000000  00000000  00000000  00000000
   timestamp: 509006657023 lost: false event_id: 185 msg: [060]:
00000000  00000000  00000000  00000000
   timestamp: 509006657180 lost: false event_id: 185 msg: [070]:
00000000  00000000  00000000  00000000
   timestamp: 509006659839 lost: false event_id: 185 msg: CMDIF dbase
from IRON: active_dbase_slots = 0x00000000
   timestamp: 509006667391 lost: false event_id: 185 msg: GVMI=0x0000
hw_toc context:
   timestamp: 509006667647 lost: false event_id: 185 msg: [000]:
00000000  00000000  00000000  fffff000
   timestamp: 509006667775 lost: false event_id: 185 msg: [010]:
00000000  00000000  00000000  80d00000
...
...

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:18 -07:00
Moshe Shemesh 1e34f3efd4 net/mlx5: Create FW devlink_health_reporter
Create mlx5_devlink_health_reporter for FW reporter. The FW reporter
implements devlink_health_reporter diagnose callback.

The fw reporter diagnose command can be triggered any time by the user
to check current fw status.
In healthy status, it will return clear syndrome. Otherwise it will
return the syndrome and description of the error type.

Command example and output on healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 0

Command example and output on non healthy status:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 8 Description: unrecoverable hardware error

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:18 -07:00
Feras Daoud 3e5b72ac2f net/mlx5: Issue SW reset on FW assert
If a FW assert is considered fatal, indicated by a new bit in the health
buffer, reset the FW. After the reset go through the normal recovery
flow. Only one PF needs to issue the reset, so an attempt is made to
prevent the 2nd function from also issuing the reset.
It's not an error if that happens, it just slows recovery.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:18 -07:00
Feras Daoud 1ef6f1a17e net/mlx5: Control CR-space access by different PFs
Since the FW can be shared between different PFs/VFs, it is common
that more than one health poll will detect a failure. This can
lead to multiple resets, which are unneeded.

The solution is to use a FW locking mechanism using semaphore space
to provide a way to allow only one device to collect the cr-dump and
to issue a sw-reset.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:17 -07:00
Feras Daoud 63cbc552ee net/mlx5: Handle SW reset of FW in error flow
New mlx5 adapters allow the driver to reset the FW in the event of an
error; this action is called "SW Reset". When an SW reset is issued on any
PF, all PFs enter reset state, which is a recoverable condition. The
existing recovery flow was designed to allow the recovery of a VF after
a PF driver reload. This patch adds the sw reset to the NIC states
as a preparation for sw reset handling.

When a software reset is issued the following occurs:
1. The NIC interface mode is set to 7 while the reset is in progress.
2. Once the reset completes the NIC interface mode is set to 1.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:17 -07:00
Alex Vesker 8b9d8baae1 net/mlx5: Add Crdump support
Crdump allows the driver to retrieve a dump of the FW PCI crspace.
This is useful in case of catastrophic issues which may require FW
reset. The crspace dump can be used for later debug.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:17 -07:00
Alex Vesker b25bbc2f24 net/mlx5: Add Vendor Specific Capability access gateway
The Vendor Specific Capability (VSC) is used to activate a gateway
interfacing with the device. The gateway is used to read or write
device configurations, which are organized in different domains (spaces).
A configuration access may result in multiple actions, reads, writes.

Example usages are accessing the Crspace domain to read the crspace or
locking a device semaphore using the Semaphore domain.

The configuration access uses pci_cfg_access to prevent parallel access to
the VSC space by the driver and userspace calls.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:17 -07:00
Eran Ben Elisha 1f28d7768f net/mlx5: Move all devlink related functions calls to devlink.c
Centralize all devlink related callbacks in one file.
In the downstream patch, some more functionality will be added, this
patch is preparing the driver infrastructure for it.

Currently, move devlink un/register functions calls into this file.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:16 -07:00
Saeed Mahameed 00091c0da1 Documentation: net: mlx5: Add mlx5 initial documentation
Add initial documentation for mlx5 driver.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:16 -07:00
Aya Levin e44ef4e451 devlink: Hang reporter's dump method on a dumpit cb
The devlink health reporter provides a dump method on an error. A dump
may contain a large amount of data, in which case a doit cb isn't sufficient.
This is because the user side is blocking and doesn't allow draining of
the socket until the socket runs out of buffers. Using a dumpit cb
is the correct way to go.
Please note that thankfully the dump op is not yet implemented in any
driver and therefore this change is not breaking userspace.

Fixes: 35455e23e6 ("devlink: Add health dump {get,clear} commands")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-13 13:23:16 -07:00
Eric Dumazet a842fe1425 tcp: add optional per socket transmit delay
Adding delays to TCP flows is crucial for studying behavior
of TCP stacks, including congestion control modules.

Linux offers the netem module, but it has impractical constraints:
- Need root access to change qdisc
- Hard to setup on egress if combined with non trivial qdisc like FQ
- Single delay for all flows.

EDT (Earliest Departure Time) adoption in TCP stack allows us
to enable a per socket delay at a very small cost.

Networking tools can now establish thousands of flows, each of them
with a different delay, simulating real world conditions.

This requires the FQ packet scheduler or an EDT-enabled NIC.

This patch adds the TCP_TX_DELAY socket option, to set a delay in
usec units.

  unsigned int tx_delay = 10000; /* 10 msec */

  setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));
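
The same call can be sketched from Python for quick experiments. This is
an illustration under stated assumptions, not part of the patch: the
numeric value 37 is taken from the Linux uapi header (linux/tcp.h) because
Python's socket module may not export TCP_TX_DELAY, and set_tx_delay() is
a hypothetical helper name.

```python
import socket

# Assumption: TCP_TX_DELAY is 37 in Linux's include/uapi/linux/tcp.h.
# Fall back to that value if the socket module does not define the name.
TCP_TX_DELAY = getattr(socket, "TCP_TX_DELAY", 37)


def set_tx_delay(sock, delay_usec):
    """Try to set a per-socket transmit delay in microseconds.

    Returns True on success and False if the kernel rejects the option,
    for example on a kernel that predates it or on a non-Linux system.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_TX_DELAY, delay_usec)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 10000 usec is the 10 msec delay used in the C snippet above.
        print("delay applied:", set_tx_delay(s, 10_000))
```

Returning a boolean instead of raising keeps the sketch usable on hosts
where the option is unavailable.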

Note that FQ packet scheduler limits might need some tweaking :

man tc-fq

PARAMETERS
   limit
       Hard  limit  on  the  real  queue  size. When this limit is
       reached, new packets are dropped. If the value is  lowered,
       packets  are  dropped so that the new limit is met. Default
       is 10000 packets.

   flow_limit
       Hard limit on the maximum  number  of  packets  queued  per
       flow.  Default value is 100.

Use of the TCP_TX_DELAY option will increase the number of skbs in the FQ
qdisc, so packets would be dropped if any of the previous limits is hit.

Use of a jump label makes this support runtime-free, for hosts
never using the option.

Also note that TSQ (TCP Small Queues) limits are slightly changed
with this patch: we need to account for the fact that artificially delayed
skbs won't stop us providing more skbs to feed the pipe (netem uses
skb_orphan_partial() for this purpose, but FQ cannot use this trick).

Because of that, using big delays might very well trigger
old bugs in TSO auto defer logic and/or sndbuf limited detection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 13:05:43 -07:00
David S. Miller e0ffbd37f3 Merge branch 'ena-dynamic-queue-sizes'
Sameeh Jubran says:

====================
Support for dynamic queue size changes

This patchset introduces the following:
* add new admin command for supporting different queue size for Tx/Rx
* add support for Tx/Rx queues size modification through ethtool
* allow queues allocation backoff when low on memory
* update driver version

Difference from v2:
* Dropped superfluous range checks which are already done in ethtool. [patch 5/7]
* Dropped inline keyword from function. [patch 4/7]
* Added a new patch which drops inline keyword all *.c files. [patch 6/7]

Difference from v1:
* Changed ena_update_queue_sizes() signature to use u32 instead of int
  type for the size arguments. [patch 5/7]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:45 -07:00
Sameeh Jubran dbbc6e6877 net: ena: update driver version from 2.0.3 to 2.1.0
Update driver version to match device specification.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:45 -07:00
Sameeh Jubran c2b5420447 net: ena: remove inline keyword from functions in *.c
Let the compiler decide if the function should be inline in *.c files

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:45 -07:00
Sameeh Jubran eece4d2ab9 net: ena: add ethtool function for changing io queue sizes
Implement the set_ringparam() function of the ethtool interface
to enable the changing of io queue sizes.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:45 -07:00
Sameeh Jubran 13ca32a69e net: ena: allow queue allocation backoff when low on memory
If there is not enough memory to allocate io queues the driver will
try to allocate smaller queues.

The backoff algorithm is as follows:

1. Try to allocate TX and RX, and if successful:
1.1. return success

2. Divide by 2 the size of the larger of RX and TX queues (or both if their size is the same).

3. If TX or RX is smaller than 256
3.1. return failure.
4. else
4.1. go back to 1.
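
The steps above can be sketched as a small loop. This is a Python
illustration of the described algorithm, not the driver code; the names
allocate_with_backoff, try_alloc and MIN_QUEUE_SIZE are hypothetical, and
try_alloc stands in for whatever actually allocates the io queues.

```python
MIN_QUEUE_SIZE = 256  # below this size the backoff gives up (step 3)


def allocate_with_backoff(tx_size, rx_size, try_alloc):
    """Return the (tx, rx) sizes that were allocated, or None on failure.

    try_alloc(tx, rx) is a caller-supplied stand-in that returns True
    when queues of the given sizes could be allocated.
    """
    while True:
        # Step 1: try the current sizes.
        if try_alloc(tx_size, rx_size):
            return tx_size, rx_size
        # Step 2: halve the larger queue, or both when they are equal.
        if tx_size > rx_size:
            tx_size //= 2
        elif rx_size > tx_size:
            rx_size //= 2
        else:
            tx_size //= 2
            rx_size //= 2
        # Step 3: fail once either queue would drop below the minimum.
        if tx_size < MIN_QUEUE_SIZE or rx_size < MIN_QUEUE_SIZE:
            return None
        # Step 4: otherwise loop back to step 1.


# Toy allocator: succeeds only when the combined size fits a budget.
fits_budget = lambda tx, rx: tx + rx <= 1024
print(allocate_with_backoff(1024, 4096, fits_budget))
```

The requested sizes are passed in and never modified by the caller, which
matches the motivation given below for keeping requested and actual sizes
separate.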

Also change the tx_queue_size, rx_queue_size field names in struct
adapter to requested_tx_queue_size and requested_rx_queue_size, and
use RX and TX queue 0 for actual queue sizes.
Explanation:
The original fields were useless as they were simply used to assign
values once from them to each of the queues in the adapter in ena_probe().
They could simply be deleted. However now that we have a backoff
feature, we have use for them. In case of backoff there is a difference
between the requested queue sizes and the actual sizes. Therefore there
is a need to save the requested queue size for future retries of queue
allocation (for example if allocation failed and then ifdown + ifup was
called we want to start the allocation from the original requested size of
the queues).

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:44 -07:00
Sameeh Jubran 9f9ae3f98b net: ena: make ethtool show correct current and max queue sizes
Currently ethtool -g shows the same size for current and max queue
sizes.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:23:44 -07:00