843490 Commits

Author SHA1 Message Date
Mahesh Bandewar 4de83b88c6 loopback: create blackhole net device similar to loopback.
Create a blackhole net device that can be used for "dead"
dst entries instead of the loopback device. This blackhole device differs
from loopback in a few aspects: (a) It's not per-ns. (b) The MTU on this
device is ETH_MIN_MTU. (c) The xmit function is essentially kfree_skb().
(d) Since it's not registered, it won't have an ifindex.

The lower MTU effectively makes the device fail the MTU check during
the route check when a dst associated with the skb is dead.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:34:46 -07:00
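The effect of the small MTU described in the commit above can be illustrated with a minimal user-space sketch. It assumes a simplified "packet length versus device MTU" check; `struct fake_dev`, `exceeds_mtu()`, and the loopback MTU value are stand-ins chosen for illustration and are not the kernel's code. Only the value 68 mirrors the kernel's ETH_MIN_MTU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ETH_MIN_MTU	68	/* mirrors the kernel's ETH_MIN_MTU */
#define SKETCH_LOOPBACK_MTU	65536	/* assumed loopback MTU for comparison */

/* Hypothetical stand-in for a net device: only the MTU matters here. */
struct fake_dev {
	const char *name;
	size_t mtu;
};

/* Simplified MTU check: a packet longer than the device MTU is rejected. */
static bool exceeds_mtu(const struct fake_dev *dev, size_t pkt_len)
{
	assert(dev != NULL);
	return pkt_len > dev->mtu;
}
```

With a dead dst pointing at a device whose MTU is 68, an ordinary 1500-byte packet is rejected by such a check, whereas it would pass against a loopback-sized MTU.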
Hariprasad Kelam 8909783cb5 net: ethernet: broadcom: bcm63xx_enet: Remove unneeded memset
Remove the unneeded memset, as alloc_etherdev uses kvzalloc, which uses the
__GFP_ZERO flag.

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:29:19 -07:00
David S. Miller fec3b9ec47 Merge branch 'net-netsec-Add-XDP-Support'
Ilias Apalodimas says:

====================
net: netsec: Add XDP Support

This is a respin of https://www.spinics.net/lists/netdev/msg526066.html
Since the page_pool API fixes are merged into net-next we can now safely use
its DMA mapping capabilities.

The first patch changes the buffer allocation from napi/netdev_alloc_frag()
to the page_pool API. Although this will lead to slightly reduced performance
(on raw packet drops only), we can use the API for XDP buffer recycling.
Another side effect is a slight increase in memory usage, due to using a
single page per packet.

The second patch adds XDP support to the driver.
A number of interesting options come up due to the single
Tx queue.
Locking is needed (to avoid messing up the Tx queue, since ndo_xdp_xmit
and the normal stack can co-exist). We also need to track the
'buffer type' for Tx and properly free or recycle the packet depending
on its nature.

Changes since RFC:
- Bug fixes from Jesper and Maciej
- Added page pool API to retrieve the DMA direction

Changes since v1:
- Use page_pool_free correctly if xdp_rxq_info_reg() failed
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas ba2b232108 net: netsec: add XDP support
The interface only supports 1 Tx queue, so locking is introduced on
the Tx queue if XDP is enabled to make sure .ndo_start_xmit and
.ndo_xdp_xmit won't corrupt the Tx ring.

- Performance (SMMU off)

Benchmark   XDP_SKB     XDP_DRV
xdp1        291kpps     344kpps
rxdrop      282kpps     342kpps

- Performance (SMMU on)
Benchmark   XDP_SKB     XDP_DRV
xdp1        167kpps     324kpps
rxdrop      164kpps     323kpps

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas bb005f2a70 net: page_pool: add helper function for retrieving dma direction
Since the dma direction is stored in page pool params, offer an API
helper for drivers that choose not to keep track of it locally.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Ilias Apalodimas 5c67bf0ec4 net: netsec: Use page_pool API
Use page_pool and its DMA mapping capabilities for Rx buffers instead
of netdev/napi_alloc_frag().

Although this will result in a slight performance penalty on small
packets (~10%), using the API makes it easy to add XDP support.
The penalty won't be visible in network testing (e.g. iperf/netperf); it
only happens during raw packet drops.
Furthermore, we intend to add recycling capabilities to the API
in the future. Once recycling is added, the performance penalty will
go away.
The only 'real' penalty is the slightly increased memory usage, since we
now allocate a page per packet instead of the number of bytes we need plus
skb metadata (the difference is roughly 2kb per packet).
With a minimum of 4GB of RAM on the only SoC that has this NIC, the
extra memory usage is negligible (a bit more with 64K pages).

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:27:08 -07:00
Roman Mashak a8488b7026 tc-testing: added tdc tests for prio qdisc
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:20:43 -07:00
David S. Miller c8881faf6e Merge branch 'mirred-batch-fixes'
Roman Mashak says:

====================
Fix batched event generation for mirred action

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO entries in an event to user space. However, it does not
consider that the action sizes may vary and require different skb sizes.

For example:

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action mirred
for i in `seq 1 $1`;
do
   cmd="action mirred egress redirect dev lo index $i "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

Patch 1 adds a callback to the tc_action_ops of the mirred action, which
calculates the action size and passes it to tcf_add_notify()/tcf_del_notify().

Patch 2 updates the TDC test suite with relevant test cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:04 -07:00
Roman Mashak 5d15a8ec2a tc-testing: updated mirred action tests with batch create/delete
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:04 -07:00
Roman Mashak b84b2d4e38 net sched: update mirred action for batched events operations
Add a get_fill_size() routine used to calculate the action size
when building a batch of events.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:18:03 -07:00
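The sizing idea behind a per-action fill-size callback can be sketched in user space as follows. This is an illustration only: `struct fake_action`, `batch_fill_size()`, `notify_buf_size()`, the default buffer size, and the byte counts are all hypothetical, and the real callback and notification code differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_BUF_SIZE 4096	/* assumed default notification buffer size */

/* Hypothetical action descriptor: each action reports its own encoded size. */
struct fake_action {
	size_t (*get_fill_size)(const struct fake_action *act);
	size_t payload;
};

/* Made-up size model: 64 bytes of per-action overhead plus the payload. */
static size_t fixed_overhead_fill_size(const struct fake_action *act)
{
	return 64 + act->payload;
}

/* Sum the sizes reported by every action in the batch. */
static size_t batch_fill_size(const struct fake_action *acts, size_t n)
{
	size_t total = 0;

	assert(acts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += acts[i].get_fill_size(&acts[i]);
	return total;
}

/* Grow the buffer beyond the default only when the batch needs it. */
static size_t notify_buf_size(const struct fake_action *acts, size_t n)
{
	size_t need = batch_fill_size(acts, n);

	return need > DEFAULT_BUF_SIZE ? need : DEFAULT_BUF_SIZE;
}
```

Under these assumed numbers, a small batch keeps the default buffer, while a batch of 32 larger actions asks for more room instead of failing to fit.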
Jason A. Donenfeld 362b87f5b1 netlink: use 48 byte ctx instead of 6 signed longs for callback
People are inclined to stuff random things into cb->args[n] because it
looks like an array of integers. Sometimes people even put u64s in there
with comments noting that a certain member takes up two slots. The
horror! Really this should mirror the usage of skb->cb, which are just
48 opaque bytes suitable for casting a struct. Then people can create
their usual casting macros for accessing strongly typed members of a
struct.

As a plus, this also gives us the same amount of space on 32-bit and 64-bit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:12:10 -07:00
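The "48 opaque bytes holding a typed struct" pattern from the commit above can be sketched in user space as below. `struct fake_callback` and `struct dump_state` are hypothetical stand-ins, not the kernel's definitions. Kernel code typically casts the buffer pointer directly; this sketch copies with memcpy() instead so that it stays within ISO C aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_SIZE 48

/* Stand-in for a dump callback: only the scratch area is modeled. */
struct fake_callback {
	union {
		uint8_t ctx[CTX_SIZE];	/* opaque bytes, same size on 32/64-bit */
		long args[6];		/* legacy view: 24 or 48 bytes */
	};
};

/* Hypothetical strongly typed per-dump state kept in the scratch area. */
struct dump_state {
	uint64_t cursor;
	uint32_t bucket;
	uint16_t flags;
};

/* Fail the build, not the dump, if the state outgrows the scratch area. */
_Static_assert(sizeof(struct dump_state) <= CTX_SIZE,
	       "dump_state must fit in ctx");

static struct dump_state load_state(const struct fake_callback *cb)
{
	struct dump_state st;

	assert(cb != NULL);
	memcpy(&st, cb->ctx, sizeof(st));
	return st;
}

static void store_state(struct fake_callback *cb, const struct dump_state *st)
{
	assert(cb != NULL && st != NULL);
	memcpy(cb->ctx, st, sizeof(*st));
}
```

The 64-bit `cursor` member occupies one named field here, rather than two adjacent integer slots with a comment explaining the overlap.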
Jon Maloy 53962bcea9 tipc: embed jiffies in macro TIPC_BC_RETR_LIM
The macro TIPC_BC_RETR_LIM is always used in combination with 'jiffies',
so we can just as well perform the addition in the macro itself. This
way, we get a few shorter code lines and one less line break.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:10:57 -07:00
Eiichi Tsukata 00dc3307c0 net/ipv6: Fix misuse of proc_dointvec "flowlabel_reflect"
/proc/sys/net/ipv6/flowlabel_reflect assumes the written value to be in the
range of 0 to 3. Use proc_dointvec_minmax instead of proc_dointvec.

Fixes: 323a53c412 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:04:48 -07:00
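The difference between an unchecked integer write and a range-checked one can be modeled with a short user-space sketch. `store_unchecked()` and `store_minmax()` are hypothetical helpers that only imitate the behavior the commit relies on (rejecting out-of-range values and leaving the stored value untouched); they are not the sysctl handlers themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Unchecked write: any integer is accepted, valid or not. */
static int store_unchecked(int *slot, int val)
{
	assert(slot != NULL);
	*slot = val;
	return 0;
}

/* Range-checked write: out-of-range values are rejected with -EINVAL
 * and the previously stored value is left untouched. */
static int store_minmax(int *slot, int val, int min, int max)
{
	assert(slot != NULL && min <= max);
	if (val < min || val > max)
		return -EINVAL;
	*slot = val;
	return 0;
}
```

For a setting documented as 0 to 3, the checked variant keeps a write of 7 from ever reaching code that assumes the range.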
Yunsheng Lin 27ba4059e0 net: link_watch: prevent starvation when processing linkwatch wq
When a user has configured a large number of virtual netdevs, such
as 4K vlans, a carrier on/off operation on the real netdev
will also cause the link state of its virtual netdevs to be processed
in linkwatch. Currently, the processing is done in a work queue,
which may cause rtnl lock starvation and worker
starvation for other work queues, such as the irqfd_inject wq.

This patch releases the cpu when the link watch worker has processed
a fixed number of netdev link watch events, and schedules the
work queue again when there are still link watch events remaining.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 19:02:47 -07:00
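The "process a fixed number of events, then reschedule" approach from the commit above can be sketched as follows. The budget of 100, `struct event_queue`, and both helpers are assumptions made for illustration; the commit message only says "a fixed number".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_LIMIT 100	/* assumed per-run budget */

/* Hypothetical queue: only the counts of pending and handled events. */
struct event_queue {
	size_t pending;
	size_t processed;
};

/* Handle at most BATCH_LIMIT events in one run.
 * Returns true when events remain and the worker must be scheduled again. */
static bool run_worker_once(struct event_queue *q)
{
	size_t budget = BATCH_LIMIT;

	assert(q != NULL);
	while (q->pending > 0 && budget > 0) {
		q->pending--;
		q->processed++;
		budget--;
	}
	return q->pending > 0;
}

/* Model the rescheduling loop and count how many worker runs it took.
 * Between runs, other work items would get a chance to execute. */
static size_t drain(struct event_queue *q)
{
	size_t runs = 1;

	while (run_worker_once(q))
		runs++;
	return runs;
}
```

With 4096 pending events and this assumed budget, the work is split over 41 short runs instead of one long one.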
David S. Miller 0d0bcacc54 Merge branch 'mlxsw-PTP-timestamping-support'
Ido Schimmel says:

====================
mlxsw: PTP timestamping support

This is the second patchset adding PTP support in mlxsw. Next patchset
will add PTP shapers which are required to maintain accuracy under rates
lower than 40Gb/s, while subsequent patchsets will add tracepoints and
selftests.

Petr says:

This patch set introduces support for retrieving and processing hardware
timestamps for PTP packets.

The way PTP timestamping works on Spectrum-1 is that there are two queues
associated with each front panel port. When a packet is timestamped, the
timestamp is put to one of the queues: timestamps for transmitted packets
to one and for received packets to the other. Activity on these queues is
signaled through the events PTP_ING_FIFO and PTP_EGR_FIFO.

Packets themselves arrive through two traps: PTP0 and PTP1. It is possible
to configure which PTP messages should be trapped under which PTP trap. On
Spectrum systems, mlxsw will use PTP0 for event messages (which need
timestamping), and PTP1 for general messages (which do not).

There are therefore four relevant traps: receipt of a PTP event or general
message, and receipt of a timestamp for a transmitted or received PTP
packet. The obvious place to put the new logic is a custom listener
for the mentioned traps.

Besides handling ingress traffic (be it packets or timestamps), the driver
also needs to handle timestamping of transmitted packets. One option would
be to invoke the relevant logic from mlxsw_core_ptp_transmitted(). However
on Spectrum-2, the timestamps are actually delivered through the completion
queue, and for that reason this patchset opts to invoke the logic from the
PCI code, via core and the driver, to a chip-specific operation. That way
the invocation will be done in a place where a Spectrum-2 implementation
will have an opportunity to extract the timestamp.

As indicated above, the PTP FIFO signaling happens independently from
packet delivery. A packet corresponding to any given timestamp could be
delivered sooner or later than the timestamp itself. Additionally, the
queues are only four elements deep, and it is therefore possible that the
timestamp for a delivered packet never arrives at all. Similarly a PTP
packet might be dropped due to CPU traffic pressure, and never be delivered
even if the corresponding timestamp was.

The driver thus needs to hold a cache of as-yet-unmatched SKBs and
timestamps. The first piece to arrive (be it timestamp or SKB) is put to
this cache. When the other piece arrives, the timestamp is attached to the
SKB and that is passed on. A delayed work is run at regular intervals to
prune the old unmatched entries.

As mentioned above, the mechanism for timestamp delivery changes on
Spectrum-2, where timestamps are part of completion queue elements, and all
packets are timestamped. All this bookkeeping is therefore unnecessary on
Spectrum-2. For this reason, this patchset spends some time introducing
Spectrum-1 specific artifacts such as a possibility to register a given
trap only on Spectrum-1.

Patches #1-#4 describe new registers.

Patches #5 and #6 introduce the possibility to register certain traps
only on some systems. The list of Spectrum-1 specific traps is left empty
at this point.

Patch #7 hooks into packet receive path by registering PTP traps
and appropriate handlers (that however do nothing of substance yet).

Patch #8 adds a helper to allow storing custom data to SKB->cb.

Patch #9 adds a call into the PCI completion queue handler that invokes,
via core and spectrum code, a PTP transmit handler. (Which also does not do
anything interesting yet.)

Patch #10 introduces code to invoke PTP initialization and adds data types
for the cache of unmatched entries.

Patches #11 and #12 implement the timestamping itself. In #11, the PHC
spin_locks are converted to _bh variants, because unlike normal PHC path,
which runs in process context, timestamp processing runs as soft interrupt.
Then #12 introduces the code for saving and retrieval of unmatched entries,
invokes PTP classifier to identify packets of interest, registers timestamp
FIFO events, and handles decoding and attaching timestamps to packets.

Patch #13 introduces a garbage collector for left-behind entries that have
not been matched for about a second.

In patch #14, PTP message types are configured to arrive as PTP0
(events) or PTP1 (everything else) as appropriate. At this point, the PTP
packets start arriving through the traps, but because PTP is disabled and
there is no way to enable it yet, they are always just passed to the usual
receive path right away.

Finally patches #15 and #16 add the plumbing to actually make it possible
to enable this code through SIOCSHWTSTAMP ioctl, and to advertise the
hardware timestamping capabilities through ethtool.

v2:
- Patch #12:
    - In mlxsw_sp1_ptp_fifo_event_func(), post-increment when iterating over PTP
      FIFO records.
- Patch #14:
    - Change namespace of message type enumerators from MLXSW_ to MLXSW_SP_.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:35 -07:00
Petr Machata 87ee07f8e2 mlxsw: spectrum: PTP: Support ethtool get_ts_info
The get_ts_info callback is used for obtaining information about
timestamping capabilities of a network device. On Spectrum-1, implement
it to advertise the PHC and the capability to do HW timestamping, and
the supported RX and TX filters.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:35 -07:00
Petr Machata 8748642751 mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls
The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port.
Dispatch the ioctls to a per-chip handler (added to ptp_ops). Find
which PTP messages need to be timestamped and configure MTPPPC
accordingly.

The SIOCGHWTSTAMP ioctl is a getter for the current configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata a773c76cb8 mlxsw: spectrum: PTP: Configure PTP traps and FIFO events
Configure MTPTPT to set which message types should arrive under which
PTP trap, and MOGCR to clear the timestamp queue after its contents are
reported through PTP_ING_FIFO or PTP_EGR_FIFO.

With this configuration, PTP packets start arriving through the PTP
traps. However since timestamping is disabled by default and there is
currently no way to enable it, they will not be timestamped.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 5d23e41597 mlxsw: spectrum: PTP: Garbage-collect unmatched entries
On Spectrum-1, timestamped PTP packets and the corresponding timestamps
need to be kept in caches until both are available, at which point they are
matched up and packets forwarded as appropriate. However, not all packets
will ever see their timestamp, and not all timestamps will ever see their
packet. It is therefore necessary to dispose of such abandoned entries.

To that end, introduce a garbage collector to collect entries that have
not had their counterpart turn up within about a second. The GC
maintains a monotonically increasing value of GC cycle. Every entry that
is put to the hash table is annotated with the GC cycle at which it
should be collected. When the GC runs, it walks the hash table, and
collects the objects according to their GC cycle annotation.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
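The cycle-annotated collection scheme described in the commit above can be sketched in user space as below. The fixed-size array standing in for the hash table, the lifetime of two cycles, and all names are assumptions made for illustration, not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE		8
#define GC_LIFETIME_CYCLES	2	/* assumed lifetime of an unmatched entry */

struct entry {
	bool in_use;
	unsigned int key;
	unsigned int gc_cycle;	/* cycle at which this entry is collected */
};

struct cache {
	struct entry slots[TABLE_SIZE];
	unsigned int gc_cycle;	/* current, monotonically increasing cycle */
};

/* Insert an entry and annotate it with its collection cycle. */
static bool cache_add(struct cache *c, unsigned int key)
{
	assert(c != NULL);
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (!c->slots[i].in_use) {
			c->slots[i].in_use = true;
			c->slots[i].key = key;
			c->slots[i].gc_cycle = c->gc_cycle + GC_LIFETIME_CYCLES;
			return true;
		}
	}
	return false;	/* table full */
}

/* One GC run: advance the cycle and drop entries annotated with it.
 * Returns the number of entries collected. */
static size_t cache_gc(struct cache *c)
{
	size_t collected = 0;

	assert(c != NULL);
	c->gc_cycle++;
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (c->slots[i].in_use &&
		    c->slots[i].gc_cycle == c->gc_cycle) {
			c->slots[i].in_use = false;
			collected++;
		}
	}
	return collected;
}
```

Because each entry carries the cycle at which it expires, a GC run needs only one comparison per entry and no per-entry timestamps.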
Petr Machata d92e4e6e33 mlxsw: spectrum: PTP: Support timestamping on Spectrum-1
On Spectrum-1, timestamps arrive through a pair of dedicated events:
MLXSW_TRAP_ID_PTP_ING_FIFO and _EGR_FIFO. The payload delivered with
those traps is the contents of the timestamp FIFO at a given port in a given
direction. Add a Spectrum-1-specific handler for these two events which
decodes the timestamps and forwards them to the PTP module.

Add a function that parses a packet, dispatching to ptp_classify_raw(),
and decodes PTP message type, domain number, and sequence ID. Add a new
mlxsw dependency on the PTP classifier.

Add helpers that can store and retrieve unmatched timestamps and SKBs to
the hash table added in a preceding patch.

Add the matching code itself: upon arrival of a timestamp or a packet,
look up the corresponding unmatched entry, and match it up. If there is
none, add a new unmatched entry. This logic is the same on ingress as on
egress.

Packets and timestamps that never matched need to be eventually disposed
of. A garbage collector added in a follow-up patch will take care of
that. Since currently all this code is turned off, no crud will
accumulate in the hash table.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
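The matching logic described in the commit above (whichever piece arrives first is parked, and the second arrival completes the pair) can be sketched in user space as follows. The key fields follow the commit text (message type, domain number, sequence ID) plus an assumed port and direction; the fixed-size array and every name are hypothetical and stand in for the driver's hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_UNMATCHED 16

struct ptp_key {
	uint8_t local_port;
	uint8_t message_type;
	uint8_t domain_number;
	uint16_t sequence_id;
	bool ingress;
};

enum piece_kind { PIECE_PACKET, PIECE_TIMESTAMP };
enum match_result { MATCH_STORED, MATCH_COMPLETED, MATCH_TABLE_FULL };

struct unmatched {
	bool in_use;
	enum piece_kind kind;
	struct ptp_key key;
	uint64_t timestamp;	/* meaningful only for PIECE_TIMESTAMP */
};

struct unmatched_cache {
	struct unmatched slots[MAX_UNMATCHED];
};

static bool key_equal(const struct ptp_key *a, const struct ptp_key *b)
{
	return a->local_port == b->local_port &&
	       a->message_type == b->message_type &&
	       a->domain_number == b->domain_number &&
	       a->sequence_id == b->sequence_id &&
	       a->ingress == b->ingress;
}

/* Submit a packet or a timestamp. If the counterpart is already parked,
 * report the timestamp of the completed pair; otherwise park this piece. */
static enum match_result submit_piece(struct unmatched_cache *c,
				      enum piece_kind kind, struct ptp_key key,
				      uint64_t timestamp, uint64_t *matched_ts)
{
	struct unmatched *free_slot = NULL;

	assert(c != NULL && matched_ts != NULL);
	for (size_t i = 0; i < MAX_UNMATCHED; i++) {
		struct unmatched *u = &c->slots[i];

		if (!u->in_use) {
			if (free_slot == NULL)
				free_slot = u;
			continue;
		}
		if (u->kind != kind && key_equal(&u->key, &key)) {
			*matched_ts = (kind == PIECE_TIMESTAMP) ?
				      timestamp : u->timestamp;
			u->in_use = false;
			return MATCH_COMPLETED;
		}
	}
	if (free_slot == NULL)
		return MATCH_TABLE_FULL;
	free_slot->in_use = true;
	free_slot->kind = kind;
	free_slot->key = key;
	free_slot->timestamp = timestamp;
	return MATCH_STORED;
}
```

The same function serves both arrival orders, which reflects the commit's note that the logic is the same on ingress as on egress. Entries that never find a counterpart stay parked until something else removes them.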
Petr Machata 89e602ee6e mlxsw: spectrum: PTP: Disable BH when working with PHC
Up until now, the PTP hardware clock code was only invoked in the process
context (SYS_clock_adjtime -> do_clock_adjtime -> k_clock::clock_adj ->
pc_clock_adjtime -> posix_clock_operations::clock_adjtime ->
ptp_clock_info::adjtime -> mlxsw_spectrum).

In order to enable HW timestamping, which is tied into trap handling, it
will be necessary to take the clock lock from the PCI queue handler
tasklets as well.

Therefore use the _bh variants when handling the clock lock. Incidentally,
Documentation/ptp/ptp.txt recommends _irqsave variants, but that's
unnecessarily strong for our needs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 810256cec1 mlxsw: spectrum: PTP: Add PTP initialization / finalization
Add two ptp_ops: init and fini, to initialize and finalize the PTP
subsystem. Call as appropriate from mlxsw_sp_init() and _fini().

Lay the groundwork for Spectrum-1 support. On Spectrum-1, the received
timestamped packets and their corresponding timestamps arrive
independently, and need to be matched up. Introduce the related data types
and add to struct mlxsw_sp_ptp_state the hash table that will keep the
unmatched entries.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 0714256c3d mlxsw: pci: PTP: Hook into packet transmit path
On Spectrum-1, timestamps are delivered separately from the packets, and
need to be paired up. Therefore, at some point after mlxsw_sp_port_xmit()
is invoked, it is necessary to involve the chip-specific driver code to
allow it to do the necessary bookkeeping and matching.

On Spectrum-2, timestamps are delivered in CQE. For that reason,
position the point of driver involvement into mlxsw_pci_cqe_sdq_handle()
to make it hopefully easier to extend for Spectrum-2 in the future.

To tell the driver what port the packet was sent on, keep tx_info
in SKB control buffer.

Introduce a new driver core interface mlxsw_core_ptp_transmitted(), a
driver callback ptp_transmitted, and a PTP op transmitted. The callee is
responsible for taking care of releasing the SKB passed to the new
interfaces, and correspondingly have the new stub callbacks just call
dev_kfree_skb_any().

Follow-up patches will introduce the actual content into
mlxsw_sp1_ptp_transmitted() in particular.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata d7cd206dbf mlxsw: core: Add support for using SKB control buffer
The SKB control buffer is useful (and used) for bookkeeping of information
related to that SKB. Add helpers so that the mlxsw driver(s) can safely use
the buffer as well. The structure is currently empty, individual users will
add members to it as necessary.

Note that SKB allocation functions already clear the buffer, so the cleanup
is only necessary when ndo_start_xmit is called.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata aed4b57211 mlxsw: spectrum: PTP: Hook into packet receive path
When configured, the Spectrum hardware can recognize PTP packets and
trap them to the CPU using dedicated traps, PTP0 and PTP1.

One reason to get PTP packets under dedicated traps is to have a
separate policer suitable for the amount of PTP traffic expected when
the switch is operated as a boundary clock. For this, add two new trap
groups, MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP0 and _PTP1, and associate the
two PTP traps with these two groups.

In the driver, specifically for Spectrum-1, event PTP packets will need
to be paired up with their timestamps. Those arrive through a different
set of traps, added later in the patch set. To support this future use,
introduce a new PTP op, ptp_receive.

It is possible to configure which PTP messages should be trapped under
which PTP trap. On Spectrum systems, we will use PTP0 for event
packets (which need timestamping), and PTP1 for control packets (which
do not). Thus configure PTP0 trap with a custom callback that defers to
the ptp_receive op.

Additionally, L2 PTP packets are actually trapped through the LLDP trap,
not through any of the PTP traps. So treat the LLDP trap the same way as
the PTP0 trap. Unlike PTP traps, which are currently still disabled,
LLDP trap is active. Correspondingly, have all the implementations of
the ptp_receive op return true, which the handler treats as a signal to
forward the packet immediately.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata dadbc6bc09 mlxsw: spectrum: Add support for traps specific to Spectrum-1
On Spectrum-1, timestamps for PTP packets are delivered through queues
of ingress and egress timestamps. There are two event traps
corresponding to activity on each of those queues. This mechanism is
absent on Spectrum-2, and therefore the traps should only be registered
on Spectrum-1.

Carry a chip-specific listener array in mlxsw_sp->listeners and
listeners_count. Register listeners from that array in
mlxsw_sp_traps_init(). Add a new listener array for Spectrum-1 traps and
configure the newly-added mlxsw_sp->listeners with this array.

The listener array is empty for now; the events will be added in a later
patch.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 4b6b91ed2d mlxsw: spectrum: Extract a helper for trap registration
On Spectrum-1, timestamps for PTP packets are delivered through queues
of ingress and egress timestamps. There are two event traps
corresponding to activity on each of those queues. This mechanism is
absent on Spectrum-2, and therefore the traps should only be registered
on Spectrum-1.

Extract out of mlxsw_sp_traps_init() a generic helper,
mlxsw_sp_traps_register(), and likewise with _unregister(). The new helpers
will later be called with Spectrum-1-specific traps.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 41ce78b92e mlxsw: reg: Add Monitoring Global Configuration Register
This register serves to configure global parameters of certain
monitoring operations. The following patches will use it to configure
that when PTP timestamps are delivered through the PTP FIFO traps, the
FIFO in question is cleared as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 98b9028ea5 mlxsw: reg: Add Time Precision Packet Timestamping Reading
The MTPPTR is used for reading the per port PTP timestamp FIFO.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:34 -07:00
Petr Machata 4dfecb6570 mlxsw: reg: Add Monitoring Precision Time Protocol Trap Register
This register is used for configuring under which trap to deliver PTP
packets depending on type of the packet.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:33 -07:00
Petr Machata da28e87847 mlxsw: reg: Add Monitoring Time Precision Packet Port Configuration Register
This register serves for configuration of which PTP messages should be
timestamped. This is a global configuration, despite the register name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 18:58:33 -07:00
Bodong Wang 4a3929b223 net/mlx5: E-Switch, Handle UC address change in switchdev mode
When NVME device emulation mode is enabled, more than one PF uses the
same physical port. In this case, MPFS is required to program L2
addresses.

It used to rely on netdev set_rx_mode in switchdev mode, but the driver
was later changed to not create a netdev for the eswitch manager once in
switchdev mode. So, the UC address event should be handled.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang 411ec9e0b4 net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop
When ECPF is the eswitch manager, the host PF is treated like other VFs.
The driver should do the same for inline mode and vlan pop.

Add new iterators to include host PF if ECPF is the eswitch manager.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang db68cc569e net/mlx5: E-Switch, Use iterator for vlan and min-inline setups
Use the defined iterators to traverse VF reps/vports. Also, rely on
the num of VFs rather than the counter of enabled vports, as the PF will
also be enabled from the ECPF side, and the counter will be different from
the num of VFs.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang 16fff98a7e net/mlx5: E-Switch, Reg/unreg function changed event at correct stage
When the driver is doing an eswitch mode change, it's critical to keep the
number of enabled VFs unchanged. However, it can change on the fly once
the function changed event is registered.

To remove this uncertainty, the function changed event should be
registered only after all setups and unregistered first, before all
cleanups. Wrap this functionality together with the vport event handler.

Fixes: 61fc880839e6 ("net/mlx5: E-Switch, Handle representors creation in handler context")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang 062f4bf4aa net/mlx5: E-Switch, Consolidate eswitch function number of VFs
The number of enabled VFs is key for the eswitch manager to do flow steering
initialization and vport configurations. However, the number of
enabled VFs may come from two sources, as below.

PF: num of VFs is provided by enabled SR-IOV of itself.
ECPF: num of VFs is provided by enabled SR-IOV from its peer PF. And
      SR-IOV can't be enabled from ECPF itself.

The current driver handles the two cases in different stages and passes
the number of enabled VFs through a large number of internal functions.
It is usually hard to find out where the real number of VFs comes from
due to the layers of argument passing.

This patch consolidates that number at the entry point of the
eswitch setup and maintains a copy so that eswitch functions can
refer to it directly.

The eswitch driver shall always use this number when referring to the
number of enabled VFs, and not other numbers such as the one from SR-IOV.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:31 -07:00
Bodong Wang f6455de0b0 net/mlx5: E-Switch, Refactor eswitch SR-IOV interface
Devlink eswitch mode is not necessarily related to SR-IOV, e.g., ECPF
can be in offloads mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple them from SR-IOV,
and clean up eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang e1d974d03e net/mlx5: Handle host PF vport mac/guid for ECPF
When ECPF is eswitch manager, it has the privilege to query and
configure the mac and node guid of host PF.

While the vport number of the host PF is 0, the vport command should be
issued with other_vport set in this case, as the cmd is issued by the
ECPF vport (0xfffe).

Add a specific function to query own vport mac. Low level functions
are used by vport manager to query/modify any vport mac and node guid.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang 5f5d2536be net/mlx5: E-Switch, Use correct flags when configuring vlan
Before the offending commit, vlan was configured if either vlan
or qos was set. After the change to the new set flags, function callers
should provide flags accordingly.

Fixes: e33dfe316c ("net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Parav Pandit d886aba677 net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs
While enabling SR-IOV, the PCI core already checks whether SR-IOV is already
enabled and, if so, returns a failure error code.
Hence, remove the duplicate check from the mlx5_core driver.

While at it, make mlx5_device_disable_sriov() perform the cleanup of VFs in
the reverse order of mlx5_device_enable_sriov().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang 5ccf2770e8 net/mlx5: Don't handle VF func change if host PF is disabled
When the ECPF eswitch manager is in offloads mode, it monitors functions
changed events from the host PF side and acts according to the number of
VFs enabled/disabled.

As the ECPF and the host PF run on two independent hosts, it's possible
that the host PF OS reboots while the ECPF system stays up and continues
monitoring events from the host PF. When the kernel on the host PF side
is booting, the PCI iov driver does sriov_init and compute_max_vf_buses by
iterating over all valid numbers of VFs. This triggers FLR and generates
functions changed events, even though the host PF HCA is not enabled at
this time. However, the ECPF is not aware of this, and still handles
these events as usual. The ECPF system then sees a massive number of
reps created and destroyed immediately after creation finishes.

To eliminate this noise, a bit is added to host parameter context to
indicate host PF is disabled. ECPF will not handle the VF changed
event if this bit is set.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
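The event filtering described in commit 5ccf2770e8 above can be sketched as follows. This is only an illustration under stated assumptions: the structures, field names, and `handle_vf_change()` are hypothetical and do not reflect the actual mlx5 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the host parameter context; field names are assumptions. */
struct host_params {
    bool host_pf_disabled;  /* the new bit: host PF HCA is not enabled */
    unsigned int num_vfs;   /* number of VFs reported by the host PF */
};

/* Hypothetical ECPF-side state: how many VF reps are currently loaded. */
struct ecpf_state {
    unsigned int loaded_reps;
};

/*
 * Handle a "functions changed" event. Returns true if the event was acted
 * on, false if it was ignored because the host PF is disabled.
 */
static bool handle_vf_change(struct ecpf_state *st, const struct host_params *hp)
{
    assert(st != NULL && hp != NULL);

    if (hp->host_pf_disabled)
        return false;               /* noise from host boot-time FLR: skip */

    st->loaded_reps = hp->num_vfs;  /* load or unload reps to match */
    return true;
}
```

Checking the bit before touching any rep state means the boot-time events cause no rep creation at all, rather than creating reps and tearing them down again.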
Parav Pandit 7e26dac281 net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices
As mlx5_get_next_phys_dev is used only for PCI PF devices use case,
limit it to search only for PCI devices.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Parav Pandit d22663edac net/mlx5: Move pci status reg access mutex to mlx5_pci_init
mlx5_pci_init() performs pci specific initialization of the
mlx5_core_dev struct.
Hence move pci_status_mutex to pci initialization routine
mlx5_pci_init().
This allows reusing mlx5_mdev_init() for non-PCI devices.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Huy Nguyen 386e75af99 net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5
device types.

mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep
mlx5_coredev_type in mlx5_core_dev structure.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang b8ca123860 RDMA/mlx5: Cleanup rep when doing unload
When an IB rep is loaded, netdev for the same vport is saved for later
reference. However, it's not cleaned up when doing unload. For ECPF,
kernel crashes when driver is referring to the already removed netdev.

The following steps lead to the call trace shown below:
1. Create n VFs from host PF
2. Destroy the VFs
3. Run "rdma link" from ARM

Call trace:
  mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib]
  mlx5_query_port_roce+0x268/0x558 [mlx5_ib]
  mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib]
  ib_query_port+0x9c/0xfc [ib_core]
  fill_port_info+0x74/0x28c [ib_core]
  nldev_port_get_doit+0x1a8/0x1e8 [ib_core]
  rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core]
  rdma_nl_rcv+0xe8/0x144 [ib_core]
  netlink_unicast+0x184/0x214
  netlink_sendmsg+0x288/0x354
  sock_sendmsg+0x18/0x2c
  __sys_sendto+0xbc/0x138
  __arm64_sys_sendto+0x28/0x34
  el0_svc_common+0xb0/0x100
  el0_svc_handler+0x6c/0x84
  el0_svc+0x8/0xc

Clean up the rep and netdev reference when unloading IB rep.

Fixes: 26628e2d58 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang 2f69e591e4 {IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping
In the single IB device mode, the mapping between vport number and
rep relies on a counter. However, for dynamic vport allocation, it is
desirable to keep a consistent mapping between eswitch vport and IB port.

Hence, simplify code to remove the free running counter and instead
use the available vport index during load/unload sequence from the
eswitch.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Bodong Wang d6518db278 net/mlx5: E-Switch, Use vport index when init rep
The driver refers to the array index when doing rep initialization;
naming it vport is confusing, as that is normally interpreted as the
vport number.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Shay Agroskin a82e0b5bda net/mlx5: Added MCQI and MCQS registers' description to ifc
Given a fw component index, the MCQI register allows us to query
this component's information (e.g. its version and capabilities).

Given a fw component index, the MCQS register allows us to query the
status of a fw component, including its type and state
(e.g. PRESET/IN_USE).
It can be used to find the index of a component of a specific type, by
sequentially increasing the component index, and querying each time the
type of the returned component.
If max component index is reached, 'last_index_flag' is set by the HCA.

These registers' description was added to query the running and pending
fw version of the HCA.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
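The index scan described in commit a82e0b5bda above (increase the component index, query the type each time, stop at `last_index_flag`) can be sketched as follows. This is a minimal sketch and not the driver code: `struct mcqs_reply`, the callback type, `find_component_index()`, and the mock device are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded MCQS reply; field names are assumptions. */
struct mcqs_reply {
    int  type;             /* component type identifier */
    bool last_index_flag;  /* set by the HCA at the max component index */
};

/* Hypothetical query callback: fills *out for idx, returns 0 on success. */
typedef int (*mcqs_query_fn)(void *ctx, unsigned int idx, struct mcqs_reply *out);

/*
 * Scan component indices in increasing order until a component of
 * wanted_type is found. Returns its index, or -1 if the query fails or
 * the last index is passed without a match.
 */
static int find_component_index(mcqs_query_fn query, void *ctx, int wanted_type)
{
    struct mcqs_reply r;
    unsigned int idx = 0;

    assert(query != NULL);
    for (;;) {
        if (query(ctx, idx, &r) != 0)
            return -1;
        if (r.type == wanted_type)
            return (int)idx;
        if (r.last_index_flag)
            return -1;
        idx++;
    }
}

/* Mock device for illustration: a fixed table of component types. */
static int mock_query(void *ctx, unsigned int idx, struct mcqs_reply *out)
{
    static const int types[] = { 7, 3, 9 };
    const size_t n = sizeof(types) / sizeof(types[0]);

    (void)ctx;
    if (idx >= n)
        return -1;
    out->type = types[idx];
    out->last_index_flag = (idx == n - 1);
    return 0;
}
```

With the mock table, `find_component_index(mock_query, NULL, 9)` walks indices 0, 1, 2 and stops at the entry whose type matches; a type that is absent ends the scan when `last_index_flag` is seen.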
Parav Pandit 1759d322f4 net/mlx5: Add hardware definitions for sub functions
Update mlx5 device interface data structures for:
1. New command definitions for allocating, deallocating SF
2. Query SF partition
3. Eswitch SF fields
4. HCA CAP SF fields
5. Extend Eswitch functions command for SF

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01 16:40:30 -07:00
Daniel T. Lee 6e32a74a6f samples: pktgen: allow to specify destination port
Currently, kernel pktgen has a feature to specify the udp destination
port for sent packets. (e.g. pgset "udp_dst_min 9")

But none of the sample scripts has an option to achieve this.

This commit adds the DST_PORT option to specify the target port(s) in the script.

    -p : ($DST_PORT)  destination PORT range (e.g. 433-444) is also allowed

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01 11:02:20 -07:00