Commit Graph

1202 Commits

Author SHA1 Message Date
Jakub Kicinski f4e63525ee net: bpf: rename ndo_xdp to ndo_bpf
ndo_xdp is a control path callback for setting up XDP in the
driver.  We can reuse it for other forms of communication
between the eBPF stack and the drivers.  Rename the callback
and associated structures and definitions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 22:26:18 +09:00
David S. Miller 2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Elena Reshetova 17ac99b2b8 drivers, net, mlx4: convert mlx4_srq.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable mlx4_srq.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Elena Reshetova 0068895ff8 drivers, net, mlx4: convert mlx4_qp.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable mlx4_qp.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Elena Reshetova ff61b5e3f0 drivers, net, mlx4: convert mlx4_cq.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable mlx4_cq.refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Tariq Toukan f025fd6061 net/mlx4_en: XDP_TX, assign constant values of TX descs on ring creaion
In XDP_TX, some fields in tx_info and tx_desc are constants across
all entries of the different XDP_TX rings.
Assign values to these fields on ring creation time, rather than in
data-path.

Patchset performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
------------------------------
Before    | After     | Gain |
13.7 Mpps | 14.0 Mpps | %2.2 |
------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan f6f0aa9741 net/mlx4_en: Obsolete call to generic write_desc in XDP xmit flow
Function mlx4_en_tx_write_desc() is not optimized to use of XDP xmit.
Use the relevant parts inline instead.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan 5dad61b838 net/mlx4_en: Replace netdev parameter with priv in XDP xmit function
The struct net_device parameter was passed only to extract
struct mlx4_en_priv out of it.
Here we pass the priv parameter directly.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Inbar Karmy 80a8dc75ee net/mlx4_en: Increase number of default RX rings
Remove limitation of netif_get_num_default_rss_queues()
from logic of RX rings default number.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy b8d394367a net/mlx4_en: Limit the number of RX rings
Limit the number of RX rings by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy 7e1dc5e926 net/mlx4_en: Limit the number of TX rings
Limit the number of TX rings per UP by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Tariq Toukan 7ba5e7bd64 net/mlx4_en: Use __force to fix a sparse warning in TX datapath
In TX data-path, we intentionally do not byte-swap, as documented
in code and in the cited commit log.
This fixes sparse warning:
en_tx.c:720:23: warning: incorrect type in argument 1 (different base types)
en_tx.c:720:23:    expected unsigned int [unsigned] [usertype] <noident>
en_tx.c:720:23:    got restricted __be32 [usertype] doorbell_qpn

Fixes: 492f5add4b ("net/mlx4_en: Doorbell is byteswapped in Little Endian archs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Tariq Toukan b71322d9db net/mlx4_core: Fix cast warning in fw.c
Fix the following SPARSE warning, in MLX4_GET() macro:
drivers/net/ethernet/mellanox/mlx4/fw.c:233:9: warning: cast to restricted __be64

Fixes: 17d5ceb6e4 ("net/mlx4_core: Fix unaligned accesses")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Tariq Toukan bb428a5c4d net/mlx4: Fix endianness issue in qp context params
Should take care of the endianness before assigning to params2 field.

Fixes: 53f33ae295 ("net/mlx4_core: Port aggregation upper layer interface")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Daniel Borkmann de8f3a83b0 bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from it's
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 13:36:44 -07:00
Allen Pais d2a0012e76 drivers: net: mlx4: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the
    function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:44:42 -07:00
Linus Torvalds aae3dbb477 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

 3) Allow generic XDP to work on virtual devices, from John Fastabend.

 4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

 5) Remove UFO offloads from the tree, gave us little other than bugs.

 6) Remove the IPSEC flow cache, from Florian Westphal.

 7) Support ipv6 route offload in mlxsw driver.

 8) Support VF representors in bnxt_en, from Sathya Perla.

 9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

13) Add new jump instructions to eBPF, from Daniel Borkmann.

14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

15) Support XDP in tap driver, from Jason Wang.

16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

17) Add Huawei hinic ethernet driver.

18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
  i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
  i40e: avoid NVM acquire deadlock during NVM update
  drivers: net: xgene: Remove return statement from void function
  drivers: net: xgene: Configure tx/rx delay for ACPI
  drivers: net: xgene: Read tx/rx delay for ACPI
  rocker: fix kcalloc parameter order
  rds: Fix non-atomic operation on shared flag variable
  net: sched: don't use GFP_KERNEL under spin lock
  vhost_net: correctly check tx avail during rx busy polling
  net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
  rxrpc: Make service connection lookup always check for retry
  net: stmmac: Delete dead code for MDIO registration
  gianfar: Fix Tx flow control deactivation
  cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
  cxgb4: Fix pause frame count in t4_get_port_stats
  cxgb4: fix memory leak
  tun: rename generic_xdp to skb_xdp
  tun: reserve extra headroom only when XDP is set
  net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
  net: dsa: bcm_sf2: Advertise number of egress queues
  ...
2017-09-06 14:45:08 -07:00
Thomas Meyer 691223ec97 net/mlx4_core: Use ARRAY_SIZE macro
Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
yourself.
Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
/ARRAY_SIZE(\1)/g' and manual check/verification.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05 11:49:16 -07:00
Linus Torvalds aa9d4648c2 Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates as
   well)
 - rxe updates
 - various mlx updates
 - Set default roce type to RoCEv2
 - Several larger fixes for bnxt_re that were too big for -rc
 - Several larger fixes for qedr that, likewise, were too big for -rc
 - Misc core changes
 - Make the hns_roce driver compilable on arches other than aarch64 so we
   can more easily debug build issues related to it
 - Add rdma-netlink infrastructure updates
 - Add automatic IRQ affinity infrastructure
 - Add 32bit lid support
 - Lots of misc fixes across the subsystem from random people
 - Autoloading of RDMA netlink modules
 - PCI pool cleanups from Romain Perier
 - mlx5 driver feature additions and fixes
 - Hardware tag matchine feature
 - Fix sleeping in atomic when resolving roce ah
 - Add experimental ioctl interface as posted to linux-api@
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZqBDtAAoJELgmozMOVy/dNlcQAJhYNRGaNUBx0L6+8t2xwUrt
 7ndP6qlMar30DJY9FjTQCzRBw0CRMWkXdJD8rYlyaHy07pjWDKG8LZtxEXu1FLdZ
 oNRvQX6ZJh8Bz7db2SQFBCTF2uWGZZFqWQCrSbQwjj9xxjMDs59u/knmwHVY9dKk
 egjPG4IQBDmcTeNY7h1otG2hXpx7QPIOilQW2EFN5SWAuBAazdF2JKxjjxqhnUfp
 gD2pSdgsm3VSMoo0zpMa6qOP+9GcOu8J97fYFhasRYWCavPdWHyq+XNu9S/eicRd
 xbv+seCYM+9jPb2dsNdjEKll7w3yyWdu7h6tSCMPYv54eN9sDDiO1w2L2ZnESMZa
 JRnSfB+HXru1r4RyHOTPO8peaNhYlR1V4u8bTS5G2dffbHis9BajkWoAR/oSiUcB
 AIjIIDcdJFVGfpF9KIt/pEl+adHNgESibSijzOUYkyw6RNbPqDmdd7YakPHcQhKN
 clE3zQfIsPRLWsToP/nkBE0tUd3tQocRuLy7ote7hXQK+0p7TBz0a6Kkj87MvX33
 8dVbUI+q6WRlEY90l71y0ZdXy/AvkxkFxAc4Y7FQZyJxhEArTaKgfa5fmpRwVxBm
 yi9baoYCspHNRNv6AO4IL86ZCJqmWBuch8CBY1n2X3h8IGfKYEZUAZ+T/mnTTeUq
 A4joXduz94ZD4w23leD1
 =2ntC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a big pull request.

  Of note is that I'm sending you the new ioctl API for the rdma
  subsystem. We put it up on linux-api@, but didn't get much response.
  The API is complex, but it solves two different problems in one go:

   1) The bi-directional nature of the RDMA file write calls, which
      created the security hole we had to handle (and for which the fix
      is now causing problems for systems in production, we were a bit
      over zealous in the fix and the ability to open a device, then
      fork, then create new queue pairs on the device and use them is
      broken).

   2) The bloat caused by different vendors implementing extensions to
      the base verbs API. Each vendor's hardware is slightly different,
      and the hardware might be suitable for one extension but not
      another.

      By the time we add generic extensions for all the different ways
      that the different hardware can offload things, the API becomes
      bloated. Things like our completion structs have started to exceed
      a cache line in size because of all the elements needed to support
      this. That in turn shows up heavily in the performance graphs with
      a noticable drop in performance on 100Gigabit links as our
      completion structs go from occupying one cache line to 1+.

      This API makes things like the completion structs modular in a
      very similar way to netlink so that your structs can only include
      the items needed for the offloads/features you are actually using
      on a given queue pair. In that way we support everything, but only
      use what we need, and our structs stay smaller.

  The ioctl API is better explained by the posting on linux-api@ than I
  can explain it here, so I'll just leave it at that.

  The rest of the pull request is typical stuff.

  Updates for 4.14 kernel merge window

   - Lots of hfi1 driver updates (mixed with a few qib and core updates
     as well)

   - rxe updates

   - various mlx updates

   - Set default roce type to RoCEv2

   - Several larger fixes for bnxt_re that were too big for -rc

   - Several larger fixes for qedr that, likewise, were too big for -rc

   - Misc core changes

   - Make the hns_roce driver compilable on arches other than aarch64 so
     we can more easily debug build issues related to it

   - Add rdma-netlink infrastructure updates

   - Add automatic IRQ affinity infrastructure

   - Add 32bit lid support

   - Lots of misc fixes across the subsystem from random people

   - Autoloading of RDMA netlink modules

   - PCI pool cleanups from Romain Perier

   - mlx5 driver feature additions and fixes

   - Hardware tag matchine feature

   - Fix sleeping in atomic when resolving roce ah

   - Add experimental ioctl interface as posted to linux-api@"

* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
  IB/core: Expose ioctl interface through experimental Kconfig
  IB/core: Assign root to all drivers
  IB/core: Add completion queue (cq) object actions
  IB/core: Add legacy driver's user-data
  IB/core: Export ioctl enum types to user-space
  IB/core: Explicitly destroy an object while keeping uobject
  IB/core: Add macros for declaring methods and attributes
  IB/core: Add uverbs merge trees functionality
  IB/core: Add DEVICE object and root tree structure
  IB/core: Declare an object instead of declaring only type attributes
  IB/core: Add new ioctl interface
  RDMA/vmw_pvrdma: Fix a signedness
  RDMA/vmw_pvrdma: Report network header type in WC
  IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
  IB/cm: Fix sleeping in atomic when RoCE is used
  IB/core: Add support to finalize objects in one transaction
  IB/core: Add a generic way to execute an operation on a uobject
  Documentation: Hardware tag matching
  IB/mlx5: Support IB_SRQT_TM
  net/mlx5: Add XRQ support
  ...
2017-09-03 17:49:17 -07:00
Colin Ian King 942e7e5fc1 net/mlx4_core: fix incorrect size allocation for dev->caps.spec_qps
The current allocation for dev->caps.spec_qps is for the size of the
pointer and not the size of the actual  mlx4_spec_qps structure.  Fix
this by using the correct size.   Also splint allocation over a few
lines to make it cppcheck clean on overly wide lines.

Detected by CoverityScan, CID#1455222 ("Wrong sizeof argument")

Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 10:57:10 -07:00
Colin Ian King 542deb88b0 net/mlx4_core: fix memory leaks on error exit path
The structures hca_param and func_cap are not being kfree'd on an error
exit path causing two memory leaks. Fix this by jumping to the existing
free memory error exit path.

Detected by CoverityScan, CID#1455219, CID#1455224 ("Resource Leak")

Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 10:57:10 -07:00
Moshe Shemesh be59960395 net/mlx4: Add user mac FW update support
Adding support for updating the FW on new port mac, when port mac change
is requested by the user. This info is required by the FW as OEM
management tools require this info directly from the NIC FW.
Check device capability bit to verify the FW supports user mac.
If the FW does support it, use set_port command to notify the FW on the
new mac.
The feature is relevant only to PF port mac.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 14:58:32 -07:00
Tariq Toukan a434f1fd2c net/mlx4_core: Fix misplaced brackets of sizeof
When changing the sizeof style usage in the patch cited below,
one brackets misplacement was introduced. Here we fix it.

Fixes: 31975e27a4 ("mlx4: sizeof style usage")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 14:58:32 -07:00
Leon Romanovsky 187782eb58 net/mlx4_core: Make explicit conversion to 64bit value
The "lg" variable is declared as int so in all places where this variable
is used as a shift operand, the output will be int too.

This produces the following smatch warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1532 mlx4_map_cmd() warn:
	should '1 << lg' be a 64 bit type?

Simple declaration of "1" to be "1ULL" will fix the issue.

Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 14:58:32 -07:00
Eran Ben Elisha c73c8b1e47 net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap
In order to avoid temporary large structs on the stack,
allocate them dynamically.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 14:58:32 -07:00
Bhumika Goyal 3f2c5fb2d8 net/mlx4_core: make mlx4_profile const
Make these const as they are only used in a copy operation.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 12:33:31 -07:00
Romain Perier b9f761aa78 mlx4: Replace PCI pool old API
The PCI pool API is deprecated. This commit replaces the PCI pool old
API by the appropriate function with the DMA pool API.

Signed-off-by: Romain Perier <romain.perier@collabora.com>
Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 13:59:46 -04:00
David S. Miller e2a7c34fb2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
Huy Nguyen ca3d89a3eb net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
enable_4k_uar module parameter was added in patch cited below to
address the backward compatibility issue in SRIOV when the VM has
system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
implementation.

The above compatibility issue does not exist in the non SRIOV case.
In this patch, we always enable 4k uar implementation if SRIOV
is not enabled on mlx4's supported cards.

Fixes: 76e39ccf9c ("net/mlx4_core: Fix backward compatibility on VFs")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18 16:15:37 -07:00
Colin Ian King ba5c4dac03 net/mlx4: fix spelling mistake: "availible" -> "available"
Trivial fix to spelling mistakes in the mlx4 driver

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 14:23:45 -07:00
stephen hemminger 31975e27a4 mlx4: sizeof style usage
The kernel coding style is to treat sizeof as a function
(ie. with parenthesis) not as an operator.

Also use kcalloc and kmalloc_array

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:57 -07:00
Zhu Yanjun e084a8b89c mlx4: remove unnecessary pci_set_drvdata()
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not necessary to manually clear the
device driver data to NULL.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-15 16:46:48 -07:00
David S. Miller 3118e6e19d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.

The TCP conflict was a case of local variables in a function
being removed from both net and net-next.

In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 16:28:45 -07:00
Davide Caratti e718fe450e net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
if the NIC fails to validate the checksum on TCP/UDP, and validation of IP
checksum is successful, the driver subtracts the pseudo-header checksum
from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.

V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set

Reported-by: Shuang Li <shuali@redhat.com>
Fixes: f8c6455bb0 ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 17:59:57 -07:00
Jiri Pirko de4784ca03 net: sched: get rid of struct tc_to_netdev
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 09:42:37 -07:00
Jiri Pirko 38cf0426e5 net: sched: change return value of ndo_setup_tc for driver supporting mqprio only
Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
drivers have it like that, so be aligned.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 09:42:37 -07:00
Jiri Pirko 5fd9fc4e20 net: sched: push cls related args into cls_common structure
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 09:42:37 -07:00
Jiri Pirko 2572ac53c4 net: sched: make type an argument for ndo_setup_tc
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 09:42:35 -07:00
Jack Morgenstein bff0c6840c net/mlx4_core: Fixes missing capability bit in flags2 capability dump
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

    QUERY_DEV_CAP_DIAG_RPRT_PER_PORT

However, it failed to introduce a corresponding entry in function
dump_dev_cap_flags2() for outputting a line in the message log
when this capability bit is set.

The change here fixes that omission.

Fixes: c7c122ed67 ("net/mlx4: Add diagnostic counters capability bit")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Jack Morgenstein f9fb9d0b85 net/mlx4_core: Fix namespace misalignment in QinQ VST support commit
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

    MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP

However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay
consistent with the value used in another namespace in
function dump_dev_cap_flags2(), which is manually kept in sync.
The change here restores that consistency.

Fixes: 7c3d21c815 ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Jack Morgenstein 5886259c12 net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump
The index value in function dump_dev_cap_flags2() for outputting
"sl to vl mapping table change event support" needs to be
consistent with the value of the enumerated constant
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file
include/linux/mlx4_device.h

The change here restores that consistency.

Fixes: fd10ed8e6f ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Inbar Karmy c994f778bb net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.

Tested:
As accepted, when NIC supports WoL- ethtool reports:
	Supports Wake-on: g
	Wake-on: d
when NIC doesn't support WoL- ethtool reports:
        Supports Wake-on: d
        Wake-on: d

Fixes: 14c07b1358 ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-02 10:44:09 -07:00
Zhu Yanjun f575a02ee7 mlx4_en: remove unnecessary error check
The function mlx4_en_get_profile always return zero. So it is not
necessary to check its return value.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 17:26:32 -07:00
Zhu Yanjun f3eebe8819 mlx4_en: remove unnecessary returned value
The function mlx4_en_arm_cq always returns zero. So change the
return type of the function mlx4_en_arm_cq to void.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:29:55 -07:00
Moshe Shemesh f330187016 (IB, net)/mlx4: Add resource utilization support
Adding visibility of resource usage of QPs, CQs and counters used by
virtual functions. This feature will be used to give the PF administrator
more data while debugging VF status. Usage info was added to ALLOC_RES
command, to notify the PF if the resource which is being reserved or
allocated for the VF will be used by kernel driver or by user verbs.

Updated reservation and allocation functions of QP, CQ and counter with
additional usage parameter.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:35 -04:00
Linus Torvalds 96080f6977 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) BPF verifier signed/unsigned value tracking fix, from Daniel
    Borkmann, Edward Cree, and Josef Bacik.

 2) Fix memory allocation length when setting up calls to
    ->ndo_set_mac_address, from Cong Wang.

 3) Add a new cxgb4 device ID, from Ganesh Goudar.

 4) Fix FIB refcount handling, we have to set it's initial value before
    the configure callback (which can bump it). From David Ahern.

 5) Fix double-free in qcom/emac driver, from Timur Tabi.

 6) A bunch of gcc-7 string format overflow warning fixes from Arnd
    Bergmann.

 7) Fix link level headroom tests in ip_do_fragment(), from Vasily
    Averin.

 8) Fix chunk walking in SCTP when iterating over error and parameter
    headers. From Alexander Potapenko.

 9) TCP BBR congestion control fixes from Neal Cardwell.

10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger.

11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong
    Wang.

12) xmit_recursion in ppp driver needs to be per-device not per-cpu,
    from Gao Feng.

13) Cannot release skb->dst in UDP if IP options processing needs it.
    From Paolo Abeni.

14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander
    Levin and myself.

15) Revert some rtnetlink notification changes that are causing
    regressions, from David Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  net: bonding: Fix transmit load balancing in balance-alb mode
  rds: Make sure updates to cp_send_gen can be observed
  net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
  ipv4: initialize fib_trie prior to register_netdev_notifier call.
  rtnetlink: allocate more memory for dev_set_mac_address()
  net: dsa: b53: Add missing ARL entries for BCM53125
  bpf: more tests for mixed signed and unsigned bounds checks
  bpf: add test for mixed signed and unsigned bounds checks
  bpf: fix up test cases with mixed signed/unsigned bounds
  bpf: allow to specify log level and reduce it for test_verifier
  bpf: fix mixed signed/unsigned derived min/max value bounds
  ipv6: avoid overflow of offset in ip6_find_1stfragopt
  net: tehuti: don't process data if it has not been copied from userspace
  Revert "rtnetlink: Do not generate notifications for CHANGEADDR event"
  net: dsa: mv88e6xxx: Enable CMODE config support for 6390X
  dt-binding: ptp: Add SoC compatibility strings for dte ptp clock
  NET: dwmac: Make dwmac reset unconditional
  net: Zero terminate ifr_name in dev_ifname().
  wireless: wext: terminate ifr name coming from userspace
  netfilter: fix netfilter_net_init() return
  ...
2017-07-20 16:33:39 -07:00
Leon Romanovsky 8900b894e7 {net, IB}/mlx4: Remove gfp flags argument
The caller to the driver marks GFP_NOIO allocations with help
of memalloc_noio-* calls now. This makes redundant to pass down
to the driver gfp flags, which can be GFP_KERNEL only.

The patch removes the gfp flags argument and updates all driver paths.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:24 -04:00
Zhu Yanjun e36fef66f4 mlx4_en: remove unnecessary returned value check
The function __mlx4_zone_remove_one_entry always returns zero. So
it is not necessary to check it.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-15 14:29:49 -07:00
Zhu Yanjun 3b68067bd2 mlx4_en: make mlx4_log_num_mgm_entry_size static
The variable mlx4_log_num_mgm_entry_size is only called in main.c.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 02:41:26 -07:00