Commit Graph

9114 Commits

Author SHA1 Message Date
Parav Pandit dd81b2c8a3 IB/core: Change filter function return type from int to bool
Filter functions returns either 0 or 1, therefore better change their
return type from int to bool to reflect the same.  Additionally some
filter functions have suffix of _filter some doesn't.  Make all filter
function consistent to have __filter suffix to improve code readability.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15 13:33:20 -06:00
Parav Pandit d12e2eed27 IB/core: Update GID entries for netdevice whose mac address changes
Update all GID table entries of the netdevice whose MAC address changed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15 13:33:20 -06:00
Parav Pandit 464b79b45a IB/core: Add default GIDs of the bond master netdev
Currently following issues exist:
1. Default GIDs of the lower (slave) netdevice if the bond netdevice is
   added. Rather default GID should be of bond master netdevice.
2. Due to this, when failover event occurs FAILOVER event handler attempts
   to delete the GID of the upper device and tries to add the default GID
   of the lower device. This is incorrect behavior.

To have simple and correct code:
(a) Split default GIDs addition out of add_netdev_ips().  This allows
    easier removal in future if RoCE default GIDs are removed.
(b) Add default GIDs of the bond master device by using right filter and
    callback function.
(c) Remove unused function enum_netdev_default_gids().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15 13:33:20 -06:00
Parav Pandit a03d4d2775 IB/core: Consider adding default GIDs of bond device
Now that we correctly delete the default GIDs of lower devices during
CHANGEUPPER event, add default GIDs of the bonding master device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15 13:33:19 -06:00
Parav Pandit 408f1242d9 IB/core: Delete lower netdevice default GID entries in bonding scenario
When NETDEV_CHANGEUPPER event occurs, lower device is not yet established
as slave of the master, and when upper device is bond device, default GID
entries not deleted.

Due to this, when bond device is fully configured, default GID entries of
bond device cannot be added as default GID entries are occupied by the
lower netdevice. This is incorrect.

Default GID entries should really be of bond netdevice because in all RoCE
GIDs (default or IP), MAC address of the bond device will be used.  It is
confusing to have default GID of netdevice which is not really used for
any purpose.

Therefore, as first step, implement
(a) filter function which filters if a CHANGEUPPER event netdevice and
    associated upper device is master device or not.
(b) callback function which deletes the default GIDs of lower (event
    netdevice).

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15 13:33:19 -06:00
Parav Pandit b9f09866e0 IB/core: Avoid confusing del_netdev_default_ips
Currently bond_delete_netdev_default_gids() is called by two callers.
(a) del_netdev_default_ips_join()
(b) del_netdev_default_ips()

Both above functions changes the argument order while calling
bond_delete_netdev_default_gids().  This required silly
del_netdev_default_ips() wrapper.

Additionally, del_netdev_default_ips() deletes default GIDs not IP based
GIDs.  del_netdev_default_ips() having _ips suffix is confusing.

Therefore, get rid of confusing del_netdev_default_ips() and simplify
bond_delete_netdev_default_gids() to follow same argument order as its
caller.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 16:43:55 -06:00
Parav Pandit 666e7099a4 IB/core: Add comment for change upper netevent handling
Add comment for handling CHANGEUPPER netevent handling.
To improve code readability,
(a) move cmd definitions to its respective if-else branches,
(b) avoid single line structure definitions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 16:43:55 -06:00
Yuval Bason 40b173ddce qedr: Add user space support for SRQ
This patch adds support for SRQ's created in user space and update
qedr_affiliated_event to deal with general SRQ events.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 16:31:54 -06:00
Yuval Bason 3491c9e799 qedr: Add support for kernel mode SRQ's
Implement the SRQ specific verbs and update the poll_cq verb to deal with
SRQ completions.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 16:31:54 -06:00
Yuval Bason 1212767e23 qedr: Add wrapping generic structure for qpidr and adjust idr routines.
Today, we are using idr mechanism for QP's only.
This patch prepares the qedr_idr stuctures and the idr routines for
both QP's and SRQ's.

Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 16:31:54 -06:00
Jason Gunthorpe 0625b4ba1a IB/mlx5: Fix leaking stack memory to userspace
mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes
were written.

Fixes: 41d902cb7c ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp")
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14 15:35:12 -06:00
Linus Torvalds 2c20443ec2 ACPI updates for 4.19-rc1
- Revert two ACPICA commits that are not needed any more (Erik
    Schmauss).
 
  - Rework property graph support in the ACPI device properties
    framework to make it behave more like the analogous DT code
    and update the documentation of it (Sakari Ailus).
 
  - Change the default ACPI device status after initialization
    to ACPI_STA_DEFAULT instead of 0 (Hans de Goede).
 
  - Add a special platform driver for enumerating multiple I2C devices
    hooked up to the same object in the ACPI tables (Hans de Goede).
 
  - Fix the ACPI battery driver to avoid reporting full capacity on
    systems without support for that and clean it up (Hans de Goede,
    Dmitry Rozhkov, Lucas Rangit Magasweran).
 
  - Add two system wakeup quirks to the ACPI EC driver (Aaron Ma,
    Mika Westerberg).
 
  - Add the touchscreen on Dell Venue Pro 7139 to the list of "always
    present" devices to make it work (Tristian Celestin).
 
  - Revert a special tables handling quirk for Dell XPS 9570 and
    Precision M5530 which is not needed any more (Kai Heng Feng).
 
  - Add support for a new OEM _OSI string to allow system vendors to
    work around issues with NVidia HDMI audio (Alex Hung).
 
  - Prevent the ACPI button driver from reporting excessive system
    wakeup events and clean it up (Ravi Chandra Sadineni, Randy Dunlap).
 
  - Clean up two minor code style issues in the ACPI core and GHES
    handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJbcqQyAAoJEILEb/54YlRxpeIP/i+eB9sq+Pwwb6rpvrcx8JkQ
 cWYbjBQPZLsqqHBQDlwVC4I5o0gn6OhRsQlgSQdxheXA+G6kRtbh942oPptIH7NY
 vF0g0OYFZGhahjgkLyBtXlacMi+A+aSFkZ6lc4Ie4x/nFAELumjpYtef4v09Ecme
 p1G5wz5KYv+47t0DHl+Lb1NglIHsLKtiNMiy9QHyXl3oxrcbW7VIG5o8INw6TgIg
 /yaYOAzU8UMjnhcn6gDgrD+OKdux8jt3yAaHW/90/zBp4qbAJ9fp8zhb5LP7M0sE
 r63pC2qUq7NDKAl99sMuX5I2jSub9d2yJcjcOSyt+FnB/ErsBCbJGm5rEkK4sRHv
 wkV8ybHdq+PkOblR1ob5LMvDXcTlsVTMPonS/oPbEH4J7+jbJjU4vxZxblJGyan6
 6GeiOoI6tMyZB7DBiTJH4EgwcNu6GESfPURnyUqRGCUswS7ocQBmRdSbsf6hRgn9
 loOZnYXb+PPIxYHH9deFsej71d6En9bMUyUZmOGeLWZ+NSW69XISgnke0tmtgvlZ
 pj3zJ7Egnw4SBms2gL/VtPEulxJ2MxScju1RbKMpjP86oIDrO4u3pD8tYSTouSSG
 P3AjFXVqpKMTmfVUnbqu/Avg7GsuDHcv8lgUbY8BQGLXaZVni2oJBtcwc8Ca0lJg
 8q0xTvX+YEnuOwgvNoNz
 =5mW8
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These revert two ACPICA commits that are not needed any more, rework
  the property graphs support in ACPI to be more aligned with the
  analogous DT code, add some new quirks and remove one that isn't
  needed any more, add a special platform driver to enumerate multiple
  I2C devices hooked up to the same device object in the ACPI tables and
  update the battery and button drivers.

  Specifics:

   - Revert two ACPICA commits that are not needed any more (Erik
     Schmauss).

   - Rework property graph support in the ACPI device properties
     framework to make it behave more like the analogous DT code and
     update the documentation of it (Sakari Ailus).

   - Change the default ACPI device status after initialization to
     ACPI_STA_DEFAULT instead of 0 (Hans de Goede).

   - Add a special platform driver for enumerating multiple I2C devices
     hooked up to the same object in the ACPI tables (Hans de Goede).

   - Fix the ACPI battery driver to avoid reporting full capacity on
     systems without support for that and clean it up (Hans de Goede,
     Dmitry Rozhkov, Lucas Rangit Magasweran).

   - Add two system wakeup quirks to the ACPI EC driver (Aaron Ma, Mika
     Westerberg).

   - Add the touchscreen on Dell Venue Pro 7139 to the list of "always
     present" devices to make it work (Tristian Celestin).

   - Revert a special tables handling quirk for Dell XPS 9570 and
     Precision M5530 which is not needed any more (Kai Heng Feng).

   - Add support for a new OEM _OSI string to allow system vendors to
     work around issues with NVidia HDMI audio (Alex Hung).

   - Prevent the ACPI button driver from reporting excessive system
     wakeup events and clean it up (Ravi Chandra Sadineni, Randy
     Dunlap).

   - Clean up two minor code style issues in the ACPI core and GHES
     handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd)"

* tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
  platform/x86: Add ACPI i2c-multi-instantiate pseudo driver
  ACPI / x86: utils: Remove status workaround from acpi_device_always_present()
  ACPI / scan: Create platform device for fwnodes with multiple i2c devices
  ACPI / scan: Initialize status to ACPI_STA_DEFAULT
  ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th
  ACPI: bus: Fix a pointer coding style issue
  arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()
  ACPI / scan: Add static attribute to indirect_io_hosts[]
  ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity
  ACPI / EC: Use ec_no_wakeup on ThinkPad X1 Yoga 3rd
  ACPI / battery: get rid of negations in conditions
  ACPI / battery: use specialized print macros
  ACPI / battery: reorder headers alphabetically
  ACPI / battery: drop inclusion of init.h
  ACPI: battery: remove redundant old_present check on insertion
  ACPI: property: graph: Update graph documentation to use generic references
  ACPI: property: graph: Improve graph documentation for port/ep numbering
  ACPI: property: graph: Fix graph documentation
  ACPI: property: Update documentation for hierarchical data extension 1.1
  ACPI: property: Document key numbering for hierarchical data extension refs
  ...
2018-08-14 13:39:52 -07:00
Linus Torvalds 73ba2fb33c for-4.19/block-20180812
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl
 tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J
 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn
 thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ
 QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8
 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk
 yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X
 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l
 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK
 FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA
 Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5
 Zolos5H+JA==
 =b9ib
 -----END PGP SIGNATURE-----

Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "First pull request for this merge window, there will also be a
  followup request with some stragglers.

  This pull request contains:

   - Fix for a thundering heard issue in the wbt block code (Anchal
     Agarwal)

   - A few NVMe pull requests:
      * Improved tracepoints (Keith)
      * Larger inline data support for RDMA (Steve Wise)
      * RDMA setup/teardown fixes (Sagi)
      * Effects log suppor for NVMe target (Chaitanya Kulkarni)
      * Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
      * TP4004 (ANA) support (Christoph)
      * Various NVMe fixes

   - Block io-latency controller support. Much needed support for
     properly containing block devices. (Josef)

   - Series improving how we handle sense information on the stack
     (Kees)

   - Lightnvm fixes and updates/improvements (Mathias/Javier et al)

   - Zoned device support for null_blk (Matias)

   - AIX partition fixes (Mauricio Faria de Oliveira)

   - DIF checksum code made generic (Max Gurtovoy)

   - Add support for discard in iostats (Michael Callahan / Tejun)

   - Set of updates for BFQ (Paolo)

   - Removal of async write support for bsg (Christoph)

   - Bio page dirtying and clone fixups (Christoph)

   - Set of bcache fix/changes (via Coly)

   - Series improving blk-mq queue setup/teardown speed (Ming)

   - Series improving merging performance on blk-mq (Ming)

   - Lots of other fixes and cleanups from a slew of folks"

* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
  blkcg: Make blkg_root_lookup() work for queues in bypass mode
  bcache: fix error setting writeback_rate through sysfs interface
  null_blk: add lock drop/acquire annotation
  Blk-throttle: reduce tail io latency when iops limit is enforced
  block: paride: pd: mark expected switch fall-throughs
  block: Ensure that a request queue is dissociated from the cgroup controller
  block: Introduce blk_exit_queue()
  blkcg: Introduce blkg_root_lookup()
  block: Remove two superfluous #include directives
  blk-mq: count the hctx as active before allocating tag
  block: bvec_nr_vecs() returns value for wrong slab
  bcache: trivial - remove tailing backslash in macro BTREE_FLAG
  bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
  bcache: set max writeback rate when I/O request is idle
  bcache: add code comments for bset.c
  bcache: fix mistaken comments in request.c
  bcache: fix mistaken code comments in bcache.h
  bcache: add a comment in super.c
  bcache: avoid unncessary cache prefetch bch_btree_node_get()
  bcache: display rate debug parameters to 0 when writeback is not running
  ...
2018-08-14 10:23:25 -07:00
Rafael J. Wysocki 4a3f421b87 Merge branches 'acpica' and 'acpi-property'
Merge ACPICA changes and updates of the ACPI device properties
framework for 4.19.

These revert two ACPICA commits that are not needed any more and
modify the properties graph support in ACPI to be more in-line with
the analogous DT code.

* acpica:
  ACPICA: Update version to 20180629
  ACPICA: Revert "iASL compiler: allow compilation of externals with paths that refer to existing names"
  ACPICA: Revert "iASL: change processing of external op namespace nodes for correctness"

* acpi-property:
  ACPI: property: graph: Update graph documentation to use generic references
  ACPI: property: graph: Improve graph documentation for port/ep numbering
  ACPI: property: graph: Fix graph documentation
  ACPI: property: Update documentation for hierarchical data extension 1.1
  ACPI: property: Document key numbering for hierarchical data extension refs
  ACPI: property: Use data node name and reg property for graphs
  ACPI: property: Allow direct graph endpoint references
  ACPI: property: Make the ACPI graph API private
  ACPI: property: Document hierarchical data extension references
  ACPI: property: Allow making references to non-device nodes
  ACPI: Convert ACPI reference args to generic fwnode reference args
2018-08-14 10:04:59 +02:00
Jason Gunthorpe 486edfb103 IB/ucm: Fix compiling ucm.c
Even though this interface is marked CONFIG_BROKEN we still expect it to
compile, at least until we delete it completely.

Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations
can be detected.

Fixes: e7ff98aefc ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13 20:04:37 -06:00
Linus Torvalds de5d1b39ea Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking/atomics update from Thomas Gleixner:
 "The locking, atomics and memory model brains delivered:

   - A larger update to the atomics code which reworks the ordering
     barriers, consolidates the atomic primitives, provides the new
     atomic64_fetch_add_unless() primitive and cleans up the include
     hell.

   - Simplify cmpxchg() instrumentation and add instrumentation for
     xchg() and cmpxchg_double().

   - Updates to the memory model and documentation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  locking/atomics: Rework ordering barriers
  locking/atomics: Instrument cmpxchg_double*()
  locking/atomics: Instrument xchg()
  locking/atomics: Simplify cmpxchg() instrumentation
  locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
  tools/memory-model: Rename litmus tests to comply to norm7
  tools/memory-model/Documentation: Fix typo, smb->smp
  sched/Documentation: Update wake_up() & co. memory-barrier guarantees
  locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
  sched/core: Use smp_mb() in wake_woken_function()
  tools/memory-model: Add informal LKMM documentation to MAINTAINERS
  locking/atomics/Documentation: Describe atomic_set() as a write operation
  tools/memory-model: Make scripts executable
  tools/memory-model: Remove ACCESS_ONCE() from model
  tools/memory-model: Remove ACCESS_ONCE() from recipes
  locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
  MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
  tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
  tools/memory-model: Add litmus test for full multicopy atomicity
  locking/refcount: Always allow checked forms
  ...
2018-08-13 12:23:39 -07:00
Jason Gunthorpe 4ce719f846 IB/uverbs: Do not check for device disassociation during ioctl
Now that the ioctl path and uobjects are converted to use uverbs_api, it
is now safe to remove the disassociation protection from the common ioctl
code.

This completes the work to make destroy functions continue to work even
after device disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13 09:17:19 -06:00
Jason Gunthorpe 51d0a2b4cf IB/uverbs: Remove struct uverbs_root_spec and all supporting code
Everything now uses the uverbs_uapi data structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13 09:17:19 -06:00
Jason Gunthorpe 3a863577a7 IB/uverbs: Use uverbs_api to unmarshal ioctl commands
Convert the ioctl method syscall path to use the uverbs_api data
structures. The new uapi structure includes all the same information, just
in a different and more optimal way.

 - Use attr_bkey instead of 2 level radix trees for everything related to
   attributes. This includes the attribute storage, presence, and
   detection of missing mandatory attributes.
 - Avoid iterating over all attribute storage at finish, instead use
   find_first_bit with the attr_bkey to locate only those attrs that need
   cleanup.
 - Organize things to always run, and always rely on, cleanup. This
   avoids a bunch of tricky error unwind cases.
 - Locate the method using the radix tree, and locate the attributes
   using a very efficient incremental radix tree lookup
 - Use the precomputed destroy_bkey to handle uobject destruction
 - Use the precomputed allocation sizes and precomputed 'need_stack'
   to avoid maths in the fast path. This is optimal if userspace
   does not pass (many) unsupported attributes.

Overall this results in much better codegen for the attribute accessors,
everything is now stored in bitmaps or linear arrays indexed by attr_bkey.
The compiler can compute attr_bkey values at compile time for all method
attributes, meaning things like uverbs_attr_is_valid() now compile into
single instruction bit tests.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13 09:17:16 -06:00
Jason Gunthorpe b61815e241 IB/uverbs: Use uverbs_alloc for allocations
Several handlers need temporary allocations for the life of the method,
switch them to use the uverbs_alloc allocator.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-13 09:16:13 -06:00
Jason Gunthorpe 461bb2eee4 IB/uverbs: Add a simple allocator to uverbs_attr_bundle
This is similar in spirit to devm, it keeps track of any allocations
linked to this method call and ensures they are all freed when the method
exits. Further, if there is space in the internal/onstack buffer then the
allocator will hand out that memory and avoid an expensive call to
kalloc/kfree in the syscall path.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-13 09:16:08 -06:00
Jason Gunthorpe 6a1f444fef IB/uverbs: Remove the ib_uverbs_attr pointer from each attr
Memory in the bundle is valuable, do not waste it holding an 8 byte
pointer for the rare case of writing to a PTR_OUT. We can compute the
pointer by storing a small 1 byte array offset and the base address of the
uattr memory in the bundle private memory.

This also means we can access the kernel's copy of the ib_uverbs_attr, so
drop the copy of flags as well.

Since the uattr base should be private bundle information this also
de-inlines the already too big uverbs_copy_to inline and moves
create_udata into uverbs_ioctl.c so they can see the private struct
definition.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe 4b3dd2bbf0 IB/uverbs: Provide implementation private memory for the uverbs_attr_bundle
This already existed as the anonymous 'ctx' structure, but this was not
really a useful form. Hoist this struct into bundle_priv and rework the
internal things to use it instead.

Move a bunch of the processing internal state into the priv and reduce the
excessive use of function arguments.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe 6b0d08f4a2 IB/uverbs: Use uverbs_api to manage the object type inside the uobject
Currently the struct uverbs_obj_type stored in the ib_uobject is part of
the .rodata segment of the module that defines the object. This is a
problem if drivers define new uapi objects as we will be left with a
dangling pointer after device disassociation.

Switch the uverbs_obj_type for struct uverbs_api_object, which is
allocated memory that is part of the uverbs_api and is guaranteed to
always exist. Further this moves the 'type_class' into this memory which
means access to the IDR/FD function pointers is also guaranteed. Drivers
cannot define new types.

This makes it safe to continue to use all uobjects, including driver
defined ones, after disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe 9ed3e5f447 IB/uverbs: Build the specs into a radix tree at runtime
This radix tree datastructure is intended to replace the 'hash' structure
used today for parsing ioctl methods during system calls. This first
commit introduces the structure and builds it from the existing .rodata
descriptions.

The so-called hash arrangement is actually a 5 level open coded radix tree.
This new version uses a 3 level radix tree built using the radix tree
library.

Overall this is much less code and much easier to build as the radix tree
API allows for dynamic modification during the building. There is a small
memory penalty to pay for this, but since the radix tree is allocated on
a per device basis, a few kb of RAM seems immaterial considering the
gained simplicity.

The radix tree is similar to the existing tree, but also has a 'attr_bkey'
concept, which is a small value'd index for each method attribute. This is
used to simplify and improve performance of everything in the next
patches.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe 7d96c9b176 IB/uverbs: Have the core code create the uverbs_root_spec
There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.

The core uses this to build up the runtime uapi for each device.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10 16:06:24 -06:00
Jason Gunthorpe 922983c2a1 IB/uverbs: Fix reading of 32 bit flags
This is missing a zeroing of the high bits of flags, and is also not
correct for big endian machines. Properly zero extend the 32 bit flags
into the 64 bit stack variable.

Reported-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Fixes: bccd06223f ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-09 15:46:07 -06:00
Bart Van Assche 61b717d041 RDMA/rxe: Set wqe->status correctly if an unexpected response is received
Every function that returns COMPST_ERROR must set wqe->status to another
value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code
path for which this is not yet the case.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-08 09:57:30 -06:00
Potnuri Bharat Teja 2e51e45cf6 iw_cxgb4: pass window scale in flowc work request
This will allow FW to not send more data to TP (which would then need to
be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR.

Also refactor send_flowc() a bit to clean it up.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-08 09:54:55 -06:00
Leon Romanovsky 0dfe452241 RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wq
[   61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34
[   61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int'
[   61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96
[   61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   61.188315] Call Trace:
[   61.188661]  dump_stack+0xc7/0x13b
[   61.190427]  ubsan_epilogue+0x9/0x49
[   61.190899]  __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f
[   61.197040]  mlx5_ib_create_wq+0x1c99/0x1d50
[   61.206632]  ib_uverbs_ex_create_wq+0x499/0x820
[   61.213892]  ib_uverbs_write+0x77e/0xae0
[   61.248018]  vfs_write+0x121/0x3b0
[   61.249831]  ksys_write+0xa1/0x120
[   61.254024]  do_syscall_64+0x7c/0x2a0
[   61.256178]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   61.259211] RIP: 0033:0x7f54bab70e99
[   61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89
[   61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99
[   61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
[   61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002
[   61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0
[   61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000

Cc: <stable@vger.kernel.org> # 4.7
Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-08 09:47:26 -06:00
Parav Pandit 58796e67d5 IB/ucm: Initialize sgid request GID attribute pointer
sgid_attr is uninitialized on the stack, initialize it to NULL.

Fixes: 398391071f ("IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-07 12:56:10 -06:00
Jens Axboe 05b9ba4b55 Linux 4.18-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
 mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
 vJJlibI=
 =42a5
 -----END PGP SIGNATURE-----

Merge tag 'v4.18-rc6' into for-4.19/block2

Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
merge conflict down the line.

Signed-of-by: Jens Axboe <axboe@kernel.dk>
2018-08-05 19:32:09 -06:00
David S. Miller c1c8626fce Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Lots of overlapping changes, mostly trivial in nature.

The mlxsw conflict was resolving using the example
resolution at:

https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05 13:04:31 -07:00
Linus Torvalds f6229c3958 One bug for missing user input validation:
- Refuse invalid port numbers in the modify_qp system call
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbZH4AAAoJEDht9xV+IJsausAP/RBqXIgC7qvZGrrmhBo5dF1j
 n3+fmynR6KaGDMYRlEbuj+Ce70HbOWyvHx/KgyEMtejjtldVvQqBmGYGR6Dcpn0/
 NA3VWFm7XaM+Oxg136NwkWtdiwUt0a/Ois6wSZsY6XJkXzTKmlyImcDJx1oQwvxZ
 +WwVx7t3kWoCWTXlcbAmz41gDHJFD7vszavhgz0o7Ik1IloYs4bV7NVgGw/athan
 2M8Df7tZNHhMgZDFhp3HeMbkjgApwnSMbGwtaEuaBZF4y1yEw7tp/xn6b87LzP9N
 1JoVPenCp5+Pzw73FgT2sPY9gfI6RieHH/ZGM09Ng9awwPe44pIPnbH9AF6KxlBB
 xFZT43i33CP1+85NFeSyxDeW3wwYXi48Qup7brtJ91LyD/KUmihAyoBuVaFbfYOr
 lOPn8aMBvyumprSjAGzCiiEO8nrwoGs8ZRMJeQXyBkpT9siPvLAMUt5ZweWl4rD2
 48/oR+KOrJ2rJCLwsmuQBoevxAJZ0/FVlg/o4oBk8ntjbDKVJzTZgEccqN+6sKrj
 W3CgK8RHpXWWIScZHcAseQvdKrrOPNtNszOYtXFF8QjdP1QzQvoisrd9yM1L+/+g
 RlsvMpLTHTF0S+d2mGNtB0xY6jh5HOAT62YNJGt8GNJ4d5c23bQ3s+SuiWLrF+6f
 jKER3Mz0YbwwVR07Ai2h
 =8q1F
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fix from Jason Gunthorpe:
 "One bug for missing user input validation: refuse invalid port numbers
  in the modify_qp system call"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Expand primary and alt AV port checks
2018-08-03 10:49:47 -07:00
Jason Gunthorpe 7601097604 IB/ipoib: Consolidate checking of the proposed child interface
Move all the checking for pkey and other validity to the __ipoib_vlan_add
function. This removes the last difference from the control flow
of the __ipoib_vlan_add to make the overall design simpler to
understand.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:44 -06:00
Jason Gunthorpe 13476d35bb IB/ipoib: Maintain the child_intfs list from ndo_init/uninit
This fixes a bug in the netlink path where the vlan_rwsem was not
held around __ipoib_vlan_add causing the child_intfs to be manipulated
unsafely.

In the process this greatly simplifies the vlan_rwsem write side locking
to only cover a single non-sleeping statement.

This also further increases the safety of the removal ordering by holding
the netdev of the parent while the child is active to ensure most bugs
become either an oops on a NULL priv or a deadlock on the netdev refcount.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:44 -06:00
Jason Gunthorpe 25405d98a2 IB/ipoib: Do not remove child devices from within the ndo_uninit
Switching to priv_destructor and needs_free_netdev created a subtle
ordering problem in ipoib_remove_one.

Now that unregister_netdev frees the netdev and priv we must ensure that
the children are unregistered before trying to unregister the parent,
or child unregister will use after free.

The solution is to unregister the children, then parent, in the same batch
all while holding the rtnl_lock. This closes all the races where a new
child could have been added and ensures proper ordering.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Jason Gunthorpe ee190ab734 IB/ipoib: Get rid of the sysfs_mutex
This mutex was introduced to deal with the deadlock formed by calling
unregister_netdev from within the sysfs callback of a netdev.

Now that we have priv_destructor and needs_free_netdev we can switch
to the more targeted solution of running the unregister from a
work queue. This avoids the deadlock and gets rid of the mutex.

The next patch in the series needs this mutex eliminated to create
atomicity of unregisteration.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Jason Gunthorpe 9f49a5b5c2 RDMA/netdev: Use priv_destructor for netdev cleanup
Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.

The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
 - priv_destructor needs to switch to point to the ULP's destructor
   which will then call the rdma_ndev's in the right order
 - We need to be careful around the error unwind of register_netdev
   as it sometimes calls priv_destructor on failure
 - ULPs need to use ndo_init/uninit to ensure proper ordering
   of failures around register_netdev

Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.

The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Jason Gunthorpe eaeb398425 IB/ipoib: Move init code to ndo_init
Now that we have a proper ndo_uninit, move code that naturally pairs
with the ndo_uninit into ndo_init. This allows the netdev core to natually
handle ordering.

This fixes the situation where register_netdev can fail before calling
ndo_init, in which case it wouldn't call ndo_uninit either.

Also move a bunch of duplicated init code that is shared between child
and parent for clarity. Now the child and parent register functions look
very similar.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Jason Gunthorpe 7cbee87c17 IB/ipoib: Move all uninit code into ndo_uninit
Currently uninit is sometimes done twice in error flows, and is sprinkled
a bit all over the place.

Improve the clarity of the design by moving all uninit only into
ndo_uinit.

Some duplication is removed:
 - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but
   this duplicates the process in ipoib_neigh_hash_init
 - Flushing priv->wq was sometimes done before unregister,
   but that duplicates what has been done in ndo_uninit

Uniniting the IB event queue must remain before unregister_netdev as it
requires the RTNL lock to be dropped, this is moved to a helper to make
that flow really clear and remove some duplication in error flows.

If register_netdev fails (and ndo_init is NULL) then it almost always
calls ndo_uninit, which lets us remove all the extra code from the error
unwinds. The next patch in the series will close the 'almost always' hole
by pairing a proper ndo_init with ndo_uninit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:27:43 -06:00
Erez Shitrit cda8daf179 IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task
The neigh_reap_task is self restarting, but so long as we call
cancel_delayed_work_sync() it will be guaranteed to not be running and
never start again. Thus we don't need to have the racy
IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes
calling flush_workqueue after the cancel.

This fixes a situation where the GC work could have been left running
in some rare situations.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02 20:27:19 -06:00
Jason Gunthorpe 577e07ffba IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN
This essentially duplicates the netdev's reg_state, so just use that
directly. The reg_state is updated under the rntl_lock, and all places
using GOING_DOWN already acquire the rtnl_lock so checking is safe.

Since the only place we use GOING_DOWN is for the parent device this
does not fix any bugs, but it is a step to tidy up the unregister flow
so that after later patches the flow is uniform and sane.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02 20:19:41 -06:00
Potnuri Bharat Teja 94245f4ad9 iw_cxgb4: Support FW write completion WR
To optimize NVME-oF READ IOPs, use a specialized WQE that combines
the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target
driver.

This reduces uP overhead per NVME-oF IO, and results in over 10%
improvement in NVME-oF 4K READ IOPs.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02 20:16:02 -06:00
Potnuri Bharat Teja b9855f4ca0 iw_cxgb4: RDMA write with immediate support
Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02 20:16:02 -06:00
Dan Carpenter 8001b717f0 rdma/cxgb4: fix some info leaks
In c4iw_create_qp() there are several struct members which potentially
aren't inintialized like uresp.rq_key.  I've fixed this code before in
in commit ae1fe07f3f ("RDMA/cxgb4: Fix stack info leak in
c4iw_create_qp()") so this time I'm just going to take a big hammer
approach and memset the whole struct to zero.  Hopefully, it will stay
fixed this time.

In c4iw_create_srq() we don't clear uresp.reserved.

Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02 20:10:54 -06:00
Yixian Liu 0425e3e6e0 RDMA/hns: Support flush cqe for hip08 in kernel space
According to IB protocol, there are some cases that work requests must
return the flush error completion status through the completion queue. Due
to hardware limitation, the driver needs to assist the flush process.

This patch adds the support of flush cqe for hip08 in the cases that
needed, such as poll cqe, post send, post recv and aeqe handle.

The patch also considered the compatibility between kernel and user space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02 20:03:25 -06:00
Mike Christie b287e3517e scsi: target: srp, vscsi, sbp, qla: use target_remove_session
This converts the drivers that called transport_deregister_session_configfs
and then immediately called transport_deregister_session to use
target_remove_session.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Boot <bootc@bootc.net>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: <qla2xxx-upstream@qlogic.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02 15:29:31 -04:00
Mike Christie fa83428730 scsi: target: rename target_alloc_session
Rename target_alloc_session to target_setup_session to avoid confusion with
the other transport session allocation function that only allocates the
session and because the target_alloc_session does so much more. It
allocates the session, sets up the nacl and registers the session.

The next patch will then add a remove function to match the setup in this
one, so it should make sense for all drivers, except iscsi, to just call
those 2 functions to setup and remove a session.

iscsi will continue to be the odd driver.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Boot <bootc@bootc.net>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: <qla2xxx-upstream@qlogic.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02 15:29:31 -04:00
Denis Drozdov 75da96067a IB/IPoIB: Set ah valid flag in multicast send flow
The change of ipoib_ah data structure with adding "valid" flag and
checks of ah->valid in ipoib_start_xmit affected multicast packet flow.

Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag
remains unset, so that ipoib_start_xmit end up with neigh_refresh_path
instead of sending the packet using neigh.

"ah->valid" has to be set in multicast send flow. As a result IPoIB
starts sending packets via neigh immediately and eliminates 60sec delay
of neigh keep alive interval.

The typical example of this issue are two sequential arpings:

arping 11.134.208.9 -> got response (mcast_send)
arping 11.134.208.9 -> no response  (ah->valid = 0)

Fixes: fa9391dbad ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 15:23:03 -06:00
Jason Gunthorpe 0f50d88a6e IB/uverbs: Allow all DESTROY commands to succeed after disassociate
The disassociate function was broken by design because it failed all
commands. This prevents userspace from calling destroy on a uobject after
it has detected a device fatal error and thus reclaiming the resources in
userspace is prevented.

This fix is now straightforward, when anything destroys a uobject that is
not the user the object remains on the IDR with a NULL context and object
pointer. All lookup locking modes other than DESTROY will fail. When the
user ultimately calls the destroy function it is simply dropped from the
IDR while any related information is returned.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe a9b66d6453 IB/uverbs: Do not block disassociate during write()
Now that all the callbacks are safe to run concurrently with
disassociation this test can be eliminated. The ufile core infrastructure
becomes entirely self contained and is not sensitive to disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe e83f0ecdc4 IB/uverbs: Do not pass struct ib_device to the ioctl methods
This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.

- Retrieve the ib_dev from the uobject->context->device. This is
  safe under ioctl as the core has already done rdma_alloc_begin_uobject
  and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe bbd51e881f IB/uverbs: Do not pass struct ib_device to the write based methods
This is a step to get rid of the global check for disassociation. In this
model, the ib_dev is not proven to be valid by the core code and cannot be
provided to the method. Instead, every method decides if it is able to
run after disassociation and obtains the ib_dev using one of three
different approaches:

- Call srcu_dereference on the udevice's ib_dev. As before, this means
  the method cannot be called after disassociation begins.
  (eg alloc ucontext)
- Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext()
- Retrieve the ib_dev from the uobject->object after checking
  under SRCU if disassociation has started (eg uobj_get)

Largely, the code is all ready for this, the main work is to provide a
ib_dev after calling uobj_alloc(). The few other places simply use
ib_uverbs_get_ucontext() to get the ib_dev.

This flexibility will let the next patches allow destroy to operate
after disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe cc2e14e680 IB/uverbs: Lower the test for ongoing disassociation
Commands that are reading/writing to objects can test for an ongoing
disassociation during their initial call to rdma_lookup_get_uobject.  This
directly prevents all of these commands from conflicting with an ongoing
disassociation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe 1e857e65d4 IB/uverbs: Allow uobject allocation to work concurrently with disassociate
After all the recent structural changes this is now straightforward, hold
the hw_destroy_rwsem across the entire uobject creation. We already take
this semaphore on the success path, so holding it a bit longer is not
going to change the performance.

After this change none of the create callbacks require the
disassociate_srcu lock to be correct.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe 7452a3c745 IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate
After all the recent structural changes this is now straightfoward, hoist
the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around
the uobject write lock as well as the destroy.

This is necessary as obtaining a write lock concurrently with
uverbs_destroy_ufile_hw() will cause malfunction.

After this change none of the destroy callbacks require the
disassociate_srcu lock to be correct.

This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the
IOCTL interface needs to hold an unlocked kref until all command
verification is completed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe 9867f5c669 IB/uverbs: Convert 'bool exclusive' into an enum
This is more readable, and future patches will need a 3rd lookup type.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe 87ad80abc7 IB/uverbs: Consolidate uobject destruction
There are several flows that can destroy a uobject and each one is
minimized and sprinkled throughout the code base, making it difficult to
understand and very hard to modify the destroy path.

Consolidate all of these into uverbs_destroy_uobject() and call it in all
cases where a uobject has to be destroyed.

This makes one change to the lifecycle, during any abort (eg when
alloc_commit is not called) we always call out to alloc_abort, even if
remove_commit needs to be called to delete a HW object.

This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
clarify its actual usage and revises some of the comments to reflect what
the life cycle is for the type implementation.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe 32ed5c00ac IB/uverbs: Make the write path destroy methods use the same flow as ioctl
The ridiculous dance with uobj_remove_commit() is not needed, the write
path can follow the same flow as ioctl - lock and destroy the HW object
then use the data left over in the uobject to form the response to
userspace.

Two helpers are introduced to make this flow straightforward for the
caller.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:48 -06:00
Jason Gunthorpe aa72c9a5f9 IB/uverbs: Remove rdma_explicit_destroy() from the ioctl methods
The core code will destroy the HW object on behalf of the method, if the
method provides an implementation it must simply copy data from the stub
uobj into the response. Destroy methods cannot touch the HW object.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01 14:55:37 -06:00
Kamal Heib 26e551c5ae RDMA: Fix return code check in rdma_set_cq_moderation
The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31 17:03:46 -06:00
Bart Van Assche dd708e7b45 rdma/cxgb4: Simplify a structure initialization
This patch avoids that sparse reports the following warning:

drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31 16:57:23 -06:00
Bart Van Assche eb2463bab4 rdma/cxgb4: Fix SRQ endianness annotations
This patch avoids that sparse complains about casts to restricted __be32.

Fixes: a3cdaa69e4 ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31 16:57:23 -06:00
Bart Van Assche 7810e09bfb rdma/cxgb4: Remove a set-but-not-used variable
This patch avoids that the following warning is reported when building with
W=1:

drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
  u8 status;
     ^~~~~~

Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-31 16:57:22 -06:00
Parav Pandit 8546331651 RDMA/core: Prefix _ib to IB/RoCE specific functions
In rdma cm module, functions which are common between IB and iWarp
are named with cma_.
iWarp specific functions are prefixed with cma_iw.
IB specific functions are perfixed with cma_ib.

However some functions in request processing path didn't follow
cma_ib notion. Prefix them with _ib for better code clarity.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit 79d684f026 RDMA/core: Simplify gid type check in cma_acquire_dev()
cma_add_one() initializes the default GID regardless of device type.
listen_id is bound to a device and an IP address, its GID type is
initialized by cma_acquire_dev().

Therefore a valid default GID type is always available, it is not needed
to check port type during cma_acquire_dev().

Initialize gid type of a cm id when the cm_id is created instead of
doing conditional checks during cma_acquire_dev() and trying to
initialize to 0 during _cma_attach_to_dev().

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit 7582df8267 RDMA/core: Avoid holding lock while initializing fields on stack
In various functions rdma_cm_event is zero initialized on stack using
memset() while holding lock which is not necessary.
Therefore, don't hold the lock while initializing on stack.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit ca3a8ace2b RDMA/core: Return bool instead of int
Return bool for following internal and inline functions as their
underlying APIs return bool too.

1. cma_zero_addr()
2. cma_loopback_addr()
3. cma_any_addr()
4. ib_addr_any()
5. ib_addr_loopback()

While we are touching cma_loopback_addr(), remove extra white spaces
in it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit 05e0b86c41 RDMA/cma: Get rid of 1 bit boolean
Arrange fields of cma_req_info structure for efficiency on
stack and get rid of one bit boolean field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit e7ff98aefc RDMA/cma: Constify path record, ib_cm_event, listen_id pointers
Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit 2df7dba855 RDMA/core: Constify dst_addr argument
Following APIs are not supposed to modify addr or dest_addr contents.
Therefore make those function argument const for better code
readability.

1. rdma_resolve_ip()
2. rdma_addr_size()
3. rdma_resolve_addr()

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit 219d2e9dfd RDMA/cma: Simplify rdma_resolve_addr() error flow
Currently dst address is first set and later on cleared on either of the
3 error conditions are met.
However none of the APIs or checks are supposed to refer to the
destination address of the cm_id.
Therefore, set the destination address after necessary checks pass which
simplifies the error flow.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Parav Pandit e11fef9f8d RDMA/cma: Initialize resource type in __rdma_create_id()
Currently rdma_cm_id's resource tracking fields such as owner task and
kern_name and other non resource tracking fields are initialized in
in single function __rdma_create_id().

Therefore, initialize rdma_cm_id's resource type also in same init
function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:49:04 -06:00
Lijun Ou cdfa4ad5d6 RDMA/hns: Program the tclass and flow label into the hardware
This was missed in a few places, and was just using 0.

Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:42:44 -06:00
Lijun Ou 426c414619 RDMA/hns: Use macro instead of magic number
This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order
to improve readability.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:42:44 -06:00
Lijun Ou ac7cbf96c2 RDMA/hns: Modify qp will return errno when qp type is illegal
Set for ret was missing in the error path here, resulting in incorrect
error code for modify_qp.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:42:44 -06:00
Lijun Ou c8e46f8d63 RDMA/hns: Assign the value for vlan field of qp context
This patch mainly fills the correct value into the vlan id field of qp
context as well as update the vlan field name according to the latest
hardware user manual.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:42:44 -06:00
Lijun Ou 610b89677f RDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set
Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:42:44 -06:00
Kamal Heib 1ffba62642 RDMA/providers: Remove pointless functions
The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:31:54 -06:00
Kamal Heib 0584c47bbc RDMA/core: Check for verbs callbacks before using them
Make sure the providers implement the verbs callbacks before calling
them, otherwise return -EOPNOTSUPP.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:31:09 -06:00
Kamal Heib 7150c3d554 RDMA/core: Remove {create,destroy}_ah from mandatory verbs
{create,destroy}_ah aren't mandatory verbs, because not all providers
are implementing them.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:31:09 -06:00
Kamal Heib e586e1e1b7 RDMA/ipoib: Fix check for return code from ib_create_srq
Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the
return code from ib_create_srq() in case that it not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:29:45 -06:00
Kamal Heib 8380b74e7d RDMA/providers: Fix return value from create_srq callbacks
The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:29:45 -06:00
Jack Morgenstein f95ccffc71 IB/mlx4: Use 4K pages for kernel QP's WQE buffer
In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.

Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.

This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.

This patch was tested with Lustre and no performance degradation
was observed.

Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.

Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]

Fixes: 73898db043 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:25:46 -06:00
Jason Gunthorpe bccd06223f IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language
This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.

Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.

If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.

Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-30 20:23:29 -06:00
Bart Van Assche d34ac5cd3a RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:09:34 -06:00
Bart Van Assche 7bb1fafc2f IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument
Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Bart Van Assche f696bf6d64 RDMA: Constify the argument of the work request conversion functions
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 20:00:20 -06:00
Bart Van Assche 3e081b773e IB/iser: Inline two work request conversion functions
Since the next patch will change the return type of these functions into a
const pointer and since the iSER driver modifies the work request these
functions return a pointer two, inline two work request conversion
function calls. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30 19:59:40 -06:00
Max Gurtovoy ddd0bc7569 block: move ref_tag calculation func to the block layer
Currently this function is implemented in the scsi layer, but it's
actual place should be the block layer since T10-PI is a general
data integrity feature that is used in the nvme protocol as well.

Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-30 08:27:01 -06:00
Lijun Ou df06510793 RDMA/hns: Enable modify_cq for uverbs.
The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 20:12:35 -06:00
Lijun Ou 0c4a0e2987 RDMA/hns: Update the data type of immediate data
Because the data structure of hip08 is little endian, it needs to fix the
immediate field of wqe and cqe into __le32.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 20:10:37 -06:00
Lijun Ou 73b4e1f4c0 RDMA/hns: Use delay instead of usleep
In order to avoid using usleep function in lock function, we use delay
function instead of it.  Besides, it also use brackets for standardized
the computed order.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:51:47 -06:00
Lijun Ou 26f63b9c33 RDMA/hns: Add illegal hop_num judgement
When hop_num is more than three, it need to return -EINVAL.  This patch
fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:51:34 -06:00
Lijun Ou dedf63506a RDMA/hns: Return correct error code from hns_roce_v1_rsv_lp_qp()
When create loop qp fail, it will return the correct result when
modify_qp() fails.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:51:12 -06:00
Lijun Ou aaa3156779 RDMA/hns: Add 50GE type of hnae3 device match
This patch adds PCI matching for the hns 50GE NIC.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:49:06 -06:00
Lijun Ou 3635ac0208 RDMA/hns: Do not overwrite the error code during error unwind in hns_roce_init
When init cmq fail in initial flow of RoCE, it should return the errno of
cmq_init function, not of the rest call.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:48:48 -06:00
Qing Huang 2577188edc IB/mlx5: avoid excessive warning msgs when creating VFs on 2nd port
When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."

So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.

Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 16:02:28 -06:00
Bart Van Assche 7cfcc71eb0 RDMA/usnic: Suppress a compiler warning
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

drivers/infiniband/hw/usnic/usnic_fwd.c:95:2: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
  strncpy(ufdev->name, netdev_name(ufdev->netdev),
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(ufdev->name) - 1);
    ~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 11:39:06 -06:00
Parav Pandit 643d213a9a RDMA/cma: Do not ignore net namespace for unbound cm_id
Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.

Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA CM")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 09:47:47 -06:00
Parav Pandit d274e45ce1 RDMA/cma: Consider netdevice for RoCE ports
When netdevice is not found for a request, and if it for RoCE port,
currently it allows matching the listener as long as port number matches
by ignoring the netdevice.

Now that we always prefer to have netdevice associated with RoCE, when
netdevice is not found, don't consider RoCE ports.

In other words, a NULL netdevice with RoCE is not acceptable. Therefore,
remove this confusing RoCE port ignorance check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 09:47:47 -06:00
Parav Pandit cee104334c IB/core: Introduce and use sgid_attr in CM requests
For RoCE, when CM requests are received for RC and UD connections,
netdevice of the incoming request is unavailable. Because of that CM
requests are always forwarded to init_net namespace.

Now that we have the GID attribute available, introduce SGID attribute in
incoming CM requests and refer to the netdevice of it.  This is similar to
existing SGID attribute field in outgoing CM requests for RC and UD
transports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-26 09:47:47 -06:00
Jason Gunthorpe d238ca0981 IB/usnic: usnic should not select INFINIBAND_USER_ACCESS
This driver doesn't provide any kernel services, it only provides
an interface via uverbs, so it should depend on, not select, uverbs
support.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 21:25:54 -06:00
Raju Rangoju 6a0b6174d3 rdma/cxgb4: Add support for kernel mode SRQ's
This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.

Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.

This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 21:08:04 -06:00
Raju Rangoju 7fc7a7cffa rdma/cxgb4: Add support for srq functions & structs
This patch adds kernel mode t4_srq structures and support functions,
uapi structures and defines, as well as firmware work request structures.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 21:08:04 -06:00
Varsha Rao 076dd53be5 IB/core: Remove extra parentheses
Remove unnecessary parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:36:29 -06:00
Bart Van Assche 9491a1edba RDMA/ocrdma: Suppress a compiler warning
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:

In function 'ocrdma_mbx_get_ctrl_attribs',
    inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
   strncpy(dev->model_number,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~
    hba_attribs->controller_model_number, 31);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:35:11 -06:00
Jason Gunthorpe 22fa27fbc6 IB/uverbs: Fix locking around struct ib_uverbs_file ucontext
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.

Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
  while holding a READ or WRITE lock on the uobject.
  This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL

This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.

As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:46 -06:00
Jason Gunthorpe c36ee46daf IB/mlx5: Use the ucontext from the uobj, not the file
This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext.  Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe aba94548c9 IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit
Allocating the struct file during alloc_begin creates this strange
asymmetry with IDR, where the FD has two krefs pointing at it during the
pre-commit phase. In particular this makes the abort process for FD very
strange and confusing.

For instance abort currently calls the type's destroy_object twice, and
the fops release once if abort is done. This is very counter intuitive. No
fops should be called until alloc_commit succeeds, and destroy_object
should only ever be called once.

Moving the struct file allocation to the alloc_commit is now simple, as we
already support failure of rdma_alloc_commit_uobject, with all the
required rollback pieces.

This creates an understandable symmetry with IDR and simplifies/fixes the
abort handling for FD types.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe 2c96eb7d62 IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject()
The ioctl framework already does this correctly, but the write path did
not. This is trivially fixed by simply using a standard pattern to return
uobj_alloc_commit() as the last statement in every function.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe e951747a08 IB/uverbs: Rework the locking for cleaning up the ucontext
The locking here has always been a bit crazy and spread out, upon some
careful analysis we can simplify things.

Create a single function uverbs_destroy_ufile_hw() that internally handles
all locking. This pulls together pieces of this process that were
sprinkled all over the places into one place, and covers them with one
lock.

This eliminates several duplicate/confusing locks and makes the control
flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
simple.

Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
logically part of the rwsem and provides the 'down write, fail if write
locked, wait if read locked' semantic we require.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe 87064277c4 IB/uverbs: Revise and clarify the rwsem and uobjects_lock
Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
to the type destroy function (aka 'hw' destroy). The main purpose of this
lock is to prevent normal add and destroy from running concurrently with
uverbs_cleanup_ufile()

Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
we can eliminate the uobjects_lock in the cleanup function. This allows
converting that lock to a very simple spinlock with a narrow critical
section.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe e6d5d5ddd0 IB/uverbs: Clarify and revise uverbs_close_fd
The locking requirements here have changed slightly now that we can rely
on the ib_uverbs_file always existing and containing all the necessary
locking infrastructure.

That means we can get rid of the cleanup_mutex usage (this was protecting
the check on !uboj->context).

Otherwise, follow the same pattern that IDR uses for destroy, acquire
exclusive write access, then call destroy and the undo the 'lookup'.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe 5671f79b42 IB/uverbs: Revise the placement of get/puts on uobject
This wasn't wrong, but the placement of two krefs didn't make any
sense. Follow some simple rules.

- A kref is held inside uobjects_list
- A kref is held inside the IDR
- A kref is held inside file->private
- A stack based kref is passed bettwen alloc_begin and
  alloc_abort/alloc_commit

Any place we destroy one of the above pointers, we stick a put,
or 'move' the kref into another pointer.

The key functions have sensible semantics:
- alloc_uobj fully initializes the common members in uobj, including
  the list
- Get rid of the uverbs_idr_remove_uobj helper since IDR remove
  does require put, but it depends on the situation. Later
  patches will re-consolidate this differently.
- alloc_abort always consumes the passed kref, done in the type
- alloc_commit always consumes the passed kref, done in the type
- rdma_remove_commit_uobject always pairs with a lookup_get

After it is all done the only control flow change is to:
- move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
- add a put to remove_commit_idr_uobject
- Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
  the right place

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:22 -06:00
Jason Gunthorpe c561c28846 IB/uverbs: Clarify the kref'ing ordering for alloc_commit
The alloc_commit callback makes the uobj visible to other threads,
and it does so using a 'move' semantic of the uobj kref on the stack
into the public storage (eg the IDR, uobject list and file_private_data)

Once this is done another thread could start up and trigger deletion
of the kref. Fortunately cleanup_rwsem happens to prevent this from
being a bug, but that is a fantastically unclear side effect.

Re-organize things so that alloc_commit is that last thing to touch
the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
and add a comment reminding that uobj is no longer kref'd after
alloc_commit.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe 1250c3048c IB/uverbs: Handle IDR and FD types without truncation
Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
we ended up implicitly casting these ABI values into an 'int'. For ioctl()
we use a s64 for FDs and a u64 for IDRs, again casting to an int.

The various casts to int are all missing range checks which can cause
userspace values that should be considered invalid to be accepted.

Fix this by making the generic lookup routine accept a s64, which does not
truncate the write API's u32/s32 or the ioctl API's s64. Then push the
detailed range checking down to the actual type implementations to be
shared by both interfaces.

Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
if we ever wish to return a negative value for a FD it is carried
properly.

This ensures that userspace values are never weirdly interpreted due to
the various trunctations and everything that is really out of range gets
an EINVAL.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-25 14:21:21 -06:00
Jason Gunthorpe 3df593bfe6 IB/uverbs: Get rid of null_obj_type
If the method fails after calling rdma_explicit_destroy (eg if
copy_to_user faults) then it will trigger a kernel oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
SMP PTI
CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
 ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
 ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
 ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
 ? find_held_lock+0x2d/0x90
 ? __might_fault+0x39/0x90
 ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
 ? do_vfs_ioctl+0xa0/0x6d0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x24/0x40
 ? syscall_trace_enter+0x138/0x1d0
 ? ksys_ioctl+0x35/0x60
 ? __x64_sys_ioctl+0x11/0x20
 ? do_syscall_64+0x5b/0x1c0
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is because the type was replaced with the null_type during explicit
destroy that cannot complete the destruction.

One of the side effects of replacing the type is to make the object
handle totally unreachable - so no other command could attempt to use
it, even though it remains on the uboject list.

We can get the same end result by just fully destroying the object inside
rdma_explicit_destroy and leaving the caller the residual kref for the
uobj with no attached HW object, and no presence in the ubojects list.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-25 14:21:21 -06:00
Bart Van Assche 9b32a59687 IB/srpt: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Bart Van Assche 71347b0c64 IB/srp: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Bart Van Assche e01a76743a IB/isert: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Bart Van Assche 604dbdc4a7 IB/iser: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Bart Van Assche 4b4671a0f2 IB/IPoIB: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Bart Van Assche 1fec77bf8f RDMA/core: Simplify ib_post_(send|recv|srq_recv)() calls
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 16:06:36 -06:00
Jack Morgenstein addb8a6559 RDMA/uverbs: Expand primary and alt AV port checks
The commit cited below checked that the port numbers provided in the
primary and alt AVs are legal.

That is sufficient to prevent a kernel panic. However, it is not
sufficient for correct operation.

In Linux, AVs (both primary and alt) must be completely self-described.
We do not accept an AV from userspace without an embedded port number.
(This has been the case since kernel 3.14 commit dbf727de74
("IB/core: Use GID table in AH creation and dmac resolution")).

For the primary AV, this embedded port number must match the port number
specified with IB_QP_PORT.

We also expect the port number embedded in the alt AV to match the
alt_port_num value passed by the userspace driver in the modify_qp command
base structure.

Add these checks to modify_qp.

Cc: <stable@vger.kernel.org> # 4.16
Fixes: 5d4c05c3ee ("RDMA/uverbs: Sanitize user entered port numbers prior to access it")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 15:30:34 -06:00
Zhu Yanjun 536ca245c5 IB/rxe: Drop QP0 silently
According to "Annex A16: RDMA over Converged Ethernet (RoCE)":

A16.4.3 MANAGEMENT INTERFACES

As defined in the base specification, a special Queue Pair, QP0 is defined
solely for communication between subnet manager(s) and subnet management
agents. Since such an IB-defined subnet management architecture is outside
the scope of this annex, it follows that there is also no requirement that
a port which conforms to this annex be associated with a QP0. Thus, for
end nodes designed to conform to this annex, the concept of QP0 is
undefined and unused for any port connected to an Ethernet network.

CA16-8: A packet arriving at a RoCE port containing a BTH with the
destination QP field set to QP0 shall be silently dropped.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 15:28:31 -06:00
Wei Yongjun 99a7e2bf70 IB/ipoib: Fix error return code in ipoib_dev_init()
Fix to return a negative error code from the ipoib_neigh_hash_init()
error handling case instead of 0, as done elsewhere in this function.

Fixes: 515ed4f3aa ("IB/IPoIB: Separate control and data related initializations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:56:37 -06:00
Yishai Hadas cb80fb1892 IB/mlx5: Enable driver uapi commands for flow steering
Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:33:52 -06:00
Yishai Hadas 6346f0bfa0 IB/mlx5: Add support for a flow table destination for driver flow steering
Add support to set a destination that is a flow table, this can come from
the DEVX destination.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:56 -06:00
Yishai Hadas d4be3f4466 IB/mlx5: Support adding flow steering rule by raw description
Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.

The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.

This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:56 -06:00
Yishai Hadas 3226944124 IB/mlx5: Introduce driver create and destroy flow methods
Introduce driver create and destroy flow methods on the uverbs flow
object.

This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.

The IB object's attributes are set via the ib_set_flow() helper function.

The specific implementation for the given specification is added in
downstream patches.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 14:03:49 -06:00
Yishai Hadas 6cd080a674 IB: Support ib_flow creation in drivers
This patch considers the case that ib_flow is created by some device
driver with its specific parameters using the KABI infrastructure.

In that case both QP and ib_uflow_resources might not be applicable.
Downstream patches from this series use the above functionality.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:34:55 -06:00
Yishai Hadas fd44e3853c IB/mlx5: Introduce flow steering matcher uapi object
Introduce flow steering matcher object and its create and destroy methods.

This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.

It will be used in downstream patches to be part of mlx5 specific create
flow method.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:34:37 -06:00
Jason Gunthorpe eda98779f7 Merge branch 'mellanox/mlx5-next' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Add support for flow table destination number
  net/mlx5: Add forward compatible support for the FTE match data
  net/mlx5: Fix tristate and description for MLX5 module
  net/mlx5: Better return types for CQE API
  net/mlx5: Use ERR_CAST() instead of coding it
  net/mlx5: Add missing SET_DRIVER_VERSION command translation
  net/mlx5: Add XRQ commands definitions
  net/mlx5: Add core support for double vlan push/pop steering action
  net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
  net/mlx5: FW tracer, add hardware structures
  net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-24 13:10:23 -06:00
Saeed Mahameed 7854ac44fe Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 core infrastructure updates and fixes.

From Eran:
 - Add MPEGC (Management PCIe General Configuration) registers and btis
 - Fix tristate and description for MLX5 module

rom Feras:
 - Add hardware structures for the firmware tracer

From Jainbo:
 - Core support for double vlan push/pop steering action

From Max:
 - Add XRQ commands definitions

From Noa:
 - Add missing SET_DRIVER_VERSION command translation

From Roi:
 - Use ERR_CAST() instead of coding it

From Tariq:
 - Better return types for CQE API

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 14:58:46 -07:00
Bart Van Assche acd4307a21 RDMA/bnxt_re: Modify a fall-through annotation
This patch avoids that gcc reports the following warning when building
with W=1:

drivers/infiniband/hw/bnxt_re/ib_verbs.c:2404:4: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-23 15:36:23 -06:00
Kamal Heib aa09ea6e6b RDMA/mlx5: Remove set but not used variables
Remove "uctx" and "pa" variables that were set but not used.

Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Fixes: 8f06228733 ("RDMA/mlx5: Remove debug prints of VMA pointers")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-23 15:34:54 -06:00
Jan Dakinevich 259e19145e IPoIB: use kvzalloc to allocate an array of bucket pointers
This table by default takes 32KiB which is 3rd memory order. Meanwhile,
this memory is not aimed for DMA operation and could be safely allocated
by vmalloc.

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-23 15:08:20 -06:00
Sakari Ailus 977d5ad39f ACPI: Convert ACPI reference args to generic fwnode reference args
Convert all users of struct acpi_reference_args to more generic
fwnode_reference_args. This will

 1) avoid an ACPI specific references to device nodes with integer
    arguments as well as

 2) allow making references to nodes other than device nodes in ACPI.

As a by-product, convert the fwnode interger arguments to u64. The
arguments were 64-bit integers on ACPI but the fwnode arguments were
just 32-bit.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-07-23 12:44:52 +02:00
David S. Miller c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Sinan Kaya c6a44ba950 PCI: Rename pci_try_reset_bus() to pci_reset_bus()
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().

Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Sinan Kaya 811c5cb37d PCI: Unify try slot and bus reset API
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not.  A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.

Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Sinan Kaya 409888e096 IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
Getting ready to hide pci_reset_bridge_secondary_bus() from the drivers.
pci_reset_bridge_secondary_bus() should only be used internally by the
PCI code itself.

Other drivers should rely on higher level pci_try_reset_bus() API.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Eran Ben Elisha 048f31437a net/mlx5: Fix tristate and description for MLX5 module
Current description did not include new devices. Fix that by proving the
correct generic description.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Ingo Molnar 52b544bd38 Linux 4.18-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltLpVUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGWisH/ikONMwV7OrSk36Y
 5rxzTFUoBk0Qffct88gtSNuRVCxaVb1ofCndvFJE6A6HfJkWpbBzH6eq90aakmJi
 f7uFcu4YmsQpeQaf9lpftWmY2vDf2fIadVTV0RnSMXks57wMax1cpBe7LJGpz13e
 f+g5XRVs1MdlZVtr6tG2SU3Y5AqVVVsYe/0DBPonEqeh9/JJbPFCuNkFOxxzAqPu
 VTnjyoOqG8qtZzjklNtR5rZn0Gv592tWX36eiWTQdThNmVFkGEAJwsHCQlY4OQYK
 61QN4UhOHiu8e1ZuGDNEDhNVRnKtaaYUPFeWL1wLRW73ul4P3ZkpvpS8QTMwcFJI
 JjzNOkI=
 =ckcO
 -----END PGP SIGNATURE-----

Merge tag 'v4.18-rc5' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17 09:27:43 +02:00
Linus Torvalds 4659fc8484 Regression, user visible bugs, and crashing fixes:
- cxgb4 could wrongly fail MR creation due to a typo
 
 - Various crashes if the wrong QP type is mixed in with APIs that expect
   other types
 
 - Syzkaller oops
 
 - Using ERR_PTR and NULL together cases HFI1 to crash in some cases
 
 - mlx5 memory leak in error unwind
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbSNzSAAoJEDht9xV+IJsarfQP/38i9pbDqWdniEhv42CisS1D
 aZYygJ6yAsoEPmEI1oIGtJURte44wxHWmWO2Jbz8aTFl19tSh2cZgFfsNKImSU5A
 5ON19gxOeQ/KiZPmfbgP8cgunU41DpYq3twW8NvW9u5JTl1nbFKtpWfJxKVvjlsu
 PwiFrQxG43/9BrooHJc4eogJHmB77iypR3NmkagAM3oSx/d35zt+Wnw45bybIl8e
 T6OvyEvNHlGLQoqE8j4JYDN6whLwr7uqtcJXv/ukjhkD4WMb8ti9QZH6FPA+8pGG
 oRO5AlbWpcSHThu4tYYoThEdVMLjS5RnzOOUyMiHr7CS+MlEPrKtR5D23f3egSyU
 lUETkyPfkVRSrfD9LnOrj+W6xjwNMJm2Rq/zexVnoKjRrl5asfmDTTAhWJxu7CMK
 Oeb39ue6r7A3wEpaLWlqqrVnzIS+rKKKFCxqc+ni/JUTX0EPr+OcVJav+JmfjpKd
 Q24sbiSkoAp7HRJnsenCX4Kv2x55ClhLktGnltgoJtzsMbT/EOO38d2LrYpWkCkX
 aoi4Xpg1rOvY9biy4dDtGmawsKgEKwJ2j/xr6RQdKZZAifeabnoro/ekSj5/SquQ
 vUfATYG+JEKGethWgO2A7/tgdo7artrN77w0eM0yKtNuJwUGLHCuXbDFUgFxapvm
 RIXvnUEp9Q27bc2JjX3z
 =WDR8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Things have been quite slow, only 6 RC patches have been sent to the
  list. Regression, user visible bugs, and crashing fixes:

   - cxgb4 could wrongly fail MR creation due to a typo

   - various crashes if the wrong QP type is mixed in with APIs that
     expect other types

   - syzkaller oops

   - using ERR_PTR and NULL together cases HFI1 to crash in some cases

   - mlx5 memory leak in error unwind"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
  RDMA/uverbs: Don't fail in creation of multiple flows
  IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
  RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
  RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  iw_cxgb4: correctly enforce the max reg_mr depth
2018-07-13 12:42:14 -07:00
Jason Gunthorpe c012691508 IB/cm: Remove cma_multicast->igmp_joined
This variable isn't read and written to with proper locking, so it is
racy. Instead of using an unlocked bool use presence in the mc->list

The caller could race rdma_join_multicast with rdma_leave_multicast which
would leak a mc join and cause a use after free of mc.

Instead, do not add the mc to the list until it has completed
initialization, all mcs on the list require leaving.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-13 12:18:55 -06:00
Leon Romanovsky 1215cb7c88 RDMA/umem: Refactor exit paths in ib_umem_get
Simplify exit paths in ib_umem_get to use the standard goto unwind
pattern.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 12:15:05 -06:00
Leon Romanovsky 40ddacf2dd RDMA/umem: Don't hold mmap_sem for too long
DMA mapping is time consuming operation and doesn't need to be performed
with mmap_sem semaphore is held.

The semaphore only needs to be held for accounting and get_user_pages
related activities.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 12:09:57 -06:00
Bart Van Assche 6869e0004f IB/srpt: Fix srpt_cm_req_recv() error path (2/2)
If a login request was received through the RDMA/CM and if an error occurs
during login, clear rdma_cm_id->context instead of ib_cm_id->context.

Fixes: 63cf1a902c ("IB/srpt: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:58:45 -06:00
Bart Van Assche 847462de3a IB/srpt: Fix srpt_cm_req_recv() error path (1/2)
Once a target session has been allocated, if an error occurs, the session
must be freed. Since it is not safe to call blocking code from the context
of an connection manager callback, trigger target session release in this
case by calling srpt_close_ch().

Fixes: db7683d7de ("IB/srpt: Fix login-related race conditions")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:55:32 -06:00
Leon Romanovsky 05f58ceba1 RDMA/mlx5: Check that supplied blue flame index doesn't overflow
User's supplied index is checked again total number of system pages, but
this number already includes num_static_sys_pages, so addition of that
value to supplied index causes to below error while trying to access
sys_pages[].

BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
Read of size 4 at addr ffff880065561904 by task syz-executor446/314

CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xef/0x17e
 print_address_description+0x83/0x3b0
 kasan_report+0x18d/0x4d0
 bfregn_to_uar_index+0x34f/0x400
 create_user_qp+0x272/0x227d
 create_qp_common+0x32eb/0x43e0
 mlx5_ib_create_qp+0x379/0x1ca0
 create_qp.isra.5+0xc94/0x22d0
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0xc2c/0x1010
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433679
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006

Allocated by task 314:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x1a9/0x510
 mlx5_ib_alloc_ucontext+0x966/0x2620
 ib_uverbs_get_context+0x23f/0xa60
 ib_uverbs_write+0xc2c/0x1010
 __vfs_write+0x10d/0x720
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1:
 __kasan_slab_free+0x12e/0x180
 kfree+0x159/0x630
 kvfree+0x37/0x50
 single_release+0x8e/0xf0
 __fput+0x2d8/0x900
 task_work_run+0x102/0x1f0
 exit_to_usermode_loop+0x159/0x1c0
 do_syscall_64+0x408/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff880065561100
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2052 bytes inside of
 4096-byte region [ffff880065561100, ffff880065562100)
The buggy address belongs to the page:
page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 1ee47ab3e8 ("IB/mlx5: Enable QP creation with a given blue flame index")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Leon Romanovsky ffaf58def0 RDMA/mlx5: Melt consecutive calls to alloc_bfreg() in one call
There is no need for three consecutive calls to alloc_bfreg(). It can be
implemented with one function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:54:50 -06:00
Raju Rangoju 65ca8d9670 rdma/cxgb4: Add support for 64Byte cqes
This patch adds support for iw_cxb4 to extend cqes from existing 32Byte
size to 64Byte.

Also includes adds backward compatibility support (for 32Byte) to work
with older libraries.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-13 11:52:55 -06:00
Bart Van Assche 15039efadd hns: Remove a set-but-not-used variable
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:21:52 -06:00
Bart Van Assche 522628ed1a IB/hfi1: Suppress a compiler warning
Avoid that the following compiler warning is reported when building
with gcc 8:

drivers/infiniband/hw/hfi1/verbs.c:1896:2: warning: 'strncpy' output may be truncated copying 64 bytes from a string of length 64 [-Wstringop-truncation]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:20:45 -06:00
Kamal Heib d63c46734c RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
to free the allocated srq.

Fixes: c2b37f7648 ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:16:13 -06:00
oulijun e8e8b65224 RDMA/hns: Update the implementation of set_mac
This patch updates the implementation of set_mac by using
command queue instead of directly writing registers.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:09:25 -06:00
oulijun 4db134a3dd RDMA/hns: Update the implementation of set_gid
This patch updates the implementation of set_gid by using
command queue instead of directly writing registers.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:09:25 -06:00
oulijun ded58ff987 RDMA/hns: Add TPQ link table support
In hip08, the TPQ(Timer Poll Queue) should be extended
to host memory. This patch adds the support of TPQ.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:09:25 -06:00
oulijun 6b63597d35 RDMA/hns: Add TSQ link table support
In hip08, TSQ(Transport Service Queue) should be extended
to host memory to store the doorbells. This patch adds the
support of creating TSQ, and then configured to the hardware.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:09:25 -06:00
oulijun 0576cbde14 RDMA/hns: Fix endian conversions and annotations
This patch removes the warnings reported by sparse.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 14:09:25 -06:00
Bart Van Assche beae9eb555 RDMA/ocrdma: Make ocrdma_destroy_qp() easier to analyze
This patch does not change any functionality but avoids that sparse
reports the following:

drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1818:31: warning: context imbalance in 'ocrdma_destroy_qp' - different lock contexts for basic block

Compile-tested only.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Cc: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 12:12:57 -06:00
Arnd Bergmann 07f3355df7 infiniband: i40iw, nes: don't use wall time for TCP sequence numbers
The nes infiniband driver uses current_kernel_time() to get a nanosecond
granunarity timestamp to initialize its tcp sequence counters. This is
one of only a few remaining users of that deprecated function, so we
should try to get rid of it.

Aside from using a deprecated API, there are several problems I see here:

- Using a CLOCK_REALTIME based time source makes it predictable in
  case the time base is synchronized.
- Using a coarse timestamp means it only gets updated once per jiffie,
  making it even more predictable in order to avoid having to access
  the hardware clock source
- The upper 2 bits are always zero because the nanoseconds are at most
  999999999.

For the Linux TCP implementation, we use secure_tcp_seq(), which appears
to be appropriate here as well, and solves all the above problems.

i40iw uses a variant of the same code, so I do that same thing there
for ipv4. Unlike nes, i40e also supports ipv6, which needs to call
secure_tcpv6_seq instead.

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11 12:10:19 -06:00
Bart Van Assche 59b851dbf7 RDMA/nes: Avoid complaints about unused variables
Avoid that the compiler reports the following when building with W=1:

drivers/infiniband/hw/nes/nes_utils.c: In function 'nes_arp_table':
drivers/infiniband/hw/nes/nes_utils.c:689:9: warning: variable 'tmp_addr' set but not used [-Wunused-but-set-variable]
  __be32 tmp_addr;
         ^~~~~~~~
drivers/infiniband/hw/nes/nes_hw.c: In function 'flush_wqes':
drivers/infiniband/hw/nes/nes_hw.c:3840:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
  int ret;
      ^~~
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
drivers/infiniband/hw/nes/nes_verbs.c:811:6: warning: variable 'pbl_entries' set but not used [-Wunused-but-set-variable]
  u32 pbl_entries;
      ^~~~~~~~~~~
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_dereg_mr':
drivers/infiniband/hw/nes/nes_verbs.c:2487:6: warning: variable 'minor_code' set but not used [-Wunused-but-set-variable]
  u16 minor_code;
      ^~~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'mini_cm_recv_pkt':
drivers/infiniband/hw/nes/nes_cm.c:2570:20: warning: variable 'tmp_saddr' set but not used [-Wunused-but-set-variable]
  __be32 tmp_daddr, tmp_saddr;
                    ^~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c:2570:9: warning: variable 'tmp_daddr' set but not used [-Wunused-but-set-variable]
  __be32 tmp_daddr, tmp_saddr;
         ^~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'cm_event_connected':
drivers/infiniband/hw/nes/nes_cm.c:3578:22: warning: variable 'raddr' set but not used [-Wunused-but-set-variable]
  struct sockaddr_in *raddr;
                      ^~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'cm_event_reset':
drivers/infiniband/hw/nes/nes_cm.c:3753:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
  int ret;
      ^~~

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 21:01:29 -06:00
Jason Gunthorpe 23ff6ba8fe RDMA/cxgb4: Restore the dropped uninitialized_var
In some configurations even gcc 7 cannot unravel this complexity and still
throws a warning.

Fixes: 4ab39e2f98 ("RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 20:58:28 -06:00
Yishai Hadas 528922afd4 IB: Enable uverbs_destroy_def_handler to be used by drivers
Enable uverbs_destroy_def_handler to be used by drivers and replace
current code to use it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 11:52:06 -06:00
Jan Dakinevich 781a4016be ib_srpt: use kvmalloc to allocate ring pointers
An array of pointers to SRPT contexts in ib_device is over 30KiB even
in default case, in which an amount of contexts is 4095. The patch
is intended to weed out large contigous allocation for non-DMA memory.

Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 11:13:05 -06:00
Artemy Kovalyov 8942acea37 IB/uverbs: Pass IB_UVERBS_QPF_GRH_REQUIRED to user space
Userspace also needs to know if the port requires GRHs to properly form
the AVs it creates.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-10 11:13:04 -06:00
Artemy Kovalyov b02289b3d6 RDMA: Validate grh_required when handling AVs
Extend the existing grh_required flag to check when AV's are handled that
a GRH is present.

Since we don't want to do query_port during the AV checks for performance
reasons move the flag into the immutable_data.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10 11:13:04 -06:00
Jason Gunthorpe 958200ad8e RDMA/hfi1: Move grh_required into update_sm_ah
grh_required is intended to be a global setting where all AV's will
require a GRH, not just the sm_lid. Move the special logic to the creation
of the SM AH.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-10 11:13:04 -06:00
Jason Gunthorpe 2f944c0fbf RDMA: Fix storage of PortInfo CapabilityMask in the kernel
The internal flag IP_BASED_GIDS was added to a field that was being used
to hold the port Info CapabilityMask without considering the effects this
will have. Since most drivers just use the value from the HW MAD it means
IP_BASED_GIDS will also become set on any HW that sets the IBA flag
IsOtherLocalChangesNoticeSupported - which is not intended.

Fix this by keeping port_cap_flags only for the IBA CapabilityMask value
and store unrelated flags externally. Move the bit definitions for this to
ib_mad.h to make it clear what is happening.

To keep the uAPI unchanged define a new set of flags in the uapi header
that are only used by ib_uverbs_query_port_resp.port_cap_flags which match
the current flags supported in rdma-core, and the values exposed by the
current kernel.

Fixes: b4a26a2728 ("IB: Report using RoCE IP based gids in port caps")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-10 11:06:45 -06:00
Kamal Heib 3fda243245 RDMA/ipoib: Fix return code from ipoib_cm_dev_init
The proper return code is -EOPNOTSUPP and not -ENOSYS when the function
isn't supported, also make sure to return the right error code
from ipoib_transport_dev_init() when ipoib_cm_dev_init() is supported.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 15:19:08 -06:00
Parav Pandit 07e7056aff IB/core: Simplify check for RoCE route resolve
roce_resolve_route_from_path() resolves the route based on the netdevice
of the GID attribute, therefore there is no point in checking again if
the route is resolved matches the same interface it arrived.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 15:11:53 -06:00
Alexander Duyck 4f49dec907 net: allow ndo_select_queue to pass netdev
This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09 13:41:34 -07:00
Parav Pandit 921c0f5ba5 IB/mlx5: Honor cnt_set_id_valid flag instead of set_id
It is incorrect to depend on set_id value to know if counters were
allocated or not. set_id_valid field is set to true when counters
were allocated. Therefore, use set_id_valid while deciding to
free counters.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: aac4492ef2 ("IB/mlx5: Update counter implementation for dual port RoCE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:30:25 -06:00
Leon Romanovsky e3f1ed1f5a RDMA/mlx5: Remove unused port number parameter
Clean up a little bit code to drop unused port_num parameter.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:24:21 -06:00
Jason Gunthorpe 97202bbe22 IB/uverbs: Do not use uverbs_cmd_mask in the ioctl path
Instead we are now checking the function pointers directly. Get rid of
both cases in ioctl and drop the nonsense idea that destroy can fail.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 13:19:01 -06:00
Jann Horn 60e6627f12 IB/mlx5: fix uaccess beyond "count" in debugfs read/write handlers
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.

In this case, the affected files are in debugfs (and should therefore only
be accessible to root), and the read handlers check that *pos is zero
(meaning that at least sys_splice() can't trigger kernel memory
corruption). Because of the root requirement, this is not a security fix,
but rather a cleanup.

For the read handlers, fix it by using simple_read_from_buffer() instead
of custom logic. Add min() calls to the write handlers.

Fixes: 4a2da0b8c0 ("IB/mlx5: Add debug control parameters for congestion control")
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:15:12 -06:00
Bart Van Assche 222c7b1fd4 RDMA/rw: Fix rdma_rw_ctx_signature_init() kernel-doc header
Fixes: 0e353e34e1 ("IB/core: add RW API support for signature MRs")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:07:23 -06:00
Bart Van Assche 4ab39e2f98 RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze
Introduce the function __c4iw_poll_cq_one() such that c4iw_poll_cq_one()
becomes easier to analyze for static source code analyzers. This patch
avoids that sparse reports the following:

drivers/infiniband/hw/cxgb4/cq.c:401:36: warning: context imbalance in 'c4iw_flush_hw_cq' - unexpected unlock
drivers/infiniband/hw/cxgb4/cq.c:824:9: warning: context imbalance in 'c4iw_poll_cq_one' - different lock contexts for basic block

Compile-tested only.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 13:07:23 -06:00
Bart Van Assche cbd8e988eb RDMA/cxgb3: Make iwch_poll_cq_one() easier to analyze
Introduce the function __iwch_poll_cq_one() to make iwch_poll_cq_one()
easier to analyze for static source code analyzers. This patch avoids
that sparse reports the following:

drivers/infiniband/hw/cxgb3/iwch_cq.c:187:9: warning: context imbalance in 'iwch_poll_cq_one' - different lock contexts for basic block

Compile-tested only.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:55:28 -06:00
Bart Van Assche 2f229bcf25 RDMA/rxe: Simplify the error handling code in rxe_create_ah()
This patch not only simplifies the error handling code in rxe_create_ah()
but also removes the dead code that was left behind by commit 47ec386662
("RDMA: Convert drivers to use sgid_attr instead of sgid_index").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:55:28 -06:00
Bart Van Assche efdbda81d9 IB/iser: Remove set-but-not-used variables
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:11:22 -06:00
Bart Van Assche aa9d5ffbb7 RDMA/ocrdma: Remove a set-but-not-used variable
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:11:22 -06:00
Bart Van Assche 4c5743bc4f IB/nes: Fix a compiler warning
Avoid that the following compiler warning is reported when building with
W=1:

drivers/infiniband/hw/nes/nes_hw.c:646:51: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:11:22 -06:00
Bart Van Assche f8c2d2280c RDMA/core: Remove set-but-not-used variables
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:11:22 -06:00
Bart Van Assche 28e39894ed RDMA/core: Remove ib_find_cached_gid() and ib_find_cached_gid_by_port()
Remove these two functions since all their callers have been removed.
See also commit ea8c2d8f60 ("RDMA/core: Remove unused ib cache
functions").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:11:22 -06:00
Kamal Heib b1b639708f RDMA/ipoib: Fix use of sizeof()
Make sure to use sizeof(...) instead of sizeof ... which is more
preferred.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:02:42 -06:00
Kamal Heib 0578cdad19 RDMA/ipoib: Prefer unsigned int to bare use of unsigned
This commit replaces all the unsigned definitions in favour of 'unsigned
int' which is preferred.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:02:42 -06:00
Kamal Heib 299c36b1ef RDMA/ipoib: Use min_t() macro instead of min()
Use min_t() macro to avoid the casting when using min() macro, also fix
the type of "length" and "wc->byte_len" to be "unsigned int" and
"u32" which is the right type for each one of them.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 12:02:42 -06:00
Håkon Bugge 87a37ce9e4 IB/cm: Remove unused and erroneous msg sequence encoding
In cm_form_tid(), a two bit message sequence number is OR'ed into bit
31-30 of the lower TID value.

After commit f06d265375 ("IB/cm: Randomize starting comm ID"), the
local_id is XOR'ed with a 32-bit random value. Hence, bit 31-30 in the
lower TID now has an arbitrarily value and it makes no sense to OR in
the message sequence number.

Adding to that, the evolution in use of IDR routines in cm_alloc_id()
has always had the possibility of returning a value with bit 30 set.

In addition, said bits are never checked.

Hence, remove the encoding and the corresponding enum.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 11:39:28 -06:00
Jason Gunthorpe 76bc79ccce IB/uverbs: Replace ib_ucq_object uverbs_file with the one in ib_uobject
Now that ib_uobject has a ib_uverbs_file we don't need this extra one in
ib_ucq_object.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe d0259e82e7 IB/uverbs: Remove ib_uobject_file
The only purpose for this structure was to hold the ib_uobject_file
pointer, but now that is part of the standard ib_uobject the structure
no longer makes any sense, so get rid of it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe 6f258884dd IB/uverbs: Tidy up remaining references to ucontext
Unnecessary clutter, to indirect through ucontext when the ufile would do.
Generally most of the code code should only be working with ufile, except
for a few places that touch the driver interface.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe 2cc1e3b809 IB/uverbs: Replace file->ucontext with file in uverbs_cmd.c
The ucontext isn't needed any more, just pass the uverbs_file directly.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe 6ef1c82821 IB/uverbs: Replace ib_ucontext with ib_uverbs_file in core function calls
The correct handle to refer to the idr/etc is ib_uverbs_file, revise all
the core APIs to use this instead. The user API are left as wrappers
that automatically convert a ucontext to a ufile for now.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe 6a5e9c8841 IB/uverbs: Move non driver related elements from ib_ucontext to ib_ufile
The IDR is part of the ib_ufile so all the machinery to lock it, handle
closing and disassociation rightly belongs to the ufile not the ucontext.

This changes the lifetime of that data to match the lifetime of the file
descriptor which is always strictly longer than the lifetime of the
ucontext.

We need the entire locking machinery to continue to exist after ucontext
destruction to allow us to return the destroy data after a device has been
disassociated.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe c33e73af21 IB/uverbs: Add a uobj_perform_destroy helper
This consolidates a bunch of repeated code patterns into a helper.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-09 11:26:17 -06:00
Jason Gunthorpe 422e3d37ed RDMA/uverbs: Combine MIN_SZ_OR_ZERO with UVERBS_ATTR_STRUCT
After all the rework is done it is now possible to include single flags in
the type macros. Any user of UVERBS_ATTR_STRUCT needs to zero check data
past the end of the known struct to be correct, so make this mandatory,
and get rid of MIN_SZ_OR_ZERO as a user flag.

This changes UVERBS_ATTR_TYPE to refer to a struct of exact size with not
possibility of extension, convert the few users of UVERBS_ATTR_TYPE and
MIN_SZ_OR_ZERO to use UVERBS_ATTR_STRUCT.

The one user of UVERBS_ATTR_STRUCT without MIN_SZ_OR_ZERO is just
confused. There is some padding at the end of that struct, but userspace
always provides it with the padding. The construction doesn't test if the
padding is zero, so it is pointless. Just use UVERBS_ATTR_TYPE.

Finally, rename min_sz_or_zero to zero_trailing to better reflect what it
does and hopefully avoid such mis-uses in the future.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 540cd69209 RDMA/uverbs: Use UVERBS_ATTR_MIN_SIZE correctly and uniformly
This newer macro allows specifying a lower bound on the accepted size, and
has an 'unlimited' upper bound. Due to this it never checks for trailing
zeroing so it doesn't make any sense to combine it with MIN_SZ_OR_ZERO, so
drop MIN_SZ_OR_ZERO when they are used together

There were a couple of places that open coded this pattern, switch them to
use the clearer UVERBS_ATTR_MIN_SIZE for clarity.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 83bb444233 RDMA/uverbs: Remove UA_FLAGS
This bit of boilerplate isn't really necessary, we can use bitfields
instead of a flags enum and the macros can then individually initialize
them through the __VA_ARGS__ like everything else.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 9a119cd597 RDMA/uverbs: Get rid of the & in method specifications
Hide it inside the macros. The & is confusing and interferes with using
this as a generic DSL in later patches.

Since this also touches almost every line, also run the specs through
clang-format (with 'BinPackParameters: false') to make the maintenance
easier.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 6c61d2a55c RDMA/uverbs: Simplify UVERBS_OBJECT and _TREE family of macros
Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_oject_tree_def and
associated objects list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 595c7736d4 RDMA/uverbs: Simplify method definition macros
Instead of the large set of indirecting macros, define the few needed
macros to directly instantiate the struct uverbs_method_def and associated
attributes list.

This is small amount of code duplication but the readability is far
better.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe d108dac080 RDMA/uverbs: Simplify UVERBS_ATTR family of macros
Instead of using a complex cascade of macros, just directly provide the
initializer list each of the declarations is trying to create.

Now that the macros are simplified this also reworks the uverbs_attr_spec
to be friendly to older compilers by eliminating any unnamed
structures/unions inside, and removing the duplication of some fields. The
structure size remains at 16 bytes which was the original motivation for
some of this oddness.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe ad544cfe54 RDMA/uverbs: Split UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE
Two methods are sharing the same attribute constant, but the attribute
definitions are not the same. This should not have been done, instead
split them into two attributes with the same number.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 87fc2a620a RDMA/uverbs: Store the specs_root in the struct ib_uverbs_device
The specs are required to operate the uverbs file, so they belong inside
the ib_uverbs_device, not inside the ib_device. The spec passed in the
ib_device is just a communication from the driver and should not be used
during runtime.

This also changes the lifetime of the spec memory to match the
ib_uverbs_device, however at this time the spec_root can still contain
driver pointers after disassociation, so it cannot be used if ib_dev is
NULL. This is preparation for another series.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 13:47:01 -06:00
Jason Gunthorpe 8193abb6a8 Merge branch 'mlx5-dump-fill-mkey' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Dump and fill MKEY from Leon Romanovsky:

====================
MLX5 IB HCA offers the memory key, dump_fill_mkey to increase performance,
when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access, while
keeping track of the processed length in the ibv_sge handling.

In this three patch series, we expose various bits in our HW spec
file (mlx5_ifc.h), move unneeded for mlx5_core FW command and export such
memory key to user space thought our mlx5-abi header file.
====================

Botched auto-merge in mlx5_ib_alloc_ucontext() resolved by hand.

* branch 'mlx5-dump-fill-mkey':
  IB/mlx5: Expose dump and fill memory key
  net/mlx5: Add hardware definitions for dump_fill_mkey
  net/mlx5: Limit scope of dump_fill_mkey function
  net/mlx5: Rate limit errors in command interface

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 13:23:46 -06:00
Yonatan Cohen 25bb36e75d IB/mlx5: Expose dump and fill memory key
MLX5 IB HCA offers the memory key, dump_fill_mkey to boost
performance, when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access,
while keeping track of the processed length in the ibv_sge handling.

Meaning, instead of a PCI write access the HCA leaves the target
memory untouched, and skips filling that packet section. Similar
behavior is done upon send, the HCA skips data in memory relevant
to this key and saves PCI bus access.

This functionality saves PCI read/write operations.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 13:16:04 -06:00
Yonatan Cohen 4d4fb5dc98 net/mlx5: Limit scope of dump_fill_mkey function
mlx5_core_dump_fill_mkey() is going to be used in next
patch in IB and doesn't need to be visible to whole
mlx5_core. Move that command to mlx5_ib.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 21:51:07 +03:00
Dan Carpenter c1dfc0114c RDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.c
The srq->swq[] is allocated in bnxt_qplib_create_srq().  It has
srq->hwq.max_elements elements so these tests should be > instead of >=
or we might go beyond the end of the array.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 12:06:26 -06:00
Dan Carpenter 474e5a8606 RDMA/bnxt_re: Fix a couple off by one bugs
The sgid_tbl->tbl[] array is allocated in bnxt_qplib_alloc_sgid_tbl().
It has sgid_tbl->max elements.  So the > should be >= to prevent
accessing one element beyond the end of the array.

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 12:06:26 -06:00
Dan Carpenter c2d7c8ff89 IB/core: type promotion bug in rdma_rw_init_one_mr()
"nents" is an unsigned int, so if ib_map_mr_sg() returns a negative
error code then it's type promoted to a high unsigned int which is
treated as success.

Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 12:05:28 -06:00
Leon Romanovsky 5d9a2b0e28 RDMA/i40w: Hold read semaphore while looking after VMA
VMA lookup is supposed to be performed while mmap_sem is held.

Fixes: f26c7c8339 ("i40iw: Add 2MB page support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 11:51:06 -06:00
Tarick Bedeir f1228867ad IB/mlx4: Test port number before querying type.
rdma_ah_find_type() can reach into ib_device->port_immutable with a
potentially out-of-bounds port number, so check that the port number is
valid first.

Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Tarick Bedeir <tarick@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 11:48:27 -06:00
Neil Horman 11e40f5c57 vmw_pvrdma: Release netdev when vmxnet3 module is removed
On repeated module load/unload cycles, its possible for the pvrmda driver
to encounter this crash:

...
[  297.032448] RIP: 0010:[<ffffffff839e4620>]  [<ffffffff839e4620>] netdev_walk_all_upper_dev_rcu+0x50/0xb0
[  297.034078] RSP: 0018:ffff95087780bd08  EFLAGS: 00010286
[  297.034986] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff95087a0c0000
[  297.036196] RDX: ffff95087a0c0000 RSI: ffffffff839e44e0 RDI: ffff950835d0c000
[  297.037421] RBP: ffff95087780bd40 R08: ffff95087a0e0ea0 R09: abddacd03f8e0ea0
[  297.038636] R10: abddacd03f8e0ea0 R11: ffffef5901e9dbc0 R12: ffff95087a0c0000
[  297.039854] R13: ffffffff839e44e0 R14: ffff95087a0c0000 R15: ffff950835d0c828
[  297.041071] FS:  0000000000000000(0000) GS:ffff95087fc00000(0000) knlGS:0000000000000000
[  297.042443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  297.043429] CR2: ffffffffffffffe8 CR3: 000000007a652000 CR4: 00000000003607f0
[  297.044674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  297.045893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  297.047109] Call Trace:
[  297.047545]  [<ffffffff839e4698>] netdev_has_upper_dev_all_rcu+0x18/0x20
[  297.048691]  [<ffffffffc05d31af>] is_eth_port_of_netdev+0x2f/0xa0 [ib_core]
[  297.049886]  [<ffffffffc05d3180>] ? is_eth_active_slave_of_bonding_rcu+0x70/0x70 [ib_core]
...

This occurs because vmw_pvrdma on probe stores a pointer to the netdev
that exists on function 0 of the same bus/device/slot (which represents
the vmxnet3 ethernet driver).  However, it never removes this pointer if
the vmxnet3 module is removed, leading to crashes resulting from use after
free dereferencing incidents like the one above.

The fix is pretty straightforward.  vmw_pvrdma should listen for
NETDEV_REGISTER and NETDEV_UNREGISTER events in its event listener code
block, and update the stored netdev pointer accordingly.  This solution
has been tested by myself and the reporter with successful results.  This
fix also allows the pvrdma driver to find its underlying ethernet device
in the event that vmxnet3 is loaded after pvrdma, which it was not able to
do before.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: ruquin@redhat.com
Tested-by: Adit Ranadive <aditr@vmware.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 15:52:31 -06:00
Maor Gottlieb a93b632c45 IB/mlx5: Fix GRE flow specification
Currently the driver sets the mask of the gre_protocol to 0xffff
without consideration in the user request.

Fix it by copy the mask from the verbs spec.

Fixes: da2f22ae77 ("IB/mlx5: Add support for GRE flow specification")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 15:46:07 -06:00
Michael J. Ruhl e3091644bf IB/hfi1: Remove incorrect call to do_interrupt callback
The general interrupt handler is_rcv_avail_int() has two paths,
do_interrupt() (callback) and handle_user_interrupt().  The
do_interrupt() callback is for the threaded receive handling.
is_rcv_avail_int() cannot handle threaded IRQs.

If the do_interrupt() path is taken, and the IRQ returns
IRQ_WAKE_THREAD, the IRQ behavior will be indeterminate.

Remove incorrect call to do_interrupt() from is_rcv_avail_int(),
leaving the un-threaded (handle_user_interrupt()) path.

Fixes: f4f30031c3 ("staging/rdma/hfi1: Thread the receive interrupt.")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:29:12 -06:00
Michael J. Ruhl d108c60d3d IB/hfi1: Set in_use_ctxts bits for user ctxts only
The in_use_ctxts bitmask is for user receive contexts only.  Setting it for
any other type of receive context is incorrect.

Move initial set of in_use_ctxts bits from the general context init to the
user context specific init. Having this bit set can allow contexts to be
incorrectly identified by some IRQ handlers. This will allow
handle_user_interrupt() will now filter user contexts correctly.

Clean up redundant is_rcv_urgent_int() user context check.

A follow on patch will clean up an incorrect code path in the
is_rcv_avail_int().

Fixes: 8737ce95c4 ("IB/hfi1: Fix an assign/ordering issue with shared context IDs")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:29:12 -06:00
Bart Van Assche 14d15c2b27 ib_srpt: Fix a use-after-free in __srpt_close_all_ch()
BUG: KASAN: use-after-free in srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
Read of size 4 at addr ffff8801269d23f8 by task check/29726

CPU: 4 PID: 29726 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_set_enabled+0x1a9/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f235cfe6154

Fixes: aaf45bd83e ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:14:21 -06:00
Bart Van Assche 995250959d ib_srpt: Fix a use-after-free in srpt_close_ch()
Avoid that KASAN reports the following:

BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt]
Read of size 4 at addr ffff880151180cb8 by task check/4681

CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xa4/0xf5
 print_address_description+0x6f/0x270
 kasan_report+0x241/0x360
 __asan_load4+0x78/0x80
 srpt_close_ch+0x4f/0x1b0 [ib_srpt]
 srpt_set_enabled+0xf7/0x1e0 [ib_srpt]
 srpt_tpg_enable_store+0xb8/0x120 [ib_srpt]
 configfs_write_file+0x14e/0x1d0 [configfs]
 __vfs_write+0xd2/0x3b0
 vfs_write+0x101/0x270
 ksys_write+0xab/0x120
 __x64_sys_write+0x43/0x50
 do_syscall_64+0x77/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: aaf45bd83e ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:14:21 -06:00
Bart Van Assche 7496a511a0 IB/mlx5: Remove set-but-not-used variables
Avoid that the compiler complains about set-but-not-used variables when
building with W=1. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:14:21 -06:00
Bart Van Assche af7b641ed4 IB/srp: Remove driver version and release data information
Remove the driver version and release date information because such
information is not relevant for an upstream driver. See also commit
e1267b0124 ("RDMA: Remove useless MODULE_VERSION").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:14:21 -06:00
Leon Romanovsky fe48aecb4d RDMA/uverbs: Don't fail in creation of multiple flows
The conversion from offsetof() calculations to sizeof()
wrongly behaved for missed exact size and in scenario with
more than one flow.

In such scenario we got "create flow failed, flow 10: 8 bytes
left from uverb cmd" error, which is wrong because the size of
kern_spec is exactly 8 bytes, and we were not supposed to fail.

Cc: <stable@vger.kernel.org> # 3.12
Fixes: 4fae7f1704 ("RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow")
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-03 14:08:00 -06:00
Bart Van Assche aa090eabcb scsi: target: Remove second argument from fabric_make_tpg()
Since most target drivers do not use the second fabric_make_tpg() argument
("group") and since it is trivial to derive the group pointer from the wwn
pointer, do not pass the group pointer to fabric_make_tpg().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-02 16:44:32 -04:00
Yishai Hadas 1c77483e4c IB: Improve uverbs_cleanup_ucontext algorithm
Improve uverbs_cleanup_ucontext algorithm to work properly when the
topology graph of the objects cannot be determined at compile time.  This
is the case with objects created via the devx interface in mlx5.

Typically uverbs objects must be created in a strict topologically sorted
order, so that LIFO ordering will generally cause them to be freed
properly. There are only a few cases (eg memory windows) where objects can
point to things out of the strict LIFO order.

Instead of using an explicit ordering scheme where the HW destroy is not
allowed to fail, go over the list multiple times and allow the destroy
function to fail. If progress halts then a final, desperate, cleanup is
done before leaking the memory. This indicates a driver bug.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 14:35:46 -06:00
Bart Van Assche e620ebfc22 IB/srpt: Support HCAs with more than two ports
Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:

    WARN_ON(sdev->device->phys_port_cnt > ARRAY_SIZE(sdev->port));

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 14:35:45 -06:00
Sagi Grimberg 68348441ef IB/iser: set can_queue earlier to allow setting higher queue depth
We need to set can_queue earlier than when enabling the scsi host.
in a blk-mq enabled environment, the tagset allocation is taken
from can_queue which cannot be modified later. Also, pass an updated
.can_queue to iscsi_session_setup to have enough iscsi tasks allocated
in the session kfifo.

Reported-by: Karandeep Chahal <karandeepchahal@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 12:26:13 -06:00
Vijay Immanuel 24c937b39d IB/rxe: don't clear the tx queue on every transfer
Do not call sk_dst_set() on every packet transfer because
that calls sk_tx_queue_clear(), which clears the tx queue.
A QP must stay on the same tx queue to maintain packet order.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 12:26:13 -06:00
Jason Gunthorpe 92ebb6a0a1 IB/cm: Remove now useless rcu_lock in dst_fetch_ha
This lock used to be protecting a call to dst_get_neighbour_noref,
however the below commit changed it to dst_neigh_lookup which no longer
requires rcu.

Access to nud_state, neigh_event_send or rdma_copy_addr does not require
RCU, so delete the lock.

Fixes: 02b619555a ("infiniband: Convert dst_fetch_ha() over to dst_neigh_lookup().")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-29 11:45:50 -06:00
Leon Romanovsky 1517799965 RDMA/mlx5: Don't leak UARs in case of free fails
The failure in releasing one UAR doesn't mean that we can't continue to
release rest of system pages, so don't return too early.

As part of cleanup, there is no need to print warning if
mlx5_cmd_free_uar() fails because such warning will be printed as part of
mlx5_cmd_exec().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-29 11:45:50 -06:00
David S. Miller 04c6faa175 mlx5-fixes-2018-06-26
Fixes for mlx5 core and netdev driver:
 
 Two fixes from Alex Vesker to address command interface issues
  - Race in command interface polling mode
  - Incorrect raw command length parsing
 
 From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.
 
 From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
 Eswitch capability is not advertised for the PF host driver
     - Fix required capability for manipulating MPFS
     - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
     - Avoid dealing with vport IB/eth representors if not being e-switch manager
     - E-Switch, Avoid setup attempt if not being e-switch manager
     - Don't attempt to dereference the ppriv struct if not being eswitch manager
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbMtMEAAoJEEg/ir3gV/o+NjAIAMGrerpwg8ADBj+b9tSWm4WV
 2yAJ561kBObwhA+uDJtH7mGUO3+AnkcWz9vynGqFdmkOikUcbpPkBb9D+rmFbkX2
 E585pwR3pH7lEzYEG4xO6SwuQcQ4OytFNxz94AT6CgNEXqrmbrD7A5Vsgk265yZq
 pJzL1OVfkXKOtb2x5PpCOh19/28OxAzyMQfoklsE2Wn7j8/2RWX0UUDxuF8jS+He
 9loaurT4Fsfo5JYE+o+k38knHFBkdTUZBD9/bZrtaMcrD68bZdJTpZm6eYwRXW3S
 7J88SmH/xTy74f1KY4qf0JOTxnaWtm/r4YaCXf1QD05W2/U9FQpIW1ipMKH51vk=
 =te2H
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2018-06-26

Fixes for mlx5 core and netdev driver:

Two fixes from Alex Vesker to address command interface issues
 - Race in command interface polling mode
 - Incorrect raw command length parsing

From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.

From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
Eswitch capability is not advertised for the PF host driver
    - Fix required capability for manipulating MPFS
    - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
    - Avoid dealing with vport IB/eth representors if not being e-switch manager
    - E-Switch, Avoid setup attempt if not being e-switch manager
    - Don't attempt to dereference the ppriv struct if not being eswitch manager
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 16:21:35 +09:00
Yuval Shaia 4e1077f720 RDMA/vmw_pvrdma: Delete unused function
This function is not in use - delete it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-27 15:08:49 -06:00
Jason Gunthorpe 7a5c938b9e IB/core: Check for rdma_protocol_ib only after validating port_num
port_num is untrusted data from the user, so it should be checked after
calling fill_sgid_attr, which validates it.

Fixes: 8d9ec9addd ("IB/core: Add a sgid_attr pointer to struct rdma_ah_attr")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-27 15:05:58 -06:00
Or Gerlitz aff2252a2a IB/mlx5: Avoid dealing with vport representors if not being e-switch manager
In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the switchdev mode representors are running on
the embedded cpu (EC) and not at the host.

As such, we should avoid dealing with vport representors if
not being esw manager.

Fixes: b5ca15ad7e ('IB/mlx5: Add proper representors support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:30 -07:00
Jason Gunthorpe 5e62d5ff1b IB/mlx4: Create slave AH's directly
Since slave GID's do not exist in the core gid table we can no longer use
the core code to help do this without creating inconsistencies. Directly
create the AH using mlx4 internal APIs.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-26 14:37:26 -06:00
Leon Romanovsky d9c44040ed RDMA/uverbs: Remove redundant check
kern_spec->reserved is checked prior to calling
kern_spec_to_ib_spec_filter() which makes this second check redundant.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-26 14:37:26 -06:00
Leon Romanovsky 3a2e791c94 RDMA/umem: Don't check for a negative return value of dma_map_sg_attrs()
dma_map_sg_attrs() returns 0 on error and can't return a negative number
(ensured by BUG_ON), so don't check.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-26 14:37:26 -06:00
Leon Romanovsky a5cc9831af RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
Number of specs is provided by user and in valid case can be equal to zero.
Such argument causes to call to kcalloc() with zero-length request and in
return the ZERO_SIZE_PTR is assigned. This pointer is different from NULL
and makes various if (..) checks to success.

Fixes: b6ba4a9aa5 ("IB/uverbs: Add support for flow counters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-26 14:37:26 -06:00
Michael J. Ruhl b697d7d8c7 IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL.
All of the relevant call sites look for IS_ERR, so the NULL return would
lead to a NULL pointer exception.

Do not use the ERR_PTR mechanism for this function.

Update all call sites to handle the return value correctly.

Clean up error paths to reflect return value.

Fixes: 45842abbb2 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org> # 4.9.x+
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-26 14:35:55 -06:00
Leon Romanovsky 4fae7f1704 RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
The check of cmd.flow_attr.size should check into account the size of the
reserved field (2 bytes), otherwise user can provide a size which will
cause a slab-out-of-bounds warning below.

==================================================================
BUG: KASAN: slab-out-of-bounds in ib_uverbs_ex_create_flow+0x1740/0x1d00
Read of size 2 at addr ffff880068dff1a6 by task syz-executor775/269

CPU: 0 PID: 269 Comm: syz-executor775 Not tainted 4.18.0-rc1+ #245
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0xef/0x17e
 print_address_description+0x83/0x3b0
 kasan_report+0x18d/0x4d0
 ib_uverbs_ex_create_flow+0x1740/0x1d00
 ib_uverbs_write+0x923/0x1010
 __vfs_write+0x10d/0x720
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433899
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d
89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66
2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc2724db58 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000020006880 RCX: 0000000000433899
RDX: 00000000000000e0 RSI: 0000000020002480 RDI: 0000000000000003
RBP: 00000000006d7018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000

R13: 000000000040cd20 R14: 000000000040cdb0 R15: 0000000000000006

Allocated by task 269:
 kasan_kmalloc+0xa0/0xd0
 __kmalloc+0x1a9/0x510
 ib_uverbs_ex_create_flow+0x26c/0x1d00
 ib_uverbs_write+0x923/0x1010
 __vfs_write+0x10d/0x720
 vfs_write+0x1b0/0x550
 ksys_write+0xc6/0x1a0
 do_syscall_64+0xa7/0x590
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 0:
 __kasan_slab_free+0x12e/0x180
 kfree+0x159/0x630
 detach_buf+0x559/0x7a0
 virtqueue_get_buf_ctx+0x3cc/0xab0
 virtblk_done+0x1eb/0x3d0
 vring_interrupt+0x16d/0x2b0
 __handle_irq_event_percpu+0x10a/0x980
 handle_irq_event_percpu+0x77/0x190
 handle_irq_event+0xc6/0x1a0
 handle_edge_irq+0x211/0xd80
 handle_irq+0x3d/0x60
 do_IRQ+0x9b/0x220

The buggy address belongs to the object at ffff880068dff180
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 38 bytes inside of
 64-byte region [ffff880068dff180, ffff880068dff1c0)
The buggy address belongs to the page:
page:ffffea0001a37fc0 count:1 mapcount:0 mapping:ffff88006c401780
index:0x0
flags: 0x4000000000000100(slab)
raw: 4000000000000100 ffffea0001a31100 0000001100000011 ffff88006c401780
raw: 0000000000000000 00000000802a002a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880068dff080: fb fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb
 ffff880068dff100: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
>ffff880068dff180: 00 00 00 00 07 fc fc fc fc fc fc fc fb fb fb fb
                               ^
 ffff880068dff200: fb fb fb fb fc fc fc fc 00 00 00 00 00 00 fc fc
 ffff880068dff280: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
==================================================================

Cc: <stable@vger.kernel.org> # 3.12
Fixes: f884827438 ("IB/core: clarify overflow/underflow checks on ib_create/destroy_flow")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:27:51 -06:00
Leon Romanovsky 940efcc888 RDMA/uverbs: Protect from attempts to create flows on unsupported QP
Flows can be created on UD and RAW_PACKET QP types. Attempts to provide
other QP types as an input causes to various unpredictable failures.

The reason is that in order to support all various types (e.g. XRC), we
are supposed to use real_qp handle and not qp handle and expect to
driver/FW to fail such (XRC) flows. The simpler and safer variant is to
ban all QP types except UD and RAW_PACKET, instead of relying on
driver/FW.

Cc: <stable@vger.kernel.org> # 3.11
Fixes: 436f2ad05a ("IB/core: Export ib_create/destroy_flow through uverbs")
Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:27:51 -06:00
Steve Wise 7b72717a20 iw_cxgb4: correctly enforce the max reg_mr depth
The code was mistakenly using the length of the page array memory instead
of the depth of the page array.

This would cause MR creation to fail in some cases.

Fixes: 8376b86de7 ("iw_cxgb4: Support the new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:27:51 -06:00
Leon Romanovsky 1ccddc42da RDMA/verbs: Drop kernel variant of destroy_flow
Following the removal of ib_create_flow(), adjust the code to get rid of
ib_destroy_flow() too.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:22:01 -06:00
Leon Romanovsky ca576fbbdc RDMA/verbs: Drop kernel variant of create_flow
There are no kernel users of this interface so lets drop it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:22:01 -06:00
Jason Gunthorpe e99028ad76 RDMA/uverbs: Check existence of create_flow callback
In the accepted series "Refactor ib_uverbs_write path", we presented the
roadmap to get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask fields in
favor of simple check of function pointer. So let's put NULL check of
create_flow function callback despite the fact that uverbs_ex_cmd_mask
still exists.

Link: https://www.spinics.net/lists/linux-rdma/msg60753.html
Suggested-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 15:21:56 -06:00
Jason Gunthorpe 43cbd64b1f IB/usnic: Update with bug fixes from core code
usnic has a modified version of the core codes' ib_umem_get() and
related, and the copy misses many of the bug fixes done over the years:

Commit bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
Commit 87773dd56d ("IB: ib_umem_release() should decrement mm->pinned_vm
                      from ib_umem_get")
Commit 8494057ab5 ("IB/uverbs: Prevent integer overflow in ib_umem_get
                      address arithmetic")
Commit 8abaae62f3 ("IB/core: disallow registering 0-sized memory region")
Commit 66578b0b2f ("IB/core: don't disallow registering region starting
                      at 0x0")
Commit 53376fedb9 ("RDMA/core: not to set page dirty bit if it's already
                      set.")
Commit 8e907ed488 ("IB/umem: Use the correct mm during ib_umem_release")

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 14:38:28 -06:00
Yishai Hadas 1975acd9f3 IB/mlx4: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically, Upon internal error state modify QP to an error state can
be assumed to be success as each in-progress WR going to be flushed in
error in any case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks
and actions to make sure that the completion handler for that WR will
always be called after that the post_send/recv were issued but not in
parallel to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 14:32:36 -06:00
Yishai Hadas d0e84c0ad3 IB/mlx5: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.

Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.

In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.

In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.

As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25 14:32:36 -06:00
Jason Gunthorpe ea8c2d8f60 RDMA/core: Remove unused ib cache functions
Now that all users have been converted to use the version of these APIs
that returns a gid_attr pointer we can delete the old entry points.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:57 -06:00
Parav Pandit a8872d53e9 IB/cm: Use sgid_attr from the AV
Prior patches now ensure that the AV has a sgid_attr, if one would have
been required.  Instead of querying for one, take it directly from the AH.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:57 -06:00
Parav Pandit 398391071f IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'
While processing a path record entry in CM messages the associated GID
attribute is now also supplied.

Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.

Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.

This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:57 -06:00
Parav Pandit 815d456ef2 IB/cm: Pass the sgid_attr through various events
Make the sgid_attr available along with path information to the event
consumer, this allows the consumer to keep using the same GID table entry
as the event is related to.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:57 -06:00
Parav Pandit 4ed13a5f2d IB/cm: Keep track of the sgid_attr that created the cm id
Hold reference to the the sgid_attr which is used in a cm_id until the
cm_id is destroyed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:56 -06:00
Parav Pandit aa74f4878d IB: Make init_ah_attr_grh_fields set sgid_attr
Use the sgid and other information from the path record to figure out the
sgid_attrs.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:56 -06:00
Parav Pandit f685c19529 IB: Make ib_init_ah_from_mcmember set sgid_attr
This is really just a CM support function, normally a multicast address
does not have a specific SGID - but the RDMA CM usage model does restrict
things to the netdevice the CM id is bound to, at least for roce case.

Store the selected table entry in the sgid_attr for everything else to
use.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:56 -06:00
Parav Pandit b740321765 IB: Make ib_init_ah_attr_from_wc set sgid_attr
The work completion is inspected to determine what dgid table entry was
used to receieve the packet, produces a sgid_attr that matches and sticks
it in the ah_attr.

All callers of this function are now required to release the ah_attr on
success.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25 14:19:56 -06:00
Michael J. Ruhl 70324739ac IB/hfi1: Remove INTx support and simplify MSIx usage
The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).

Remove all supporting code for the INTx IRQ.

If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn 071e4fec8e IB/hfi1: Reorg ctxtdata and rightsize fields
Many fields in ctxtdata are incorrectly sized and the organization of the
fields within the structure is a jumble.

Fix by:
- Correcting oversize fields.
- Putting fields common to all contexts at the top with hot fields
  at the top.
- Moving PSM fields to the bottom of the structure.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn 06e81e3e92 IB/hfi1: Remove caches of chip CSRs
Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn 15d063d5db IB/hfi1: Remove unused/writeonly devdata fields
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn 4b0b76bd37 IB/hfi1: Rightsize ctxt_eager_bufs fields
Fields in this structure are sized excessively based on hardware
limitations and input values.

Fix by reducing fields as appropriate and repositioning to close holes in
the structure.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn b67bbc5923 IB/hfi1: Remove rcvctrl from ctxtdata
It is only ever written.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Mike Marciniszyn b257843128 IB/hfi1: Remove rcvhdrq_size
The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:12:17 -06:00
Parav Pandit 59d4081332 IB/core: Free GID table entry during GID deletion
If we already hold the table->lock when doing the kref_put it means we are
in a context where it is safe to do the deletion synchronously, with no
need for the work queue.

This helps to eliminate issues when GID change is requested as part of MAC
address change or bonding event change where expectation is to replace the
GID almost immediately.

Fixes: b150c3862d ("IB/core: Introduce GID entry reference counts")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:02:59 -06:00
Parav Pandit 8814567892 RDMA/cma: Consider net namespace while leaving multicast group
When sending multicast leave request, consider the net ns in which this
cm_id is created.

Code was duplicated in cma_leave_mc_groups() and rdma_leave_multicast(),
which is now done using a helper function cma_leave_roce_mc_group().

Fixes: bee3c3c918 ("IB/cma: Join and leave multicast groups with IGMP")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:02:59 -06:00
Jason Gunthorpe 321d7863ac IB/uverbs: Delete type and id from uverbs_obj_attr
In this context the uobject is not allowed to be NULL, so type is the same
as uobject->type, and at least for IDR, id is the same as uobject->id.

FD objects should never handle the FD number outside the uAPI boundary
code.

Suggested-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 09:02:59 -06:00
Jason Gunthorpe 4d7dff2b8b Merge branch 'icrc-counter' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull RoCE ICRC counters from Leon Romanovsky:

====================
This series exposes RoCE ICRC counter through existing RDMA hw_counters
sysfs interface.

The first patch has all HW definitions in mlx5_ifc.h file and second patch
is the actual counter implementation.
====================

* branch 'icrc-counter':
  IB/mlx5: Support RoCE ICRC encapsulated error counter
  net/mlx5: Add RoCE RX ICRC encapsulated counter
2018-06-22 08:53:27 -06:00
Talat Batheesh 9f876f3de6 IB/mlx5: Support RoCE ICRC encapsulated error counter
This patch adds support to query the counter that counts the
RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code).

This counter will be under
/sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/

rx_icrc_encapsulated - The number of RoCE packets with ICRC
error.

Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22 08:51:14 -06:00
Mark Rutland bfc18e389c atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless()
While __atomic_add_unless() was originally intended as a building-block
for atomic_add_unless(), it's now used in a number of places around the
kernel. It's the only common atomic operation named __atomic*(), rather
than atomic_*(), and for consistency it would be better named
atomic_fetch_add_unless().

This lack of consistency is slightly confusing, and gets in the way of
scripting atomics. Given that, let's clean things up and promote it to
an official part of the atomics API, in the form of
atomic_fetch_add_unless().

This patch converts definitions and invocations over to the new name,
including the instrumented version, using the following script:

  ----
  git grep -w __atomic_add_unless | while read line; do
  sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
  done
  git grep -w __arch_atomic_add_unless | while read line; do
  sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
  done
  ----

Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
be introduced by later patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:22:32 +02:00
Linus Torvalds 1abd8a8f39 4.18-rc
Regression and crashing bug fixes:
 
 - mlx4/5: Fixes for issues found from various checkers
 - A resource tracking and uverbs regression in the core code
 - qedr: NULL pointer regression found during testing
 - rxe: Various small bugs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbKr/pAAoJEDht9xV+IJsasIoP/2yyHUHjBp3vVNJ3A2qRnzAJ
 Yt4DHVo+lWfAhtEY+1rqRQx432aa+gv7e9TUA/Y9Llj0+C2nrOIsNniJvyjF7UrF
 djtAua66p5L+TxmeQPbQP+RsE8pUoczxtPWvpTP6dJ5pkp+/0IJl4P7aZNG+WlYT
 t/4pW1zBejhA9nXfHCFej4A3HM3/6oW3narmIldrNhW1EH7+5jeidyyLKueY6c1Q
 MJ8zfLQM/ZdP1hFwrzfZPMsFmGI4WD7P0F4jWVa+JvpeedV/jOTVVBLKrjHfF1JS
 7JMEeVlK/Mqsu4hCu/BJqHsh8kpFs4aTGfHUOyusZ1xsOx92X1QWCTtGEwi/ZKZh
 PvZMkbWU6Syd1IFwtMRHrKMxGQYrErwXf9V3xHxVn4bIFEAWTT8qn/T1w+tiUcJY
 gBtfqpLuIdzjZ4JtNGBRtfxOvhzqBkHdZO7sd1ARmuIf6Euzvas9AEz9qH893Oun
 rfeLOL70hoz2TrJIpnDApndo9LFEGUB+ypUpax9e99nVHVdbPh/PSdRze/2khoj3
 oJ8z8oh6KAimiW1sMkJ89fefDfUnkkOFOYrxH3nTYfkdrOHyiEtpLuE424pZwVKM
 uWqQ+yoXRuab4X58Gw2ezYq2/UIILn4hJEJ/VdTgJomb41nd0iZtKNlgw2uk8G8M
 WhOCed7yvYsp6hDi8pSq
 =Gjuy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Here are eight fairly small fixes collected over the last two weeks.

  Regression and crashing bug fixes:

   - mlx4/5: Fixes for issues found from various checkers

   - A resource tracking and uverbs regression in the core code

   - qedr: NULL pointer regression found during testing

   - rxe: Various small bugs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/rxe: Fix missing completion for mem_reg work requests
  RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()
  IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write
  IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
  RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM
  IB/mlx5: Fix return value check in flow_counters_set_data()
  IB/mlx5: Fix memory leak in mlx5_ib_create_flow
  IB/rxe: avoid double kfree skb
2018-06-21 07:22:30 +09:00
Leon Romanovsky cfdeb8934b RDMA/mlx5: Refactor transport domain checks
Put all relevant checks for transport domain in the
mlx5_ib_alloc/dealloc_transport_domain functions.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 13:32:17 -06:00
Mike Marciniszyn 2e2ba09e48 IB/rdmavt, IB/hfi1: Create device dependent s_flags
Move some s_flags defines out of rdmavt and into hfi1 because they are
hfi1 specific and therefore should remain in the driver instead of
bubbling up to rdmavt.

Document device specific ranges in rdmavt and remap
those in hfi1.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:46 -06:00
Mike Marciniszyn 32e3d97079 IB/hfi1: Remove rcvhdrsize
The field is based on a constant that can never change.

Use the define to assign the register instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:46 -06:00
Mike Marciniszyn 40442b30aa IB/hfi1: Move rhf_offset from devdata to ctxtdata
This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.

The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.

Both fields are now correctly sized as u8.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:45 -06:00
Mike Marciniszyn b0ba3c18d6 IB/hfi1: Move normal functions from hfi1_devdata to const array
The current implementation precludes having receive context specific
packet type receive handlers.

Fix this by adding adding c99 const array for the existing handlers and
remove the current 72 bytes of pointers from devdata.

A new pointer in hfi1_ctxtdata will point to the const array.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 11:49:45 -06:00
Yishai Hadas c59450c463 IB/mlx5: Expose DEVX tree
Expose DEVX tree to be used by upper layers.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas f6fe01b718 IB/mlx5: Add DEVX query EQN support
Return the matching device EQN for a given user vector number via the
DEVX interface.

Note:
EQs are owned by the kernel and shared by all user processes.
Basically, a user CQ can point to any EQ.
The kernel doesn't enforce any such limitation today either.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas aeae94579c IB/mlx5: Add DEVX support for memory registration
Add support to register a memory with the firmware via the DEVX
interface.

The driver translates a given user address to ib_umem then it will
register the physical addresses with the firmware and get a unique id
for this registration to be used for this virtual address.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas 7c043e908a IB/mlx5: Add support for DEVX query UAR
Return a device UAR index for a given user index via the DEVX interface.

Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.

If no match the doorbell is silently ignored by the hardware.  Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.

Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.

The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas e662e14d80 IB/mlx5: Add DEVX support for modify and query commands
Add support in DEVX for modify and query commands, the required lock is
taken (i.e. READ/WRITE) by the KABI infrastructure accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas 7efce3691d IB/mlx5: Add obj create and destroy functionality
Add support to create and destroy firmware objects via the DEVX
interface.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas 8aa8c95ce4 IB/mlx5: Add support for DEVX general command
Add support to run general firmware command via the DEVX interface.

A command that works on some object (e.g. CQ, WQ, etc.) will be added
in next patches while maintaining the required object lock.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas a8b92ca1b0 IB/mlx5: Introduce DEVX
Introduce DEVX to enable direct device commands in downstream patches
from this series.

In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.

A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas 7dc08dcfc8 IB/core: Expose ib_ucontext from a given ib_uverbs_file
Drivers that use the IOCTL API may have the ib_uverbs_file and need a
way to get the related ib_ucontext from it, this is enabled by this
patch.

Downstream patches from this series will use it.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Matan Barak 19b9def258 IB/uverbs: Allow an empty namespace in ioctl() framework
The ioctl parser framework wrongly assumed that each namespace is
populated. This could lead to NULL dereferences. Fix the parser to
always check that a given namespace indeed exists.

Fixes: fac9658cab ("IB/core: Add new ioctl interface")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Matan Barak 2d9c1bd7e1 IB/uverbs: Add a macro to define a type with no kernel known size
Sometimes the uverbs uAPI  doesn't really care about the structure it gets
from user-space. All it wants to do is to allocate enough space and send
it to the hardware/provider driver. Adding a UVERBS_ATTR_MIN_SIZE that
could be used for this scenarios. We use USHRT_MAX as the kernel known
size to bypass any zero validations.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Matan Barak 8762d149e8 IB/uverbs: Add PTR_IN attributes that are allocated/copied automatically
Adding UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY flag to PTR_IN attributes.
By using this flag, the parse automatically allocates and copies the
user-space data. This data is accessible by using uverbs_attr_get_len
and uverbs_attr_get_alloced_ptr inline accessor functions from the
handler.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Matan Barak 9442d8bf1d IB/uverbs: Refactor uverbs_finalize_objects
uverbs_finalize_objects is currently used only to commit or abort
objects. Since we want to add automatic allocation/free of PTR_IN
attributes, moving it to uverbs_ioctl.c and renamit it to
uverbs_finalize_attrs.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Matan Barak 1114b0a8a8 IB/uverbs: Export uverbs idr and fd types
As provider drivers could use UVERBS_ATTR_FD and UVERBS_ATTR_IDR macros
need to export them.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Bharat Potnuri 3cba33d311 iw_cxgb4: remove duplicate memcpy() in c4iw_create_listen()
memcpy() of mapped addresses is done twice in c4iw_create_listen(),
removing the duplicate memcpy().

Fixes: 170003c894 ("iw_cxgb4: remove port mapper related code")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 16:06:01 -06:00
Steve Wise 33023fb85a IB/core: add max_send_sge and max_recv_sge attributes
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:17:28 -06:00
Zhu Yanjun b90575ce7b IB/rxe: avoid unnecessary NULL check
Before goto err2, the variable qp is checked. So it is not necessary
to check qp in label err2.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:16:30 -06:00
Vijay Immanuel 92cf36eec2 IB/rxe: support for 802.1q VLAN on the listener
Set the vlan flag and vlan_id field in the wc for rdma_listen()
to work over VLAN. This is required by ib_init_ah_attr_from_wc()
which is called by the CM REQ handler.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:16:04 -06:00
Vijay Immanuel 6a965ee57d IB/rxe: increase max MR limit
Increase the max MR limit to support more I/O queues
for NVMe over Fabrics hosts.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 13:15:48 -06:00
Vijay Immanuel 375dc53d03 IB/rxe: Fix missing completion for mem_reg work requests
Run the completer task to post a work completion after processing
a memory registration or invalidate work request. This covers the
case where the memory registration or invalidate was the last work
request posted to the qp.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:36:13 -06:00
Bharat Potnuri 7350cdd025 RDMA/core: Save kernel caller name when creating CQ using ib_create_cq()
Few kernel applications like SCST-iSER create CQ using ib_create_cq(),
where accessing CQ structures using rdma restrack tool leads to below NULL
pointer dereference. This patch saves caller kernel module name similar to
ib_alloc_cq().

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8132ca70>] skip_spaces+0x30/0x30
PGD 738bac067 PUD 8533f0067 PMD 0
Oops: 0000 [#1] SMP
R10: ffff88017fc03300 R11: 0000000000000246 R12: 0000000000000000
R13: ffff88082fa5a668 R14: ffff88017475a000 R15: 0000000000000000
FS:  00002b32726582c0(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000008491a1000 CR4: 00000000003607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffffc05af69c>] ? fill_res_name_pid+0x7c/0x90 [ib_core]
 [<ffffffffc05af79f>] fill_res_cq_entry+0xef/0x170 [ib_core]
 [<ffffffffc05af4c4>] res_get_common_dumpit+0x3c4/0x480 [ib_core]
 [<ffffffffc05af5d3>] nldev_res_get_cq_dumpit+0x13/0x20 [ib_core]
 [<ffffffff815bc1e7>] netlink_dump+0x117/0x2e0
 [<ffffffff815bcb8b>] __netlink_dump_start+0x1ab/0x230
 [<ffffffffc059fead>] ibnl_rcv_msg+0x11d/0x1f0 [ib_core]
 [<ffffffffc05af5c0>] ? nldev_res_get_mr_dumpit+0x20/0x20 [ib_core]
 [<ffffffffc059fd90>] ? rdma_nl_multicast+0x30/0x30 [ib_core]
 [<ffffffff815bea49>] netlink_rcv_skb+0xa9/0xc0
 [<ffffffffc05a0018>] ibnl_rcv+0x98/0xb0 [ib_core]
 [<ffffffff815be132>] netlink_unicast+0xf2/0x1b0
 [<ffffffff815be50f>] netlink_sendmsg+0x31f/0x6a0
 [<ffffffff8156b580>] sock_sendmsg+0xb0/0xf0
 [<ffffffff816ace9e>] ? _raw_spin_unlock_bh+0x1e/0x20
 [<ffffffff8156f998>] ? release_sock+0x118/0x170
 [<ffffffff8156b731>] SYSC_sendto+0x121/0x1c0
 [<ffffffff81568340>] ? sock_alloc_file+0xa0/0x140
 [<ffffffff81221265>] ? __fd_install+0x25/0x60
 [<ffffffff8156c2ce>] SyS_sendto+0xe/0x10
 [<ffffffff816b6c2a>] system_call_fastpath+0x16/0x1b
RIP  [<ffffffff8132ca70>] skip_spaces+0x30/0x30
RSP <ffff88072be97760>
CR2: 0000000000000000

Cc: <stable@vger.kernel.org>
Fixes: f66c8ba4c9 ("RDMA/core: Save kernel caller name when creating PD and CQ objects")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:32:58 -06:00
willy@infradead.org 9a41e38a46 IB/mad: Use IDR for agent IDs
Allocate agent IDs from a global IDR instead of an atomic variable.
This eliminates the possibility of reusing an ID which is already in
use after 4 billion registrations.  We limit the assigned ID to be less
than 2^24 as the mlx4 driver uses the most significant byte of the agent
ID to store the slave number.  Users unlucky enough to see a collision
between agent numbers and slave numbers see messages like:

 mlx4_ib: egress mad has non-null tid msb:1 class:4 slave:0

and the MAD layer stops working.

We look up the agent under protection of the RCU lock, which means we
have to free the agent using kfree_rcu, and only increment the reference
counter if it is not 0.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Tested-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:22:54 -06:00
Parav Pandit 89af969a66 RDMA: Convert drivers to use the AH's sgid_attr in post_wr paths
For UD the drivers were doing a sgid_index lookup into the cache to get
the attrs, however we can now directly access the same attrs stores in
the ib_ah instead and remove the lookup.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:27 -06:00
Jason Gunthorpe 1a1f460ff1 RDMA: Hold the sgid_attr inside the struct ib_ah/qp
If the AH has a GRH then hold a reference to the sgid_attr inside the
common struct.

If the QP is modified with an AV that includes a GRH then also hold a
reference to the sgid_attr inside the common struct.

This informs the cache that the sgid_index is in-use so long as the AH or
QP using it exists.

This also means that all drivers can access the sgid_attr directly from
the ah_attr instead of querying the cache during their UD post-send paths.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:27 -06:00
Parav Pandit 7492052a18 IB/mlx4: Use GID attribute from ah attribute
While converting GID index from attribute to that of the HCA, GID
attribute is available from the ah_attr. Make use of GID attribute
to simplify the code and also avoid avoid GID query.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:27 -06:00
Parav Pandit 47ec386662 RDMA: Convert drivers to use sgid_attr instead of sgid_index
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.

Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Jason Gunthorpe d97099fe53 IB{cm, core}: Introduce and use ah_attr copy, move, replace APIs
Introduce AH attribute copy, move and replace APIs to be used by core and
provider drivers.

In CM code flow when ah attribute might be re-initialized twice while
processing incoming request, or initialized once while from path record
while sending out CM requests. Therefore use rdma_move_ah_attr API to
handle such scenarios instead of memcpy().

Provider drivers keeps a copy ah_attr during the lifetime of the ah.
Therefore, use rdma_replace_ah_attr() which conditionally release
reference to old ah_attr and holds reference to new attribute whose
referrence is released when the AH is freed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Parav Pandit 947c99ecfc IB/core: Tidy ib_resolve_eth_dmac
No reason to call rdma_ah_retrieve_grh, tidy whitespace, and add a
function comment block.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Jason Gunthorpe 8d9ec9addd IB/core: Add a sgid_attr pointer to struct rdma_ah_attr
The sgid_attr will ultimately replace the sgid_index in the ah_attr.
This will allow for all layers to have a consistent view of what
gid table entry was selected as processing runs through all stages of the
stack.

This commit introduces the pointer and ensures it is set before calling
any driver callback that includes a struct ah_attr callback, allowing
future patches to adjust both the drivers and the callers to use
sgid_attr instead of sgid_index.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Parav Pandit fb51eecaa5 IB: Ensure that all rdma_ah_attr's are zero initialized
Since we are adding some new fields to this structure it is safest if all
users reliably initialize the struct to zero.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18 11:11:26 -06:00
Matthew Wilcox 0c271c433c IB/mad: Agent registration is process context only
Document this (it's implicitly true due to sleeping operations already
in use in both registration and deregistration).  Use this fact to use
spin_lock_irq instead of spin_lock_irqsave.  This improves performance
slightly.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Jason Gunthorpe 7f3ee8e030 IB/rxe: Do not hide uABI stuff in memcpy
struct rxe_global_route and struct ib_global_route are not the same thing
and should not be memcpy'd over each other, do a member by member copy
instead. This allows the layout of the in-kernel struct ib_global_route to
be changed without breaking rxe.

Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Shiraz Saleem aaf5e003b1 i40iw: Reorganize acquire/release of locks in i40iw_manage_apbvt
Commit f43c00c04b ("i40iw: Extend port reuse support for listeners")
introduces a sparse warning:

include/linux/spinlock.h:365:9: sparse: context imbalance in
'i40iw_manage_apbvt' - unexpected unlock

Fix this by reorganizing the acquire/release of locks in
i40iw_manage_apbvt and add a new function i40iw_cqp_manage_abvpt_cmd
to perform the CQP command. Also, use __clear_bit and __test_and_set_bit
as we do not need atomic versions.

Fixes: f43c00c04b ("i40iw: Extend port reuse support for listeners")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Leon Romanovsky de7498147d RDMA/uverbs: Refactor flow_resources_alloc() function
Simplify the flow_resources_alloc() function call by reducing
number of goto statements.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Leon Romanovsky dd8028f1e9 RDMA/nldev: Return port capability flag for IB only
Port capability flag represents IBTA PortInfo:CapabilityMask,
but was mistakenly mixed with non-relevant fields. Return that
information for IB only.

Link: https://patchwork.kernel.org/patch/10386245/
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit 82f82ceb8e IB/rxe: Use rdma GID API
rxe_netdev_from_av can now be done by the core code directly from the
gid_attrs, no need for a helper in the driver.

ib_find_cached_gid_by_port can be switched to use the rdma version here as
well.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit 1dfce29457 IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gid
If the gid_attr argument is NULL then the functions behave identically to
rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
can be consolidated to one function.

Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
ib_query_gid() API is removed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Jason Gunthorpe 83f6f8d29d IB/core: Make rdma_find_gid_by_filter support all protocols
There is no reason to restrict this function to roce only these days,
allow the filter function to be called on any protocol.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Jason Gunthorpe c3d71b69a7 IB/core: Provide rdma_ versions of the gid cache API
These versions are functionally similar but all return gid_attrs and
related information via reference instead of via copy.

The old API is preserved, implemented as wrappers around the new, until
all callers can be converted.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit 77e786fcbe IB/core: Replace ib_query_gid with rdma_get_gid_attr
These call sites have a use of ib_query_gid with a simple lifetime for the
struct gid_attr pointer, with an easy conversion.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit bf399c2cad IB/core: Introduce GID attribute get, put and hold APIs
This patch introduces three APIs, rdma_get_gid_attr(),
rdma_put_gid_attr(), and rdma_hold_gid_attr() which expose the reference
counting for GID table entries to the entire stack. The kref counting is
based on the struct ib_gid_attr pointer

Later patches will convert more cache query function to return struct
ib_gid_attrs.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit f4df9a7c34 RDMA: Use GID from the ib_gid_attr during the add_gid() callback
Now that ib_gid_attr contains the GID, make use of that in the add_gid()
callback functions for the provider drivers to simplify the add_gid()
implementations.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit b150c3862d IB/core: Introduce GID entry reference counts
In order to be able to expose pointers to the ib_gid_attrs in the GID
table we need to make it so the value of the pointer cannot be
changed. Thus each GID table entry gets a unique piece of kref'd memory
that is written only during initialization and remains constant for its
lifetime.

This eventually will allow the struct ib_gid_attrs to be returned without
copy from many of query the APIs, but it also provides a way to track when
all users of a HW table index go away.

For roce we no longer allow an in-use HW table index to be re-used for a
new an different entry. When a GID table entry needs to be removed it is
hidden from the find API, but remains as a valid HW index and all
ib_gid_attr points remain valid. The HW index is not relased until all
users put the kref.

Later patches will broadly replace the use of the sgid_index integer with
the kref'd structure.

Ultimately this will prevent security problems where the OS changes the
properties of a HW GID table entry while an active user object is still
using the entry.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Parav Pandit 1c36cf912a IB/core: Store default GID property per-table instead of per-entry
There are at max one or two default GIDs for RoCE. Instead of storing
a default GID property for all the GIDs, store default GID indices as
individual bit per table.

This allows a future simplification to get rid of the GID property field.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-17 22:32:04 -06:00
Parav Pandit a1a4caeeba IB/core: Do not set the gid type when reserving default entries
When default GIDs are added, their gid type is set by
ib_cache_gid_set_default_gid().  There is no need to set the gid type of a
free GID entry during GID table initialization.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-17 22:32:04 -06:00
Kees Cook 84ca176bf5 treewide: Use array_size() in kvzalloc_node()
The kvzalloc_node() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

        kvzalloc_node(a * b, gfp, node)

with:
        kvzalloc_node(array_size(a, b), gfp, node)

as well as handling cases of:

        kvzalloc_node(a * b * c, gfp, node)

with:

        kvzalloc_node(array3_size(a, b, c), gfp, node)

This does, however, attempt to ignore constant size factors like:

        kvzalloc_node(4 * 1024, gfp, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  kvzalloc_node(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc_node(C1 * C2 * C3, ...)
|
  kvzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  kvzalloc_node(C1 * C2, ...)
|
  kvzalloc_node(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook fd7becedb1 treewide: Use array_size() in vzalloc_node()
The vzalloc_node() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

        vzalloc_node(a * b, node)

with:
        vzalloc_node(array_size(a, b), node)

as well as handling cases of:

        vzalloc_node(a * b * c, node)

with:

        vzalloc_node(array3_size(a, b, c), node)

This does, however, attempt to ignore constant size factors like:

        vzalloc_node(4 * 1024, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc_node(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc_node(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc_node(C1 * C2 * C3, ...)
|
  vzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc_node(C1 * C2, ...)
|
  vzalloc_node(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook fad953ce0b treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vzalloc(a * b)

with:
        vzalloc(array_size(a, b))

as well as handling cases of:

        vzalloc(a * b * c)

with:

        vzalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vzalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc(C1 * C2 * C3, ...)
|
  vzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc(C1 * C2, ...)
|
  vzalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 778e1cdd81 treewide: kvzalloc() -> kvcalloc()
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
patch replaces cases of:

        kvzalloc(a * b, gfp)

with:
        kvcalloc(a * b, gfp)

as well as handling cases of:

        kvzalloc(a * b * c, gfp)

with:

        kvzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kvcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kvzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kvzalloc
+ kvcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kvzalloc(sizeof(THING) * C2, ...)
|
  kvzalloc(sizeof(TYPE) * C2, ...)
|
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(C1 * C2, ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 590b5b7d86 treewide: kzalloc_node() -> kcalloc_node()
The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This
patch replaces cases of:

        kzalloc_node(a * b, gfp, node)

with:
        kcalloc_node(a * b, gfp, node)

as well as handling cases of:

        kzalloc_node(a * b * c, gfp, node)

with:

        kzalloc_node(array3_size(a, b, c), gfp, node)

as it's slightly less ugly than:

        kcalloc_node(array_size(a, b), c, gfp, node)

This does, however, attempt to ignore constant size factors like:

        kzalloc_node(4 * 1024, gfp, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc_node
+ kcalloc_node
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc_node(C1 * C2 * C3, ...)
|
  kzalloc_node(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc_node(sizeof(THING) * C2, ...)
|
  kzalloc_node(sizeof(TYPE) * C2, ...)
|
  kzalloc_node(C1 * C2 * C3, ...)
|
  kzalloc_node(C1 * C2, ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Matthew Wilcox 7654cb1ba7 Convert infiniband uverbs to struct_size
The flows were hidden from the C compiler; expose them as a zero-length
array to allow struct_size to work.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Jason Gunthorpe 1eb9364ce8 IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write
During disassociation the ucontext will become NULL, however due to how
the SRCU locking works the ucontext must only be examined after looking
at the ib_dev, which governs the RCU control flow.

With the wrong ordering userspace will see EINVAL instead of EIO for a
disassociated uverbs FD, which breaks rdma-core.

Cc: stable@vger.kernel.org
Fixes: 491d5c6a30 ("RDMA/uverbs: Move uncontext check before SRCU read lock")
Reported-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-12 14:39:32 -06:00
Christophe Jaillet 3dc7c7badb IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'
Before returning -EPERM we should release some resources, as already done
in the other error handling path of the function.

Fixes: d8f9cc328c ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 12:43:41 -06:00
Kalderon, Michal 425cf5c135 RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM
Some RoCE specific code in qedr_modify_qp was run over an iWARP device
when running perftest benchmarks without the -R option.

The commit 3e44e0ee08 ("IB/providers: Avoid null netdev check for RoCE")
exposed this. Dropping the check for NULL pointer on ndev in
qedr_modify_qp lead to a null pointer dereference when running over
iWARP. Before the code would identify ndev as being NULL and return an
error.

Fixes: 3e44e0ee08 ("IB/providers: Avoid null netdev check for RoCE")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:04:14 -06:00
weiyongjun (A) e31abf76f4 IB/mlx5: Fix return value check in flow_counters_set_data()
In case of error, the function mlx5_fc_create() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().

Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Gustavo A. R. Silva 299eafee39 IB/mlx5: Fix memory leak in mlx5_ib_create_flow
In case memory resources for *ucmd* were allocated, release them
before return.

Addresses-Coverity-ID: 1469857 ("Resource leak")
Fixes: 3b3233fbf0 ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Zhu Yanjun 828d810550 IB/rxe: avoid double kfree skb
In rxe_send, when network_type is not RDMA_NETWORK_IPV4 or
RDMA_NETWORK_IPV6, skb is freed and -EINVAL is returned.
Then rxe_xmit_packet will return -EINVAL, too. In rxe_requester,
this skb is double freed.
In rxe_requester, kfree_skb is needed only after fill_packet fails.
So kfree_skb is moved from label err to test fill_packet.

Fixes: 5793b46521 ("IB/rxe: remove unnecessary skb_clone in xmit")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Linus Torvalds a1cdde8c41 4.18 Merge window pull request
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
 driver updates and bug fixes. About four new uAPI related things. Not as much
 Szykaller patches this time, the bugs it finds are getting harder to fix.
 
 - More work cleaning up the RDMA CM code
 - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
 - Driver specific resource tracking and reporting via netlink
 - Continued work for name space support from Parav
 - MPLS support for the verbs flow steering uAPI
 - A few tricky IPoIB fixes improving robustness
 - HFI1 driver support for the '16B' management packet format
 - Some auditing to not print kernel pointers via %llx or similar
 - Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
   entirely. The user space side of this was long ago replaced with RDMA-CM and
   syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
   nobody uses it.
 - Purge more bogus BUG_ON's from Leon
 - 'flow counters' verbs uAPI
 - T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
   tree due to dependencies
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
 YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
 mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
 RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
 7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
 V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
 TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
 euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
 U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
 4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
 BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
 8T6TXYi2yF+f+vZBTFlV
 =Mb8m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a quiet cycle for RDMA, the big bulk is the usual
  smallish driver updates and bug fixes. About four new uAPI related
  things. Not as much Szykaller patches this time, the bugs it finds are
  getting harder to fix.

  Summary:

   - More work cleaning up the RDMA CM code

   - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
     i40iw, iw_cxgb4, mlx5, rxe

   - Driver specific resource tracking and reporting via netlink

   - Continued work for name space support from Parav

   - MPLS support for the verbs flow steering uAPI

   - A few tricky IPoIB fixes improving robustness

   - HFI1 driver support for the '16B' management packet format

   - Some auditing to not print kernel pointers via %llx or similar

   - Mark the entire 'UCM' user-space interface as BROKEN with the
     intent to remove it entirely. The user space side of this was long
     ago replaced with RDMA-CM and syzkaller is finding bugs in the
     residual UCM interface nobody wishes to fix because nobody uses it.

   - Purge more bogus BUG_ON's from Leon

   - 'flow counters' verbs uAPI

   - T10 fixups for iser/isert, these are Acked by Martin but going
     through the RDMA tree due to dependencies"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
  RDMA/mlx5: Update SPDX tags to show proper license
  RDMA/restrack: Change SPDX tag to properly reflect license
  IB/hfi1: Fix comment on default hdr entry size
  IB/hfi1: Rename exp_lock to exp_mutex
  IB/hfi1: Add bypass register defines and replace blind constants
  IB/hfi1: Remove unused variable
  IB/hfi1: Ensure VL index is within bounds
  IB/hfi1: Fix user context tail allocation for DMA_RTAIL
  IB/hns: Use zeroing memory allocator instead of allocator/memset
  infiniband: fix a possible use-after-free bug
  iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
  IB/isert: use T10-PI check mask definitions from core layer
  IB/iser: use T10-PI check mask definitions from core layer
  RDMA/core: introduce check masks for T10-PI offload
  IB/isert: fix T10-pi check mask setting
  IB/mlx5: Add counters read support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add counters create and destroy support
  IB/uverbs: Add support for flow counters
  ...
2018-06-07 13:04:07 -07:00
Linus Torvalds 3a3869f1c4 pci-v4.18-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlsZdg0UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwJOBAAsuuWsOdiJRRhQLU5WfEMFgzcL02R
 gsumqZkK7E8LOq0DPNMtcgv9O0KgYZyCiZyTMJ8N7sEYohg04lMz8mtYXOibjcwI
 p+nVMko8jQXV9FXwSMGVqigEaLLcrbtkbf/mPriD63DDnRMa/+/Jh15SwfLTydIH
 QRTJbIxkS3EiOauj5C8QY3UwzjlvV9mDilzM/x+MSK27k2HFU9Pw/3lIWHY716rr
 grPZTwBTfIT+QFZjwOm6iKzHjxRM830sofXARkcH4CgSNaTeq5UbtvAs293MHvc+
 v/v/1dfzUh00NxfZDWKHvTUMhjazeTeD9jEVS7T+HUcGzvwGxMSml6bBdznvwKCa
 46ynePOd1VcEBlMYYS+P4njRYBLWeUwt6/TzqR4yVwb0keQ6Yj3Y9H2UpzscYiCl
 O+0qz6RwyjKY0TpxfjoojgHn4U5ByI5fzVDJHbfr2MFTqqRNaabVrfl6xU4sVuhh
 OluT5ym+/dOCTI/wjlolnKNb0XThVre8e2Busr3TRvuwTMKMIWqJ9sXLovntdbqE
 furPD/UnuZHkjSFhQ1SQwYdWmsZI5qAq2C9haY8sEWsXEBEcBGLJ2BEleMxm8UsL
 KXuy4ER+R4M+sFtCkoWf3D4NTOBUdPHi4jyk6Ooo1idOwXCsASVvUjUEG5YcQC6R
 kpJ1VPTKK1XN64I=
 =aFAi
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

  - unify AER decoding for native and ACPI CPER sources (Alexandru
    Gagniuc)

  - add TLP header info to AER tracepoint (Thomas Tai)

  - add generic pcie_wait_for_link() interface (Oza Pawandeep)

  - handle AER ERR_FATAL by removing and re-enumerating devices, as
    Downstream Port Containment does (Oza Pawandeep)

  - factor out common code between AER and DPC recovery (Oza Pawandeep)

  - stop triggering DPC for ERR_NONFATAL errors (Oza Pawandeep)

  - share ERR_FATAL recovery path between AER and DPC (Oza Pawandeep)

  - disable ASPM L1.2 substate if we don't have LTR (Bjorn Helgaas)

  - respect platform ownership of LTR (Bjorn Helgaas)

  - clear interrupt status in top half to avoid interrupt storm (Oza
    Pawandeep)

  - neaten pci=earlydump output (Andy Shevchenko)

  - avoid errors when extended config space inaccessible (Gilles Buloz)

  - prevent sysfs disable of device while driver attached (Christoph
    Hellwig)

  - use core interface to report PCIe link properties in bnx2x, bnxt_en,
    cxgb4, ixgbe (Bjorn Helgaas)

  - remove unused pcie_get_minimum_link() (Bjorn Helgaas)

  - fix use-before-set error in ibmphp (Dan Carpenter)

  - fix pciehp timeouts caused by Command Completed errata (Bjorn
    Helgaas)

  - fix refcounting in pnv_php hotplug (Julia Lawall)

  - clear pciehp Presence Detect and Data Link Layer Status Changed on
    resume so we don't miss hotplug events (Mika Westerberg)

  - only request pciehp control if we support it, so platform can use
    ACPI hotplug otherwise (Mika Westerberg)

  - convert SHPC to be builtin only (Mika Westerberg)

  - request SHPC control via _OSC if we support it (Mika Westerberg)

  - simplify SHPC handoff from firmware (Mika Westerberg)

  - fix an SHPC quirk that mistakenly included *all* AMD bridges as well
    as devices from any vendor with device ID 0x7458 (Bjorn Helgaas)

  - assign a bus number even to non-native hotplug bridges to leave
    space for acpiphp additions, to fix a common Thunderbolt xHCI
    hot-add failure (Mika Westerberg)

  - keep acpiphp from scanning native hotplug bridges, to fix common
    Thunderbolt hot-add failures (Mika Westerberg)

  - improve "partially hidden behind bridge" messages from core (Mika
    Westerberg)

  - add macros for PCIe Link Control 2 register (Frederick Lawler)

  - replace IB/hfi1 custom macros with PCI core versions (Frederick
    Lawler)

  - remove dead microblaze and xtensa code (Bjorn Helgaas)

  - use dev_printk() when possible in xtensa and mips (Bjorn Helgaas)

  - remove unused pcie_port_acpi_setup() and portdrv_acpi.c (Bjorn
    Helgaas)

  - add managed interface to get PCI host bridge resources from OF (Jan
    Kiszka)

  - add support for unbinding generic PCI host controller (Jan Kiszka)

  - fix memory leaks when unbinding generic PCI host controller (Jan
    Kiszka)

  - request legacy VGA framebuffer only for VGA devices to avoid false
    device conflicts (Bjorn Helgaas)

  - turn on PCI_COMMAND_IO & PCI_COMMAND_MEMORY in pci_enable_device()
    like everybody else, not in pcibios_fixup_bus() (Bjorn Helgaas)

  - add generic enable function for simple SR-IOV hardware (Alexander
    Duyck)

  - use generic SR-IOV enable for ena, nvme (Alexander Duyck)

  - add ACS quirk for Intel 7th & 8th Gen mobile (Alex Williamson)

  - add ACS quirk for Intel 300 series (Mika Westerberg)

  - enable register clock for Armada 7K/8K (Gregory CLEMENT)

  - reduce Keystone "link already up" log level (Fabio Estevam)

  - move private DT functions to drivers/pci/ (Rob Herring)

  - factor out dwc CONFIG_PCI Kconfig dependencies (Rob Herring)

  - add DesignWare support to the endpoint test driver (Gustavo
    Pimentel)

  - add DesignWare support for endpoint mode (Gustavo Pimentel)

  - use devm_ioremap_resource() instead of devm_ioremap() in dra7xx and
    artpec6 (Gustavo Pimentel)

  - fix Qualcomm bitwise NOT issue (Dan Carpenter)

  - add Qualcomm runtime PM support (Srinivas Kandagatla)

  - fix DesignWare enumeration below bridges (Koen Vandeputte)

  - use usleep() instead of mdelay() in endpoint test (Jia-Ju Bai)

  - add configfs entries for pci_epf_driver device IDs (Kishon Vijay
    Abraham I)

  - clean up pci_endpoint_test driver (Gustavo Pimentel)

  - update Layerscape maintainer email addresses (Minghuan Lian)

  - add COMPILE_TEST to improve build test coverage (Rob Herring)

  - fix Hyper-V bus registration failure caused by domain/serial number
    confusion (Sridhar Pitchai)

  - improve Hyper-V refcounting and coding style (Stephen Hemminger)

  - avoid potential Hyper-V hang waiting for a response that will never
    come (Dexuan Cui)

  - implement Mediatek chained IRQ handling (Honghui Zhang)

  - fix vendor ID & class type for Mediatek MT7622 (Honghui Zhang)

  - add Mobiveil PCIe host controller driver (Subrahmanya Lingappa)

  - add Mobiveil MSI support (Subrahmanya Lingappa)

  - clean up clocks, MSI, IRQ mappings in R-Car probe failure paths
    (Marek Vasut)

  - poll more frequently (5us vs 5ms) while waiting for R-Car data link
    active (Marek Vasut)

  - use generic OF parsing interface in R-Car (Vladimir Zapolskiy)

  - add R-Car V3H (R8A77980) "compatible" string (Sergei Shtylyov)

  - add R-Car gen3 PHY support (Sergei Shtylyov)

  - improve R-Car PHYRDY polling (Sergei Shtylyov)

  - clean up R-Car macros (Marek Vasut)

  - use runtime PM for R-Car controller clock (Dien Pham)

  - update arm64 defconfig for Rockchip (Shawn Lin)

  - refactor Rockchip code to facilitate both root port and endpoint
    mode (Shawn Lin)

  - add Rockchip endpoint mode driver (Shawn Lin)

  - support VMD "membar shadow" feature (Jon Derrick)

  - support VMD bus number offsets (Jon Derrick)

  - add VMD "no AER source ID" quirk for more device IDs (Jon Derrick)

  - remove unnecessary host controller CONFIG_PCIEPORTBUS Kconfig
    selections (Bjorn Helgaas)

  - clean up quirks.c organization and whitespace (Bjorn Helgaas)

* tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (144 commits)
  PCI/AER: Replace struct pcie_device with pci_dev
  PCI/AER: Remove unused parameters
  PCI: qcom: Include gpio/consumer.h
  PCI: Improve "partially hidden behind bridge" log message
  PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() doc
  PCI: Move resource distribution for single bridge outside loop
  PCI: Account for all bridges on bus when distributing bus numbers
  ACPI / hotplug / PCI: Drop unnecessary parentheses
  ACPI / hotplug / PCI: Mark stale PCI devices disconnected
  ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug
  PCI: hotplug: Add hotplug_is_native()
  PCI: shpchp: Add shpchp_is_native()
  PCI: shpchp: Fix AMD POGO identification
  PCI: mobiveil: Add MSI support
  PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver
  PCI/AER: Decode Error Source Requester ID
  PCI/AER: Remove aer_recover_work_func() forward declaration
  PCI/DPC: Use the generic pcie_do_fatal_recovery() path
  PCI/AER: Pass service type to pcie_do_fatal_recovery()
  PCI/DPC: Disable ERR_NONFATAL handling by DPC
  ...
2018-06-07 12:45:58 -07:00
Linus Torvalds 1c8c5a9d38 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add Maglev hashing scheduler to IPVS, from Inju Song.

 2) Lots of new TC subsystem tests from Roman Mashak.

 3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

 5) Add ttl inherit support to vxlan, from Hangbin Liu.

 6) Properly separate ipv6 routes into their logically independant
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

 8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

11) Plumb extack down into fib_rules, from Roopa Prabhu.

12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

13) Add UDP GSO support, from Willem de Bruijn.

14) Add documentation for eBPF helpers, from Quentin Monnet.

15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

24) Add TCP SACK compression, from Eric Dumazet.

25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

28) Add lots of forwarding selftests, from Petr Machata.

29) Add generic network device failover driver, from Sridhar Samudrala.

* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
  strparser: Add __strp_unpause and use it in ktls.
  rxrpc: Fix terminal retransmission connection ID to include the channel
  net: hns3: Optimize PF CMDQ interrupt switching process
  net: hns3: Fix for VF mailbox receiving unknown message
  net: hns3: Fix for VF mailbox cannot receiving PF response
  bnx2x: use the right constant
  Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
  net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
  enic: fix UDP rss bits
  netdev-FAQ: clarify DaveM's position for stable backports
  rtnetlink: validate attributes in do_setlink()
  mlxsw: Add extack messages for port_{un, }split failures
  netdevsim: Add extack error message for devlink reload
  devlink: Add extack to reload and port_{un, }split operations
  net: metrics: add proper netlink validation
  ipmr: fix error path when ipmr_new_table fails
  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
  net: hns3: remove unused hclgevf_cfg_func_mta_filter
  netfilter: provide udp*_lib_lookup for nf_tproxy
  qed*: Utilize FW 8.37.2.0
  ...
2018-06-06 18:39:49 -07:00
Linus Torvalds 2857676045 - Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
 - Introduce overflow test module (Rasmus, Kees)
 - Introduce saturating size helper functions (Matthew, Kees)
 - Treewide use of struct_size() for allocators (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g
 8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J
 2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej
 jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ
 YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL
 u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A
 m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4
 bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61
 jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS
 S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa
 3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04
 KCDVLYPxwQQqK1Mqig==
 =/3L8
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull overflow updates from Kees Cook:
 "This adds the new overflow checking helpers and adds them to the
  2-factor argument allocators. And this adds the saturating size
  helpers and does a treewide replacement for the struct_size() usage.
  Additionally this adds the overflow testing modules to make sure
  everything works.

  I'm still working on the treewide replacements for allocators with
  "simple" multiplied arguments:

     *alloc(a * b, ...) -> *alloc_array(a, b, ...)

  and

     *zalloc(a * b, ...) -> *calloc(a, b, ...)

  as well as the more complex cases, but that's separable from this
  portion of the series. I expect to have the rest sent before -rc1
  closes; there are a lot of messy cases to clean up.

  Summary:

   - Introduce arithmetic overflow test helper functions (Rasmus)

   - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

   - Introduce overflow test module (Rasmus, Kees)

   - Introduce saturating size helper functions (Matthew, Kees)

   - Treewide use of struct_size() for allocators (Kees)"

* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use struct_size() for devm_kmalloc() and friends
  treewide: Use struct_size() for vmalloc()-family
  treewide: Use struct_size() for kmalloc()-family
  device: Use overflow helpers for devm_kmalloc()
  mm: Use overflow helpers in kvmalloc()
  mm: Use overflow helpers in kmalloc_array*()
  test_overflow: Add memory allocation overflow tests
  overflow.h: Add allocation size calculation helpers
  test_overflow: Report test failures
  test_overflow: macrofy some more, do more tests for free
  lib: add runtime test of check_*_overflow functions
  compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06 17:27:14 -07:00
Kees Cook acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
Leon Romanovsky c1191a19fe RDMA/mlx5: Update SPDX tags to show proper license
Mellanox code is supposed to be OpenIB compliant code,
so let's update SPDX tags to show it.

Fixes: fc385b7ac4 ("IB/mlx5: Add basic regiser/unregister representors code")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-05 14:04:20 -06:00
Leon Romanovsky 33edc3b2db RDMA/restrack: Change SPDX tag to properly reflect license
Resource tracking is supposed to be dual licensed: GPL-2.0 and
OpenIB, but the SPDX tag was not compliant to it. Update the tag to
properly reflect license.

Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-05 14:04:20 -06:00
Michal Kalderon d52c89f120 qed*: Utilize FW 8.37.2.0
This FW contains several fixes and features.

RDMA
- Several modifications and fixes for Memory Windows
- drop vlan and tcp timestamp from mss calculation in driver for
  this FW
- Fix SQ completion flow when local ack timeout is infinite
- Modifications in t10dif support

ETH
- Fix aRFS for tunneled traffic without inner IP.
- Fix chip configuration which may fail under heavy traffic conditions.
- Support receiving any-VNI in VXLAN and GENEVE RX classification.

iSCSI / FcoE
- Fix iSCSI recovery flow
- Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0

Misc
- Several registers (split registers) won't read correctly with
  ethtool -d

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 10:48:09 -04:00
Mike Marciniszyn d9a6ce68a0 IB/hfi1: Fix comment on default hdr entry size
The comment for the default header queue entry size is incorrect.

Correct the comment and fix the resulting S_IRUGO warning that shows
up in the widened patch context.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 15:29:48 -06:00
Kaike Wan ed71e86a8d IB/hfi1: Rename exp_lock to exp_mutex
The mutex exp_lock in struct hfi1_ctxtdata is used to protect all
Expected TID data of a user context. This patch renames it to exp_mutex
to better reflect its identity and prepare for upcoming patches.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 15:25:27 -06:00
Mike Marciniszyn dc2b2a917c IB/hfi1: Add bypass register defines and replace blind constants
These registers were not added in the 16B work.

Add them and replace blind constants with the correct defines.

Fixes: 72c07e2b67 ("IB/hfi1: Add support to receive 16B bypass packets")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 14:59:21 -06:00
Kaike Wan 5465f11083 IB/hfi1: Remove unused variable
The variable extended_psn was not used any more.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 14:59:21 -06:00
Kaike Wan f9458bc2c1 IB/hfi1: Ensure VL index is within bounds
Improve the safety of the code and ensure the array cannot be indexed
out of bounds when picking the CPU for a given SDMA engine.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04 13:33:16 -06:00