This removes the 'write' and 'force' parameters from
get_user_pages_remote() and replaces them with 'gup_flags'. This makes
the use of FOLL_FORCE explicit in callers, as use of this flag can result
in surprising behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' parameters from get_user_pages() and
replaces them with 'gup_flags'. This makes the use of FOLL_FORCE explicit
in callers, as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
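As an illustration of the conversion pattern, the sketch below folds the two
removed boolean arguments into one flags word. It is a userspace mock and not
kernel code: EX_FOLL_WRITE, EX_FOLL_FORCE and legacy_to_gup_flags() are
hypothetical names for this example, and the bit values are assumptions.

```c
#include <stdbool.h>

/* Illustrative stand-ins for FOLL_WRITE and FOLL_FORCE; values are assumed. */
#define EX_FOLL_WRITE 0x01u
#define EX_FOLL_FORCE 0x10u

/* Translate the removed boolean arguments into an explicit flags word. */
static unsigned int legacy_to_gup_flags(bool write, bool force)
{
	unsigned int gup_flags = 0;

	if (write)
		gup_flags |= EX_FOLL_WRITE;
	if (force)
		gup_flags |= EX_FOLL_FORCE;
	return gup_flags;
}
```

With this shape, a caller that wants the forcing behaviour has to spell out
the force bit by name, which is the explicitness the change is after.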
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Updates to mlx5
- Updates to mlx4 (two conflicts, both minor and easily resolved)
- Updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper
resolution is to keep the code in cxgb4_main.c as it is in Linus'
tree as attach_uld was refactored and moved into cxgb4_uld.c)
- Improvements to uAPI (moved vendor specific API elements to uAPI area)
- Add hns-roce driver and hns and hns-roce ACPI reset support
- Conversion of all rdma code away from deprecated
create_singlethread_workqueue
- Security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull main rdma updates from Doug Ledford:
"This is the main pull request for the rdma stack this release. The
code has been through 0day and I had it tagged for linux-next testing
for a couple days.
Summary:
- updates to mlx5
- updates to mlx4 (two conflicts, both minor and easily resolved)
- updates to iw_cxgb4 (one conflict, not so obvious to resolve,
proper resolution is to keep the code in cxgb4_main.c as it is in
Linus' tree as attach_uld was refactored and moved into
cxgb4_uld.c)
- improvements to uAPI (moved vendor specific API elements to uAPI
area)
- add hns-roce driver and hns and hns-roce ACPI reset support
- conversion of all rdma code away from deprecated
create_singlethread_workqueue
- security improvement: remove unsafe ib_get_dma_mr (breaks lustre in
staging)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits)
staging/lustre: Disable InfiniBand support
iw_cxgb4: add fast-path for small REG_MR operations
cxgb4: advertise support for FR_NSMR_TPTE_WR
IB/core: correctly handle rdma_rw_init_mrs() failure
IB/srp: Fix infinite loop when FMR sg[0].offset != 0
IB/srp: Remove an unused argument
IB/core: Improve ib_map_mr_sg() documentation
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets
IB/mthca: Move user vendor structures
IB/nes: Move user vendor structures
IB/ocrdma: Move user vendor structures
IB/mlx4: Move user vendor structures
IB/cxgb4: Move user vendor structures
IB/cxgb3: Move user vendor structures
IB/mlx5: Move and decouple user vendor structures
IB/{core,hw}: Add constant for node_desc
ipoib: Make ipoib_warn ratelimited
IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue
IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue
IB/ipoib: Remove deprecated create_singlethread_workqueue
...
Document that ib_map_mr_sg() is able to map physically discontiguous
sg-lists as a single MR. Change IB_MR_TYPE_SG_GAPS_REG into
IB_MR_TYPE_SG_GAPS.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "iwcm_wq" queues work item &work (maps to cm_work_handler).
It has been identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The workqueue "addr_wq" queues a single work item &work and hence
doesn't require ordering. Also, it is being used on a memory reclaim
path. Hence, it has been converted to use alloc_workqueue with
WQ_MEM_RECLAIM set.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "cma_wq" queues work item cma_work_handler. It has been
identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "close_wq" queues work items &ctx->close_work (maps to
ucma_close_id) and &con_req_eve->close_work (maps to
ucma_close_event_id). It has been identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "mcast_wq" queues work item &group->work. It has been
identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The workqueue "ib_nl" queues work items &ib_nl_timed_work and
&mad_agent_priv->local_work. It has been identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "ib_nl" queues work item &ib_nl_timed_work. It has been
identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the following fields to IPv6 flow filter specification:
1. Traffic Class
2. Flow Label
3. Next Header
4. Hop Limit
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Flow steering specification structures were implemented in an
extensible way that allows one to add new filters and new fields
to existing filters.
These specifications have never been extended, therefore the
kernel flow specification size and the user flow specification size
had to be equal.
In a downstream patch, the IPv4 flow specification type is extended to
support TOS and TTL fields.
To support an extension, we change the flow specification size
check as follows:
* If the user flow specification is bigger than the kernel
specification, we verify that all the bits which are not in the kernel
specification are zero, and the flow is added with only the kernel
specification fields.
* Otherwise, we add the flow rule with only the user specification fields.
User space filters must be aligned to 32 bits.
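The size check described above can be sketched in userspace C as follows.
This is a minimal illustration under stated assumptions, not the code from
the patch: flow_filter_size(), its byte-wise view of the filter and its -1
error convention are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the number of filter bytes to use, or -1 if the user filter is
 * rejected. Hypothetical helper; name and return convention are assumed.
 */
static long flow_filter_size(const uint8_t *user_filter, size_t user_size,
			     size_t kern_size)
{
	size_t i;

	/* User space filters must be aligned to 32 bits. */
	if (user_size % sizeof(uint32_t))
		return -1;

	/* Not bigger than the kernel's view: use only the user's fields. */
	if (user_size <= kern_size)
		return (long)user_size;

	/* Bytes the kernel does not know about must all be zero. */
	for (i = kern_size; i < user_size; i++)
		if (user_filter[i])
			return -1;

	/* Bigger but zero-padded: use only the kernel-known fields. */
	return (long)kern_size;
}
```

Rejecting non-zero trailing bytes means a newer user space cannot silently
request filtering on a field that an older kernel would ignore.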
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Query RSS related attributes and return them to user-space via the
extended query device uverbs command.
It includes both direct ones (i.e. struct ib_uverbs_rss_caps) and
max_wq_type_rq which may be used in both RSS and non RSS flows.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We now only use it from ib_alloc_pd to create a local DMA lkey if the
device doesn't provide one, or a global rkey if the ULP requests it.
This patch removes ib_get_dma_mr and open codes the functionality in
ib_alloc_pd so that we can simplify the code and prevent abuse of the
functionality. As a side effect we can also simplify things by removing
the valid access bit check, and the PD refcounting.
In the future I hope to also remove the per-PD global MR entirely by
shifting this work into the HW drivers, as one step towards avoiding
the struct ib_mr overload for various different use cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or
less unchecked, this moves the capability of creating a global rkey into
the RDMA core, where it can be easily audited. It also prints a warning
every time this feature is used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There are two reasons for this: a) to clearly mark that drivers don't have
any business using it, and b) because we're going to use it for the
(dangerous) global rkey soon, so that drivers don't create one themselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The function send_leave sets the member group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before that assignment and might delete the group object,
which results in memory corruption.
Additionally, this patch gets rid of the group->query_id variable, which
is not used.
Fixes: faec2f7b96 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
'work' and 'route->path_rec' are allocated in cma_resolve_iboe_route()
and should be freed in the error handling paths before returning;
otherwise memory is leaked.
Fixes: 200298326b ('IB/core: Validate route when we init ah')
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- hfi1 driver updates
- Fix for max SGEs allowed via RDMA R/W API
Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull second round of rdma updates from Doug Ledford:
"This can be split out into just two categories:
- fixes to the RDMA R/W API in regards to SG list length limits
(about 5 patches)
- fixes/features for the Intel hfi1 driver (everything else)
The hfi1 driver is still being brought to full feature support by
Intel, and they have a lot of people working on it, so that amounts to
almost the entirety of this pull request"
* tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits)
IB/hfi1: Add cache evict LRU list
IB/hfi1: Fix memory leak during unexpected shutdown
IB/hfi1: Remove unneeded mm argument in remove function
IB/hfi1: Consistently call ops->remove outside spinlock
IB/hfi1: Use evict mmu rb operation
IB/hfi1: Add evict operation to the mmu rb handler
IB/hfi1: Fix TID caching actions
IB/hfi1: Make the cache handler own its rb tree root
IB/hfi1: Make use of mm consistent
IB/hfi1: Fix user SDMA racy user request claim
IB/hfi1: Fix error condition that needs to clean up
IB/hfi1: Release node on insert failure
IB/hfi1: Validate SDMA user iovector count
IB/hfi1: Validate SDMA user request index
IB/hfi1: Use the same capability state for all shared contexts
IB/hfi1: Prevent null pointer dereference
IB/hfi1: Rename TID mmu_rb_* functions
IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister()
IB/hfi1: Restructure hfi1_file_open
IB/hfi1: Make iovec loop index easy to understand
...
- Updates/fixes for iw_cxgb4 driver
- Updates/fixes for mlx5 driver
- Add flow steering and RSS API
- Add hardware stats to mlx4 and mlx5 drivers
- Add firmware version API for RDMA driver use
- Add the rxe driver (this is a software RoCE driver that makes any
Ethernet device a RoCE device)
- Fixes for i40iw driver
- Support for send only multicast joins in the cma layer
- Other minor fixes
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull base rdma updates from Doug Ledford:
"Round one of 4.8 code: while this is mostly normal, there is a new
driver in here (the driver was hosted outside the kernel for several
years and is actually a fairly mature and well coded driver). It
amounts to 13,000 of the 16,000 lines of added code in here.
Summary:
- Updates/fixes for iw_cxgb4 driver
- Updates/fixes for mlx5 driver
- Add flow steering and RSS API
- Add hardware stats to mlx4 and mlx5 drivers
- Add firmware version API for RDMA driver use
- Add the rxe driver (this is a software RoCE driver that makes any
Ethernet device a RoCE device)
- Fixes for i40iw driver
- Support for send only multicast joins in the cma layer
- Other minor fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
Soft RoCE driver
IB/core: Support for CMA multicast join flags
IB/sa: Add cached attribute containing SM information to SA port
IB/uverbs: Fix race between uverbs_close and remove_one
IB/mthca: Clean up error unwind flow in mthca_reset()
IB/mthca: NULL arg to pci_dev_put is OK
IB/hfi1: NULL arg to sc_return_credits is OK
IB/mlx4: Add diagnostic hardware counters
net/mlx4: Query performance and diagnostics counters
net/mlx4: Add diagnostic counters capability bit
Use smaller 512 byte messages for portmapper messages
IB/ipoib: Report SG feature regardless of HW UD CSUM capability
IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
IB/hfi1: Disable by default
IB/rdmavt: Disable by default
IB/mlx5: Fix port counter ID association to QP offset
IB/mlx5: Fix iteration overrun in GSI qps
i40iw: Add NULL check for puda buffer
i40iw: Change dup_ack_thresh to u8
i40iw: Remove unnecessary check for moving CQ head
...
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler, both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing a pointer to them to dma_set_attr(), just set the bits.
2. It brings safety and const-correctness checking because the
attributes are passed by value.
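A small userspace sketch of the "just set the bits" style follows. The
EX_DMA_ATTR_* names, their bit positions and both helpers are hypothetical
and exist only for this illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative attribute bits; names and values are assumed for this sketch. */
#define EX_DMA_ATTR_WEAK_ORDERING	(1UL << 1)
#define EX_DMA_ATTR_WRITE_COMBINE	(1UL << 2)
#define EX_DMA_ATTR_SKIP_CPU_SYNC	(1UL << 5)

/* Attributes travel by value, so a callee cannot modify the caller's copy. */
static bool ex_dma_attr_is_set(unsigned long attrs, unsigned long attr)
{
	return (attrs & attr) != 0;
}

/* Build the attribute word by OR-ing bits; 0 means "no attributes". */
static unsigned long ex_dma_default_attrs(void)
{
	return EX_DMA_ATTR_WEAK_ORDERING | EX_DMA_ATTR_SKIP_CPU_SYNC;
}
```

A caller with no attributes to pass simply hands over 0, which is what the
NULL-to-0 rewrite in the semantic patches expresses.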
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added UCMA and CMA support for multicast join flags. Flags are
passed using UCMA CM join command previously reserved fields.
Currently supporting two join flags indicating two different
multicast JoinStates:
1. Full Member:
The initiator creates the Multicast group (MCG) if it wasn't
previously created, can send Multicast messages to the group
and receive messages from the MCG.
2. Send Only Full Member:
The initiator creates the Multicast group (MCG) if it wasn't
previously created, can send Multicast messages to the group
but doesn't receive any messages from the MCG.
IB: Send Only Full Member requires a query of ClassPortInfo
to determine if SM/SA supports this option. If SM/SA
doesn't support Send-Only there will be no join request
sent and an error will be returned.
ETH: When Send Only Full Member is requested no IGMP join
will be sent.
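The two transport-specific rules above can be sketched as a pair of
predicates. The enum and function names are hypothetical, and the sketch
deliberately ignores every other condition that affects a join (for example
the RoCE version); it only mirrors the behaviour stated in this message.

```c
#include <stdbool.h>

/* Hypothetical names for the two join states described above. */
enum ex_join_state {
	EX_FULL_MEMBER,
	EX_SENDONLY_FULL_MEMBER,
};

/*
 * IB: a send-only join is refused up front when the SM/SA lacks support.
 * Returns 0 if a join request may be sent, -1 if an error is returned
 * without sending anything.
 */
static int ex_ib_join_check(enum ex_join_state state, bool sm_supports_sendonly)
{
	if (state == EX_SENDONLY_FULL_MEMBER && !sm_supports_sendonly)
		return -1;
	return 0;
}

/* ETH: no IGMP join is sent when Send Only Full Member is requested. */
static bool ex_eth_send_igmp_join(enum ex_join_state state)
{
	return state != EX_SENDONLY_FULL_MEMBER;
}
```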
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Added a new SA port attribute containing SM ClassPortInfo fields,
(ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for
checking SM support for specific features. The attribute is cached
to avoid resending queries, caching is done when a successful
ClassPortInfo reply is received on the port. Invalidation of the
attribute is done on SM change events, SM re-registration events,
and SM LID change events. The fields in ClassPortInfo should not
change during SM runtime without an event.
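The caching scheme described above (fill on a successful reply, invalidate
on SM events) can be sketched like this. The struct layout, the single
cap_mask field and all function names are hypothetical, and the sketch omits
the locking a real per-port cache would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry; the real attribute holds ClassPortInfo fields. */
struct ex_cpi_cache {
	bool valid;
	uint16_t cap_mask;	/* stand-in for the cached fields */
};

/* Called when a successful ClassPortInfo reply arrives on the port. */
static void ex_cpi_store(struct ex_cpi_cache *c, uint16_t cap_mask)
{
	c->cap_mask = cap_mask;
	c->valid = true;
}

/* Called on SM change, SM re-registration and SM LID change events. */
static void ex_cpi_invalidate(struct ex_cpi_cache *c)
{
	c->valid = false;
}

/* Returns true and fills *cap_mask on a hit; a miss means a query is needed. */
static bool ex_cpi_lookup(const struct ex_cpi_cache *c, uint16_t *cap_mask)
{
	if (!c->valid)
		return false;
	*cap_mask = c->cap_mask;
	return true;
}
```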
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fixes an oops that might happen if uverbs_close races with
remove_one.
Both contexts may run ib_uverbs_cleanup_ucontext, it depends
on the flow.
Currently, there is no protection for the case where remove_one
does not perform the cleanup and runs to its end, so that the underlying
ib_device is freed; uverbs_close will then call
ib_uverbs_cleanup_ucontext and oops.
This might happen if uverbs_close deleted the file from the list,
so that remove_one didn't find it and ran to its end.
Protect against that case with a new cleanup lock, so that
ib_uverbs_cleanup_ucontext is always called before
remove_one ends.
Fixes: 35d4a0b63d ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Portmapper messages are short and do not occupy more than 512 bytes.
Lower portmapper message size to 512 bytes. This change significantly
reduces the amount of memory needed when trying to establish a large
number of connections simultaneously. The old value is based on page
size.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove the complicated logic that decides whether the iw_cm_id is freed
inside iw_cm event handlers or when an application thread destroys the
cm_id. Also remove the wait in iw_destroy_cm_id() that blocks the
application until all references are removed. This wait can cause a
deadlock when disconnecting or destroying cm_ids inside an rdma_cm event
handler. Simply allowing the last deref of the iw_cm_id to free the memory
is cleaner and avoids this potential deadlock. Also a flag is added,
IW_CM_DROP_EVENTS, that is set when the cm_id is marked for destruction.
If any events are pending on this iw_cm_id, then as they are processed
they will be dropped rather than posted upstream if IW_CM_DROP_EVENTS is set.
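The "last deref frees, pending events are dropped" idea can be sketched in
single-threaded userspace C as below. Everything here is hypothetical: the
struct, the EX_DROP_EVENTS flag standing in for IW_CM_DROP_EVENTS, and the
plain int reference count (a real implementation would need atomic counting
and locking).

```c
#include <stdbool.h>
#include <stdlib.h>

#define EX_DROP_EVENTS 0x1u	/* stand-in for IW_CM_DROP_EVENTS */

struct ex_cm_id {
	int refcount;
	unsigned int flags;
	int delivered;		/* events posted upstream */
};

/* Allocate an id holding one reference for its owner. */
static struct ex_cm_id *ex_cm_id_create(void)
{
	struct ex_cm_id *id = calloc(1, sizeof(*id));

	if (id)
		id->refcount = 1;
	return id;
}

/* Drop one reference; the last put frees. Returns true if freed. */
static bool ex_cm_id_put(struct ex_cm_id *id)
{
	if (--id->refcount > 0)
		return false;
	free(id);
	return true;
}

/* Destroy marks the id and drops the owner's reference without waiting. */
static bool ex_cm_id_destroy(struct ex_cm_id *id)
{
	id->flags |= EX_DROP_EVENTS;
	return ex_cm_id_put(id);
}

/* Process one pending event, which holds its own reference. */
static bool ex_cm_id_handle_event(struct ex_cm_id *id)
{
	if (!(id->flags & EX_DROP_EVENTS))
		id->delivered++;	/* otherwise the event is dropped */
	return ex_cm_id_put(id);
}
```

Because destroy never waits for the count to reach zero, it can be called
from inside an event handler without deadlocking on the handler's own
reference.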
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
During connection establishment with a large number of
connections, it is possible that the connection requests
might fail. Adding flow control prevents this failure.
Change ibnl_unicast to use blocking to enable flow control.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Compute the SGE limit for RDMA READ and WRITE requests in
ib_create_qp(). Use that limit in the RDMA RW API implementation.
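One plausible shape for such a computation is sketched below. The rule it
applies (WRITE limited by the QP's send SGE count, READ additionally capped
by a separate device-wide READ limit) is an assumption for illustration, as
are the struct and function names; the message above does not spell out the
formula.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down view of the per-QP limits. */
struct ex_qp_limits {
	uint32_t max_write_sge;
	uint32_t max_read_sge;
};

/* Derive per-QP RDMA WRITE/READ SGE limits once, at QP creation time. */
static struct ex_qp_limits ex_compute_sge_limits(uint32_t max_send_sge,
						 uint32_t dev_max_sge_rd)
{
	struct ex_qp_limits l;

	l.max_write_sge = max_send_sge;
	/* Assumed rule: READ may be tighter than the generic send limit. */
	l.max_read_sge = max_send_sge < dev_max_sge_rd ?
			 max_send_sge : dev_max_sge_rd;
	return l;
}
```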
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org> #v4.7+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some but not all callers of rdma_rw_ctx_init() zero-initialize
struct rdma_rw_ctx. Hence make rdma_rw_ctx_init() initialize all
work request fields that will be read by ib_post_send().
Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: <stable@vger.kernel.org> #v4.7+
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the missing port_xmit_wait counter. This counter is displayed through
some tools like perfquery but is not available via sysfs.
For the PORT_PMA_ATTR macro the _counter field is set to zero,
allowing us to specify the offset directly, as with PORT_PMA_ATTR_EXT.
See also the earlier work in 2008 by Vladimir Sokolovsky:
https://www.mail-archive.com/general@lists.openfabrics.org/msg20313.html
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Now that all the devices have stopped exporting their own sysfs
entry points we can have the core export this on their behalf.
Eventually this may be removed but this provides for backwards
compatibility.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Allow for a common core function to get firmware version strings
from the individual devices.
In later patches this format can then be used to pass a
properly formatted version string through the IPoIB layer.
The problem with the current code in the IPoIB layer is that it is
specific to certain hardware types.
Furthermore, this gives us a common function through which the core
can provide a common sysfs entry. Eventually we may want to
remove the sysfs export but this provides for user space backwards
compatibility.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
User applications that want to spread incoming traffic between several WQs
should create a QP which contains an indirection table.
When such a QP is created other receive side parameters are not valid
and should not be given. Its send side is optional and assumed active
based on max_send_wr capability value.
Extend create QP to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Extend create QP to get Receive Work Queue (WQ) indirection table.
QP can be created with external Receive Work Queue indirection table,
in that case it is ready to receive immediately.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
User applications that want to spread traffic across several WQs need to
create an indirection table from already created WQs.
Add a uverbs API to create and destroy this table.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce Receive Work Queue (WQ) indirection table.
This object can be used to spread incoming traffic to different
receive Work Queues.
A Receive WQ indirection table points to a variable number of WQs.
This table is given to a QP in downstream patches.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
User space applications which use RSS functionality need to create
a work queue object (WQ). The lifetime of such an object is:
* Create a WQ
* Modify the WQ from reset to init state.
* Use the WQ (by downstream patches).
* Destroy the WQ.
These commands are added to the uverbs API.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce Work Queue object and its create/destroy/modify verbs.
A QP can be created without internal WQs "packaged" inside it;
such a QP can be configured to use an "external" WQ object as its
receive/send queue.
A WQ is a necessary component for RSS technology since the RSS mechanism
is supposed to distribute the traffic between multiple
Receive Work Queues.
A WQ is associated (many to one) with a Completion Queue and it owns the
WQ properties (PD, WQ size, etc.).
A WQ has a type. This patch introduces IB_WQT_RQ (i.e. receive queue);
it may be extended to others such as IB_WQT_SQ (send queue).
A WQ of type IB_WQT_RQ contains receive work requests.
A PD is an attribute of a work queue (i.e. send/receive queue); it is used
by the hardware for security validation before scattering to a memory
region which is pointed to by the WQ. For that, an external WQ object
needs a PD, letting the hardware make that validation.
When accessing a memory region that is pointed to by the WQ, its PD
is used and not the QP's PD; this behavior is similar
to that of an SRQ and a QP.
The WQ context is subject to well-defined state transitions done by
the modify_wq verb.
When a WQ is created, its initial state is IB_WQS_RESET.
From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY.
From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET
or to IB_WQS_ERR.
From IB_WQS_ERR it can be modified to IB_WQS_RESET.
Note: transition to IB_WQS_ERR might occur implicitly in case there
was some HW error.
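The explicit transitions listed above can be written down as a small
validity check. The enum and function names below are hypothetical mirrors
of IB_WQS_RESET, IB_WQS_RDY and IB_WQS_ERR; the sketch covers only
transitions requested through modify_wq, not the implicit move to the error
state on a hardware fault.

```c
#include <stdbool.h>

/* Hypothetical mirrors of IB_WQS_RESET, IB_WQS_RDY and IB_WQS_ERR. */
enum ex_wq_state {
	EX_WQS_RESET,
	EX_WQS_RDY,
	EX_WQS_ERR,
};

/* Return true if a modify request from cur to next is allowed. */
static bool ex_wq_transition_ok(enum ex_wq_state cur, enum ex_wq_state next)
{
	switch (cur) {
	case EX_WQS_RESET:
		return next == EX_WQS_RESET || next == EX_WQS_RDY;
	case EX_WQS_RDY:
		return next == EX_WQS_RDY || next == EX_WQS_RESET ||
		       next == EX_WQS_ERR;
	case EX_WQS_ERR:
		return next == EX_WQS_RESET;
	}
	return false;
}
```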
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Initialize ib_qp_init_attr with zeros to avoid garbage
in fields that won't be set with user values.
Fixes: a060b5629a ('IB/core: generic RDMA READ/WRITE API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When virtualization is supported, VFs may send SA MADs to a GID formed
by the concatenation of the subnet prefix with the
IB_SA_WELL_KNOWN_GUID. When a response is required, the current code
will search the local HCA's port for the received GID to figure out the
GID index of the entry containing this GID. However, since this is not a
real GID, it will not be found and an error will be printed.
We change the logic to check whether the destination GID is this special
GID and, in that case, skip the lookup and use GID index 0.
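A userspace sketch of that decision follows. The GID layout, the constant
value used for the well-known GUID, the host-byte-order comparison and both
function names are assumptions made for illustration; the real code works on
big-endian fields and the kernel's own GID table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for the well-known SA GUID, host byte order for simplicity. */
#define EX_SA_WELL_KNOWN_GUID 0x0200000000000002ULL

/* Hypothetical 128-bit GID split into its two 64-bit halves. */
struct ex_gid {
	uint64_t subnet_prefix;
	uint64_t interface_id;
};

/* True when dgid is the subnet prefix concatenated with the well-known GUID. */
static bool ex_is_sa_well_known_gid(const struct ex_gid *dgid,
				    uint64_t subnet_prefix)
{
	return dgid->subnet_prefix == subnet_prefix &&
	       dgid->interface_id == EX_SA_WELL_KNOWN_GUID;
}

/* Pick the GID index for a response: 0 for the special GID, else a lookup. */
static int ex_response_gid_index(const struct ex_gid *dgid,
				 uint64_t subnet_prefix,
				 const struct ex_gid *table, int table_len)
{
	int i;

	if (ex_is_sa_well_known_gid(dgid, subnet_prefix))
		return 0;	/* not a real GID, so skip the table search */

	for (i = 0; i < table_len; i++)
		if (table[i].subnet_prefix == dgid->subnet_prefix &&
		    table[i].interface_id == dgid->interface_id)
			return i;
	return -1;		/* not found */
}
```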
Fixes: a0c1b2a350 ('IB/core: Support accessing SA in virtualized environment')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
During multicast join of RoCEv1, IGMP join state and max hop limit
were updated incorrectly. IGMP join should be sent and marked as
joined only on RoCEv2 after a successful join. Max hops should be
updated to the hop limit on RoCEv2 regardless of the join state.
Fixes: bee3c3c918 ('IB/cma: Join and leave multicast groups...')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, when the netdevice returned by get_netdev is unregistered,
we delete all GIDs (including the default GIDs) and reset their
attributes. Therefore, when we re-register it, no default GIDs
will be assigned (as their "default GID" attribute will be reset).
Fix this by keeping the "default GID" attribute.
Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>