This paves the way for another patch that reacts to a
flush sdma completion for RC.
Fixes: 81cd3891f0 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.
Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Previously, EQ joined the chain notifier on creation.
This forced the caller to be ready to handle events before creating
the EQ through eq_create_generic interface.
To help the caller control when the created EQ will be attached to the
IRQ, add enable/disable API.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The patch modifies the IRQ allocation so that all async EQs are
assigned to the same IRQ resulting in more available IRQs for
completion EQs.
The changes are using the support for IRQ sharing and EQ polling budget
that was introduced in previous patches so when the shared interrupt is
triggered, the kernel will serially call the handler of each of the
sharing EQs with a certain budget of EQEs to poll in order to prevent
starvation.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of requesting IRQ with eq creation, IRQs will be requested
before EQ table creation.
Instead of freeing the IRQs after EQ destroy, free IRQs after eq
table destroy.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Multiple EQs may share the same IRQ in subsequent patches.
Instead of calling the IRQ handler directly, the EQ will register
to an atomic chain notfier.
The Linux built-in shared IRQ is not used because it forces the caller
to disable the IRQ and clear affinity before free_irq() can be called.
This patch is the first step in the separation of IRQ and EQ logic.
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This driver was first merged over 10 years ago and has not seen major
activity by the authors in the last 7 years. However, in that time it has
been patched 150 times to adapt it to changing kernel APIs.
Further, the hardware has several issues, like not supporting 64 bit DMA,
that make it rather uninteresting for use with modern systems and RDMA.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ensure that CQ is allocated and freed by IB/core and not by drivers.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Like all other destroy commands, .destroy_cq() call is not supposed
to fail. In all flows, the attempt to return earlier caused to memory
leaks.
This patch converts .destroy_cq() to do not return any errors.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The memory allocation call can fail and cause to early return
from nes_desotroy_cq() function. This situation will cause to
memory leak of struct nes_cq. Rewrite function to avoid memory
allocation.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The call to sdma_progress() is called outside the wait lock.
In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle. If that happens, there will be no
more sdma interrupts to cause the wakeup and the user_sdma xmit will hang.
Fix by moving the lock to enclose the sdma_progress() call.
Also, delete busycount. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")
Cc: <stable@vger.kernel.org>
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The opcode range for fault injection from user should be validated before
it is applied to the fault->opcodes[] bitmap to avoid out-of-bound
error.
Cc: <stable@vger.kernel.org>
Fixes: a74d5307ca ("IB/hfi1: Rework fault injection machinery")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This more closely follows how other subsytems work, with owner being a
member of the structure containing the function pointers.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When user post recv a srq with multiple sges, the hardware will get the
last correct sge and count the sge numbers according to the specific
identifier with lkey. For example, when the driver fills the sges with
every wr less than the max sge that the user configured when creating srq,
the hardware will stop getting the sge according to the specific lkey in
the sge. However, it will always end with the first sge in the current
post srq recv interface implementation.
Fixes: c7bcb13442 ("RDMA/hns: Add SRQ support for hip08 kernel mode")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous change incorrectly changed the inverted logic and logically
negated the readl rather than the shifted readl result. Fix this by
adding in missing parentheses around the expression that needs to be
logically negated.
Addresses-Coverity: ("Logically dead code")
Fixes: 669cefb654 ("RDMA/hns: Remove jiffies operation in disable interrupt context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The usual driver bug fixes and fixes for a couple of regressions introduced in
5.2:
- Fix a race on bootup with RDMA device renaming and srp. SRP also needs to
rename its internal sys files
- Fix a memory leak in hns
- Don't leak resources in efa on certain error unwinds
- Don't panic in certain error unwinds in ib_register_device
- Various small user visible bug fix patches for the hfi and efa drivers
- Fix the 32 bit compilation break
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlz5c5oACgkQOG33FX4g
mxpEEhAAk64phaKUih9KVT0MpC1zezB1C0EGKg45GKuMOFUFJQ5tZ0g4s6aDEbG3
374ZE9h82HMgKn4tQ95110AKvCI+VAbbKOS7kzk1rLWE1ruJxk5DNsvp1v5/S3FE
GXBSws+HtZtdiRAMTYyEOfz0MqpvghFg0vor4PugrmOuqIe2a0bkYPEzYPjYbaNH
jSctd/q4s/o02n6gfbCrFpXsW0Va3OIaDX5a+Fx5+lWW+GPr/Uzk/3kN95mFbDRp
XsCE80V+n3ceKSQUp0lYtxU3tm2mT1JpiiZjXuKyjRV8IMUS+xkdJ8scEz0upGcg
+Jr74mN/xKT3toHaMv7fZ3RGlYgFsSsZcAApm6LrIlTNQXKjJ8hl+2BWdi4nRfYZ
X89RRWEl3j8i6URu65iH7y7IlfFEhjJGmATUQFdrfECR9hBMJ8VHzBfcz7aYgoac
Ggi+2Vjm7GQlr9mzW/phXb25PWqP5yVTW6/3BUtMs3oY7kd6vE2n9XzGIy13uBpX
fzY/tnIMrgZMjphYPPbBAbwl+tBKZCu4k6lpP7cLsVsIwY0NIWS26JCnCdO0efqR
SnAUPjoAV7nkpG3mMO9Qv7h7yar3HrG7ED15hfmB4VowRNQMfDoTLc8jVWDvGk4/
aFBSH8dEjszZ5tMO9HL+RXnvpkRcDyQpfVQJttY5adZFQlUOd+0=
=RmxY
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Things are looking pretty quiet here in RDMA, not too many bug fixes
rolling in right now. The usual driver bug fixes and fixes for a
couple of regressions introduced in 5.2:
- Fix a race on bootup with RDMA device renaming and srp. SRP also
needs to rename its internal sys files
- Fix a memory leak in hns
- Don't leak resources in efa on certain error unwinds
- Don't panic in certain error unwinds in ib_register_device
- Various small user visible bug fix patches for the hfi and efa
drivers
- Fix the 32 bit compilation break"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/efa: Remove MAYEXEC flag check from mmap flow
mlx5: avoid 64-bit division
IB/hfi1: Validate page aligned for a given virtual address
IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
IB/rdmavt: Fix alloc_qpn() WARN_ON()
RDMA/core: Fix panic when port_data isn't initialized
RDMA/uverbs: Pass udata on uverbs error unwind
RDMA/core: Clear out the udata before error unwind
RDMA/hns: Fix PD memory leak for internal allocation
RDMA/srp: Rename SRP sysfs name after IB device rename trigger
This series provides some updates to mlx5 core and netdevice driver.
1) use __netdev_tx_sent_queue() to improve performance under GSO workload
2) Allow matching only enc_key_id/enc_dst_port for decapsulation action
3) Geneve support:
This patchset adds support for GENEVE tunnel encap/decap flows offload:
encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams.
The driver supports 6081 destination UDP port number, which is the
default IANA-assigned port.
Encap:
ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is
provided by the mlx5 driver to the outgoing packet.
Decap:
Geneve header is matched and the packet is decapsulated.
Notes about decap flows with Geneve TLV Options:
- Support offloading of 32-bit options data only
- At any given time, only one combination of class/type parameters
can be offloaded, but the same class/type combination can have
many different flows offloaded with different 32-bit option data
- Options with value of 0 can't be offloaded
Managing Geneve TLV options:
Matching (on receive) is done by ConnectX-5 flex parser.
Geneve TLV options are managed using General Object of type
“Geneve TLV Options”.
When the first flow with a certain class/type values is requested
to be offloaded, the driver creates a FW object with FW command
(Geneve TLV Options general object) and starts counting the number
of flows using this object.
During this time, any request with a different class/type values
will fail to be offloaded.
Once the refcount reaches 0, the driver destroys the TLV options
general object, and can now offload a flow with any class/type parameters.
Geneve TLV Options object is added to core device.
It is currently used to manage Geneve TLV options general
object allocation in FW and its reference counting only.
In the future it will also be used for managing geneve ports
by registering callbacks for ndo_udp_tunnel_add/del.
TC tunnel code refactoring:
As a preparation for Geneve code, the TC tunnel code in mlx5
was rearranged in a modular way, so that it would be easier
to add future tunnels:
- Defined tc tunnel object with the fields and callbacks that
any tunnel must implement.
- Define tc UDP tunnel object for UDP tunnels, such as VXLAN
- Move each tunnel code (GRE, VXLAN) to its own separate file
- Rewrite tc tunnel implementation in a general way – using
only the objects and their callbacks.
4) Termination tables:
Actions in tables set with the termination flag are guaranteed to terminate
the action list. Thus, potential looping functionality (e.g. haripin) can safely be
executed without potential loops.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAlzxiMsACgkQSD+KveBX
+j708ggAwjhVpazLCbo4kXfutln1eeQ6uImb2ivBDEIXjri3uK+GN5fWtqZVhg5v
oRaTwdWAMZJFmEdvFKPOvAaqJwy3l3M1mXIjHYfQXpP8WYXYvteoq5AuSxqfEFcE
wK127DRe2zcH75Q5Q8ObL1lMBVvYeu6xBnr3EQUaPFDF9hi4np+r5bJvhHwJzt7z
lxdsGdxdTmqz3hw+rkp/Uuvx2Nniy5Tkm4zuNeQdoCtlYtqEs3dVFUpZqIfYgjdx
hCZC1GEqKfLpdRU3qCW6HRaO2Yeok6a9QYbb70KUEeCVbwMXDnjohlz+61XJEd+M
gp92vmf11tjSBruv56O8KfokFBIxUw==
=oum3
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-05-31
This series provides some updates to mlx5 core and netdevice driver.
1) use __netdev_tx_sent_queue() to improve performance under GSO workload
2) Allow matching only enc_key_id/enc_dst_port for decapsulation action
3) Geneve support:
This patchset adds support for GENEVE tunnel encap/decap flows offload:
encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams.
The driver supports 6081 destination UDP port number, which is the
default IANA-assigned port.
Encap:
ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is
provided by the mlx5 driver to the outgoing packet.
Decap:
Geneve header is matched and the packet is decapsulated.
Notes about decap flows with Geneve TLV Options:
- Support offloading of 32-bit options data only
- At any given time, only one combination of class/type parameters
can be offloaded, but the same class/type combination can have
many different flows offloaded with different 32-bit option data
- Options with value of 0 can't be offloaded
Managing Geneve TLV options:
Matching (on receive) is done by ConnectX-5 flex parser.
Geneve TLV options are managed using General Object of type
“Geneve TLV Options”.
When the first flow with a certain class/type values is requested
to be offloaded, the driver creates a FW object with FW command
(Geneve TLV Options general object) and starts counting the number
of flows using this object.
During this time, any request with a different class/type values
will fail to be offloaded.
Once the refcount reaches 0, the driver destroys the TLV options
general object, and can now offload a flow with any class/type parameters.
Geneve TLV Options object is added to core device.
It is currently used to manage Geneve TLV options general
object allocation in FW and its reference counting only.
In the future it will also be used for managing geneve ports
by registering callbacks for ndo_udp_tunnel_add/del.
TC tunnel code refactoring:
As a preparation for Geneve code, the TC tunnel code in mlx5
was rearranged in a modular way, so that it would be easier
to add future tunnels:
- Defined tc tunnel object with the fields and callbacks that
any tunnel must implement.
- Define tc UDP tunnel object for UDP tunnels, such as VXLAN
- Move each tunnel code (GRE, VXLAN) to its own separate file
- Rewrite tc tunnel implementation in a general way – using
only the objects and their callbacks.
4) Termination tables:
Actions in tables set with the termination flag are guaranteed to terminate
the action list. Thus, potential looping functionality (e.g. haripin) can safely be
executed without potential loops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit:
4b53a3412d ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")
the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.
As an alternative implementation Ingo suggested to use:
struct task_struct {
const cpumask_t *cpus_ptr;
cpumask_t cpus_mask;
};
with
t->cpus_ptr = &t->cpus_mask;
In -RT we then can switch the cpus_ptr to:
t->cpus_ptr = &cpumask_of(task_cpu(p));
in a migration disabled region. The rules are simple:
- Code that 'uses' ->cpus_allowed would use the pointer.
- Code that 'modifies' ->cpus_allowed would use the direct mask.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ifa_list is protected by rcu, yet code doesn't reflect this.
Add the __rcu annotations and fix up all places that are now reported by
sparse.
I've done this in the same commit to not add intermediate patches that
result in new warnings.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like previous patches, use the new iterator macros to avoid sparse
warnings once proper __rcu annotations are added.
Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series provides some low level updates for mlx5 driver needed for
both rdma and netdev trees.
1) Termination flow steering table bits and hardware definitions.
2) Introduce the core dump HW access registers definitions.
3) Refactor and cleans-up VF representors functions handlers.
4) Renames host_params bits to function_changed bits and add the
support for eswitch functions change event in the eswitch general case.
(for both legacy and switchdev modes).
5) Potential error pointer dereference in error handling
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently for every representor type and for every single vport,
representer function pointers copy is stored even though they don't
change from one to other vport.
Additionally priv data entry for the rep is not passed during
registration, but its copied. It is used (set and cleared) by the user
of the reps.
As we want to scale vports, to simplify and also to split constants
from data,
1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops
prefix with other standard netdev, ibdev ops.
2. Constify the IB and Ethernet rep ops structure.
3. Instead of storing copy of all rep function pointers, store copy
per eswitch rep type.
4. Split data and function pointers to mlx5_eswitch_rep_ops and
mlx5_eswitch_rep_data.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv*
as it is not needed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When the user submits more than 32 work request to a srq queue
at a time, it needs to find the corresponding number of entries
in the bitmap in the idx queue. However, the original lookup
function named ffs only processes 32 bits of the array element,
When the number of srq wqe issued exceeds 32, the ffs will only
process the lower 32 bits of the elements, it will not be able
to get the correct wqe index for srq wqe.
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace the following form:
sizeof(struct opa_port_status_rsp) + num_vls * sizeof(struct _vls_pctrs)
with:
struct_size(rsp, vls, num_vls)
and so on...
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace the following form:
sizeof(*pkt) + sizeof(pkt->addr[0])*n
with:
struct_size(pkt, addr, n)
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove leftover includes that are no longer used from the driver.
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When creating the chunks list the rdma_for_each_block() iterator is used
in order to iterate over the payload in EFA_CHUNK_PAYLOAD_SIZE (device
defined) strides.
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The admin commands abort flow is buggy (use-after-free) and not really
necessary as it is guaranteed that after ib_unregister_device() is called
there are no user verbs threads running in parallel, delete it.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use kvzalloc which attempts to allocate a physically continuous buffer and
fallbacks to virtually continuous on failure instead of open coding it in
the driver.
The is_vmalloc_addr function is used to determine whether the buffer is
physically continuous or not (which determines direct vs indirect MR
registration mode).
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
MAYEXEC test was mistakenly added, remove it. Checking MAYEXEC in the
driver prevents it from working with userspace that uses things like EXEC
STACK. (ie some Fortran and other runtimes)
Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit 25c13324d0 ("IB/mlx5: Add steering SW ICM device memory type")
breaks i386 build by introducing three 64-bit divisions. As the divisor is
MLX5_SW_ICM_BLOCK_SIZE() which is always a power of 2, we can replace the
division with bit operations.
Fixes: 25c13324d0 ("IB/mlx5: Add steering SW ICM device memory type")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A recent patch to hfi1 left behind a checkpatch error.
Fixes: fb24ea52f7 ("drivers: Remove explicit invocations of mmiowb()")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
User applications can register memory regions for TID buffers that are not
aligned on page boundaries. Hfi1 is expected to pin those pages in memory
and cache the pages with mmu_rb. The rb tree will fail to insert pages
that are not aligned correctly.
Validate whether a given virtual address is page aligned before pinning.
Fixes: 7e7a436ecb ("staging/hfi1: Add TID entry program function body")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The command 'ibv_devinfo -v' reports 0 for max_mr.
Fix by assigning the query values after the mr lkey_table has been built
rather than early on in the driver.
Fixes: 7b1e2099ad ("IB/rdmavt: Move memory registration into rdmavt")
Reviewed-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
By code inspection, the freeze_work is never canceled.
Fix by adding a cancel_work_sync in the shutdown path to insure it is no
longer running.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For infiniband code that retains pages via get_user_pages*(), release
those pages via the new put_user_page(), or put_user_pages*(), instead of
put_page()
This is a tiny part of the second step of fixing the problem described in
[1]. The steps are:
1) Provide put_user_page*() routines, intended to be used for releasing
pages that were pinned via get_user_pages*().
2) Convert all of the call sites for get_user_pages*(), to invoke
put_user_page*(), instead of put_page(). This involves dozens of call
sites, and will take some time.
3) After (2) is complete, use get_user_pages*() and put_user_page*() to
implement tracking of these pages. This tracking will be separate from
the existing struct page refcounting.
4) Use the tracking and identification of these pages, to implement
special handling (especially in writeback paths) when the pages are
backed by a filesystem. Again, [1] provides details as to why that is
desirable.
[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/infiniband/hw/hfi1/tid_rdma.c: In function tid_rdma_rcv_error:
drivers/infiniband/hw/hfi1/tid_rdma.c:2029:7: warning: variable offset set but not used [-Wunused-but-set-variable]
drivers/infiniband/hw/hfi1/tid_rdma.c: In function hfi1_rc_rcv_tid_rdma_ack:
drivers/infiniband/hw/hfi1/tid_rdma.c:4555:35: warning: variable fspsn set but not used [-Wunused-but-set-variable]
'offset' is never used since introduction in
commit d0d564a1ca ("IB/hfi1: Add functions to receive TID RDMA READ request")
'fspsn' is never used since introduciotn in
commit 9e93e967f7 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch makes the code more readable by removing magic numbers.
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In some functions, the jiffies operation is unnecessary, and we can
control delay using mdelay and udelay functions only. Especially, in
hns_roce_v1_clear_hem, the function calls spin_lock_irqsave, the context
disables interrupt, so we can not use jiffies and msleep functions.
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When hip08 set gid, it will call spin_unlock_bh when send cmq. if main.ko
call spin_lock_irqsave firstly, and the kernel is before commit
f71b74bca6 ("irq/softirqs: Use lockdep to assert IRQs are
disabled/enabled"), it will cause WARN_ON_ONCE because of calling
spin_unlock_bh in disable context.
In fact, the spin_lock_irqsave in main.ko is only used for hip06, and
should be placed in hns_roce_hw_v1.c. hns_roce_hw_v2.c uses its own
spin_unlock_bh and does not need main.ko manage spin_lock.
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to hip08 UM, the maximum number of CQEs supported by each CQ is
4M.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to print when communication is established, especially
while lots of qp used by application.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add await in destroy_qp() so that all references to qp are dereferenced
and qp is freed in destroy_qp() itself. This ensures freeing of all QPs
before invocation of dealloc_ucontext(), which prevents loss of in use
qpids stored in the ucontext.
Signed-off-by: Nirranjan Kirubaharan <nirranjan@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Change unconditional print of DMA address to be printed with special
printk format type specifier.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Convert various sizeof call sites to be written in standard format
sizeof().
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Resize CQ implementation was guarded by undeclared "notyet" define while
cxgb3 was added to the kernel. Twelve years later, this call is still
unimplemented, so safely delete it and fix improper return error code when
.resize_cq() is not implemented.
Fixes: b038ced7b3 ("RDMA/cxgb3: Add driver for Chelsio T3 RNIC")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
DMA addresses like all other kernel addresses should be printed with
special %p* formatter. It is needed to allow control of exposure of such
information through a dedicated knob.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
sizeof(a), sizeof a and sizeof (a) are all valid notations, but first is
more readable format recommended by checkpatch.pl.
Let's canonize it in cxgb3 drivers, so latter patches won't emit
checkpatch warnings. As part of this change, a redundant memset() was
removed.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add iWARP engine affinity setting for supporting iWARP over 100g.
iWARP cannot be distinguished by the LLH from L2, hence the
engine division will affect L2 as well. For this reason we add
a parameter to devlink to determine the engine division.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the msix vectors of the affined hwfn and not the
leading one.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When initializing status blocks use the affined hwfn
instead of the leading one for RDMA / Storage
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers cannot check the udata for validity when doing destroy as there
will be no way to report this error back to the uverbs.
Since udata is new for destroy no driver should start to use it - instead
drivers should opt for the ioctl interface and define it in a way where it
cannot fail due to incorrect data.
Remove the checks on udata construction so EFA is consistent with
everything else.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Here are series of patches that add SPDX tags to different kernel files,
based on two different things:
- SPDX entries are added to a bunch of files that we missed a year ago
that do not have any license information at all.
These were either missed because the tool saw the MODULE_LICENSE()
tag, or some EXPORT_SYMBOL tags, and got confused and thought the
file had a real license, or the files have been added since the last
big sweep, or they were Makefile/Kconfig files, which we didn't
touch last time.
- Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
tools can determine the license text in the file itself. Where this
happens, the license text is removed, in order to cut down on the
700+ different ways we have in the kernel today, in a quest to get
rid of all of these.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on the
patches are reviewers.
The reason for these "large" patches is if we were to continue to
progress at the current rate of change in the kernel, adding license
tags to individual files in different subsystems, we would be finished
in about 10 years at the earliest.
There will be more series of these types of patches coming over the next
few weeks as the tools and reviewers crunch through the more "odd"
variants of how to say "GPLv2" that developers have come up with over
the years, combined with other fun oddities (GPL + a BSD disclaimer?)
that are being unearthed, with the goal for the whole kernel to be
cleaned up.
These diffstats are not small, 3840 files are touched, over 10k lines
removed in just 24 patches.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXOP8uw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynmGQCgy3evqzleuOITDpuWaxewFdHqiJYAnA7KRw4H
1KwtfRnMtG6dk/XaS7H7
=O9lH
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull SPDX update from Greg KH:
"Here is a series of patches that add SPDX tags to different kernel
files, based on two different things:
- SPDX entries are added to a bunch of files that we missed a year
ago that do not have any license information at all.
These were either missed because the tool saw the MODULE_LICENSE()
tag, or some EXPORT_SYMBOL tags, and got confused and thought the
file had a real license, or the files have been added since the
last big sweep, or they were Makefile/Kconfig files, which we
didn't touch last time.
- Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
tools can determine the license text in the file itself. Where this
happens, the license text is removed, in order to cut down on the
700+ different ways we have in the kernel today, in a quest to get
rid of all of these.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on
the patches are reviewers.
The reason for these "large" patches is if we were to continue to
progress at the current rate of change in the kernel, adding license
tags to individual files in different subsystems, we would be finished
in about 10 years at the earliest.
There will be more series of these types of patches coming over the
next few weeks as the tools and reviewers crunch through the more
"odd" variants of how to say "GPLv2" that developers have come up with
over the years, combined with other fun oddities (GPL + a BSD
disclaimer?) that are being unearthed, with the goal for the whole
kernel to be cleaned up.
These diffstats are not small, 3840 files are touched, over 10k lines
removed in just 24 patches"
* tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
...
The same wait queue is initialized a couple of lines above.
Fixes: 3c2d774cad ("RDMA/nes: Add a driver for NetEffect RNICs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to check existence of structures to be destroyed.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The destroy functions are always called with relevant structs, there is no
need to check their existence.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The function argument virt_addr is not in use - delete it.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
free_pd is allocated internally by the driver hence needs to be freed
internally too or it leaks.
Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This value has always been set to PAGE_SHIFT in the core code, the only
thing that does differently was the ODP path. Move the value into the ODP
struct and still use it for ODP, but change all the non-ODP things to just
use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly.
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the correct enum value introduced in commit 12113a35ad ("IB/core:
Add HDR speed enum") Prior to this change a 50Gbps port would show 40Gbps.
This patch also cleaned up the redundant redefiniton of ib speeds for
qedr.
Fixes: 12113a35ad ("IB/core: Add HDR speed enum")
Signed-off-by: Sagiv Ozeri <sagiv.ozeri@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull networking fixes from David Miller:1) Use after free in __dev_map_entry_free(), from Eric Dumazet.
1) Use after free in __dev_map_entry_free(), from Eric Dumazet.
2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung
Cheng.
3) Orphan NFC, we'll take the patches directly into my tree. From
Johannes Berg.
4) We can't recycle cloned TCP skbs, from Eric Dumazet.
5) Some flow dissector bpf test fixes, from Stanislav Fomichev.
6) Fix RCU marking and warnings in rhashtable, from Herbert Xu.
7) Fix some potential fib6 leaks, from Eric Dumazet.
8) Fix a _decode_session4 uninitialized memory read bug fix that got
lost in a merge. From Florian Westphal.
9) Fix ipv6 source address routing wrt. exception route entries, from
Wei Wang.
10) The netdev_xmit_more() conversion was not done %100 properly in mlx5
driver, fix from Tariq Toukan.
11) Clean up botched merge on netfilter kselftest, from Florian
Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits)
of_net: fix of_get_mac_address retval if compiled without CONFIG_OF
net: fix kernel-doc warnings for socket.c
net: Treat sock->sk_drops as an unsigned int when printing
kselftests: netfilter: fix leftover net/net-next merge conflict
mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM
mlxsw: core: Prevent QSFP module initialization for old hardware
vsock/virtio: Initialize core virtio vsock before registering the driver
net/mlx5e: Fix possible modify header actions memory leak
net/mlx5e: Fix no rewrite fields with the same match
net/mlx5e: Additional check for flow destination comparison
net/mlx5e: Add missing ethtool driver info for representors
net/mlx5e: Fix number of vports for ingress ACL configuration
net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
net/mlx5e: Fix wrong xmit_more application
net/mlx5: Fix peer pf disable hca command
net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
net/mlx5: Add meaningful return codes to status_to_err function
net/mlx5: Imply MLXFW in mlx5_core
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
vsock/virtio: free packets during the socket release
...
To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.
vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.
vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.
Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.
Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.
Fixes: 5ae5162066 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This is being sent to get a fix for the gcc 9.1 build warnings, and I've
also pulled in some bug fix patches that were posted in the last two
weeks.
- Avoid the gcc 9.1 warning about overflowing a union member
- Fix the wrong callback type for a single response netlink to doit
- Bug fixes from more usage of the mlx5 devx interface
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzbYAsACgkQOG33FX4g
mxpzYw/9HxKMpU5QmHIpV17sVV5SSepfWVQ6YmrNMG5BTBI8by0zj58fJ9TLuNu+
OYMD6dS/baLeiN6jszec6zWufjUVfMU5aw1ja+iwF78fS8NmVXlrLz/xWmkLu4fi
pBN3PCt90ziCnVXOlsn55dKAcgmiaRws+TzGjGGvQP9IYpfO6kyj8HIrP6im910E
j41HcGrD1fMLy0js9Aq6OzMswbop8uFTV/UBp5onKASNPwAGlnigvjTKqnSlt+Vo
rswc/h8uIz1jnuH1s8EfggFY7nGqxNmq9G/UNBo/86JcLI97SaYN9pqQJ+HcEtDR
tJYoDr8PFDJcDaFpm0gbNK5pO9cS7X/I/NWZrdePywZAPAMFKXWgnUejLXVcPKd9
EdkWyg7sJxPHoo6CXrNECu7t/57q3E3qOG93HnXt64pJqv9C9lUmpGrvdv7PBVRK
6nVBysrkV0/27sBeZzul0teRbEqRii/RJ/iphE3w3hPx696Bi5uFzN/8M3tfavj1
pBX7eLAevA+yPlN7+sZiefPjeP0jsvwlzNdrP+9CmB5iIlj0yNlmTvT2rbv+hte0
0JTQvDilmC0e/W0KqQ6fGGfmPFBbHm/UDLu0h24qdw1qQXGOaDH6RRMslrtgNYNw
Mkc++uIC6/KdiehEzolht87FH4sMJrd0DS540WVqJqje7K3jyY8=
=Lo/s
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull more rdma updates from Jason Gunthorpe:
"This is being sent to get a fix for the gcc 9.1 build warnings, and
I've also pulled in some bug fix patches that were posted in the last
two weeks.
- Avoid the gcc 9.1 warning about overflowing a union member
- Fix the wrong callback type for a single response netlink to doit
- Bug fixes from more usage of the mlx5 devx interface"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
net/mlx5: Set completion EQs as shared resources
IB/mlx5: Verify DEVX general object type correctly
RDMA/core: Change system parameters callback from dumpit to doit
RDMA: Directly cast the sockaddr union to sockaddr
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.
Link: http://lkml.kernel.org/r/20190328084422.29911-8-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-8-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.
Link: http://lkml.kernel.org/r/20190328084422.29911-7-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-7-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.
[ira.weiny@intel.com: v3]
Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.
This patch does not change any functionality. New functionality will
follow in subsequent patches.
Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted. This breaks the current
GUP call site convention of having the returned pages be the final
parameter. So the suggestion was rejected.
Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pach series "Add FOLL_LONGTERM to GUP fast and use it".
HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages. These pages can be held for a significant time. But
get_user_pages_fast() does not protect against mapping FS DAX pages.
Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks. XDP has also
shown interest in using this functionality.[1]
In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.
[1] https://lkml.org/lkml/2019/3/19/939
"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move. I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem. Then I think we can change the flag to a
better name.
Secondly, it depends on how often you are registering memory. I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance. I don't have the numbers as the
tests for HFI1 were done a long time ago. But there was a significant
advantage. Some of which is probably due to the fact that you don't have
to hold mmap_sem.
Finally, architecturally I think it would be good for everyone to use
*_fast. There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well. Also
to this point others are looking to use *_fast.
As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same. I agree and I think further cleanup
will be coming. But I'm focused on getting the final solution for DAX at
the moment.
This patch (of 7):
This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast(). Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.
Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.
This patch does not change any functionality. In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked. However, callers of get_user_pages_fast()
were not "protected".
FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.
NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages. This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.
As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.
In review[1] it was asked:
<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?
A couple of points.
First "longterm" is a relative thing and at this point is probably a
misnomer. This is really flagging a pin which is going to be given to
hardware and can't move. I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem. Then I think we can
change the flag to a better name.
Second, It depends on how often you are registering memory. I have spoken
with some RDMA users who consider MR in the performance path... For the
overall application performance. I don't have the numbers as the tests
for HFI1 were done a long time ago. But there was a significant
advantage. Some of which is probably due to the fact that you don't have
to hold mmap_sem.
Finally, architecturally I think it would be good for everyone to use
*_fast. There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well. Also
to this point others are looking to use *_fast.
As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same. I agree and I think further cleanup
will be coming. But I'm focused on getting the final solution for DAX at
the moment.
</quote>
[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965
[ira.weiny@intel.com: v3]
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As the obj_id in the firmware is not globally unique in general_object,
the object type must be considered upon checking for a valid object id.
Fixes: 2351776e87 ("IB/mlx5: Verify DEVX object type")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
gcc 9 now does allocation size tracking and thinks that passing the member
of a union and then accessing beyond that member's bounds is an overflow.
Instead of using the union member, use the entire union with a cast to
get to the sockaddr. gcc will now know that the memory extends the full
size of the union.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This has been a smaller cycle than normal. One new driver was accepted,
which is unusual, and at least one more driver remains in review on the
list.
- Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma
- Many patches from MatthewW converting radix tree and IDR users to use
xarray
- Introduction of tracepoints to the MAD layer
- Build large SGLs at the start for DMA mapping and get the driver to
split them
- Generally clean SGL handling code throughout the subsystem
- Support for restricting RDMA devices to net namespaces for containers
- Progress to remove object allocation boilerplate code from drivers
- Change in how the mlx5 driver shows representor ports linked to VFs
- mlx5 uapi feature to access the on chip SW ICM memory
- Add a new driver for 'EFA'. This is HW that supports user space packet
processing through QPs in Amazon's cloud
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzTIU0ACgkQOG33FX4g
mxrGKQ/8CqpyvuCyZDW5ovO4DI4YlzYSPXehWlwxA4CWhU1AYTujutnNOdZdngnz
atTthOlJpZWJV26orvvzwIOi4qX/5UjLXEY3HYdn07JP1Z4iT7E3P4W2sdU3vdl3
j8bU7xM7ZWmnGxrBZ6yQlVRadEhB8+HJIZWMw+wx66cIPnvU+g9NgwouH67HEEQ3
PU8OCtGBwNNR508WPiZhjqMDfi/3BED4BfCihFhMbZEgFgObjRgtCV0M33SSXKcR
IO2FGNVuDAUBlND3vU9guW1+M77xE6p1GvzkIgdCp6qTc724NuO5F2ngrpHKRyZT
CxvBhAJI6tAZmjBVnmgVJex7rA8p+y/8M/2WD6GE3XSO89XVOkzNBiO2iTMeoxXr
+CX6VvP2BWwCArxsfKMgW3j0h/WVE9w8Ciej1628m1NvvKEV4AGIJC1g93lIJkRN
i3RkJ5PkIrdBrTEdKwDu1FdXQHaO7kGgKvwzJ7wBFhso8BRMrMfdULiMbaXs2Bw1
WdL5zoSe/bLUpPZxcT9IjXRxY5qR0FpIOoo6925OmvyYe/oZo1zbitS5GGbvV90g
tkq6Jb+aq8ZKtozwCo+oMcg9QPLYNibQsnkL3QirtURXWCG467xdgkaJLdF6s5Oh
cp+YBqbR/8HNMG/KQlCfnNQKp1ci8mG3EdthQPhvdcZ4jtbqnSI=
=TS64
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a smaller cycle than normal. One new driver was
accepted, which is unusual, and at least one more driver remains in
review on the list.
Summary:
- Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
vmw_pvrdma
- Many patches from MatthewW converting radix tree and IDR users to
use xarray
- Introduction of tracepoints to the MAD layer
- Build large SGLs at the start for DMA mapping and get the driver to
split them
- Generally clean SGL handling code throughout the subsystem
- Support for restricting RDMA devices to net namespaces for
containers
- Progress to remove object allocation boilerplate code from drivers
- Change in how the mlx5 driver shows representor ports linked to VFs
- mlx5 uapi feature to access the on chip SW ICM memory
- Add a new driver for 'EFA'. This is HW that supports user space
packet processing through QPs in Amazon's cloud"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
RDMA/ipoib: Allow user space differentiate between valid dev_port
IB/core, ipoib: Do not overreact to SM LID change event
RDMA/device: Don't fire uevent before device is fully initialized
lib/scatterlist: Remove leftover from sg_page_iter comment
RDMA/efa: Add driver to Kconfig/Makefile
RDMA/efa: Add the efa module
RDMA/efa: Add EFA verbs implementation
RDMA/efa: Add common command handlers
RDMA/efa: Implement functions that submit and complete admin commands
RDMA/efa: Add the ABI definitions
RDMA/efa: Add the com service API definitions
RDMA/efa: Add the efa_com.h file
RDMA/efa: Add the efa.h header file
RDMA/efa: Add EFA device definitions
RDMA: Add EFA related definitions
RDMA/umem: Remove hugetlb flag
RDMA/bnxt_re: Use core helpers to get aligned DMA address
RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
RDMA/umem: Add API to find best driver supported page size in an MR
...
Pull networking updates from David Miller:
"Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.
11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.
16) Remove SKB list implementation assumptions in SCTP, your's truly.
17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic featue detection, from
Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.
22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...
Add EFA Makefile and Kconfig.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add the main EFA module file which takes care of device
probe/initialization/registration/etc.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add a file that implements the EFA verbs.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
=aC8m
-----END PGP SIGNATURE-----
Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull mmiowb removal from Will Deacon:
"Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
architectures that need it, hide the barrier inside spin_unlock() when
MMIO has been performed inside the critical section.
The only relatively recent changes have been addressing review
comments on the documentation, which is in a much better shape thanks
to the efforts of Ben and Ingo.
I was initially planning to split this into two pull requests so that
you could run the coccinelle script yourself, however it's been plain
sailing in linux-next so I've just included the whole lot here to keep
things simple"
* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
arch: Remove dummy mmiowb() definitions from arch code
net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
i40iw: Redefine i40iw_mmiowb() to do nothing
scsi/qla1280: Remove stale comment about mmiowb()
drivers: Remove explicit invocations of mmiowb()
drivers: Remove useless trailing comments from mmiowb() invocations
Documentation: Kill all references to mmiowb()
riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
m68k/io: Remove useless definition of mmiowb()
nds32/io: Remove useless definition of mmiowb()
x86/io: Remove useless definition of mmiowb()
arm64/io: Remove useless definition of mmiowb()
ARM/io: Remove useless definition of mmiowb()
mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
...
Add the EFA common commands implementation.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Header file for the various commands that can be sent through admin queue.
This includes queue create/modify/destroy, setting up and remove
protection domains, address handlers, and memory registration, etc.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A helper header file for EFA admin queue, admin queue completion,
asynchronous notification queue, and various hardware configuration data
structures and functions.
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
EFA PCIe device implements a single Admin Queue (AQ) and Admin Completion
Queue (ACQ) pair to initialize and communicate configuration with the
device. Through this pair, we run set/get commands for querying and
configuring the device, create/modify/destroy queues, and IB specific
commands like Address Handler (AH), Memory Registration (MR) and
Protection Domains (PD).
In addition to admin (AQ/ACQ), we have data path queues that get
classified as Queue Pairs (QP) and Completion Queues (CQ).
Signed-off-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Call the core helpers to retrieve the HW aligned address to use for the
MR, within a supported bnxt_re page size.
Remove checking the umem->hugtetlb flag as it is no longer required. The
new DMA block iterator will return the 2M aligned address if the MR is
backed by 2M huge pages.
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Call the core helpers to retrieve the HW aligned address to use for the
MR, within a supported i40iw page size.
Remove code in i40iw to determine when MR is backed by 2M huge pages which
involves checking the umem->hugetlb flag and VMA inspection. The new DMA
iterator will return the 2M aligned address if the MR is backed by 2M
pages.
Fixes: f26c7c8339 ("i40iw: Add 2MB page support")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The work_item cancels that occur when a QP is destroyed can elicit the
following trace:
workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
Call Trace:
__flush_work.isra.29+0x8c/0x1a0
? __switch_to_asm+0x40/0x70
__cancel_work_timer+0x103/0x190
? schedule+0x32/0x80
iowait_cancel_work+0x15/0x30 [hfi1]
rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
rvt_destroy_qp+0x65/0x1f0 [rdmavt]
? _cond_resched+0x15/0x30
ib_destroy_qp+0xe9/0x230 [ib_core]
ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
process_one_work+0x171/0x370
worker_thread+0x49/0x3f0
kthread+0xf8/0x130
? max_active_store+0x80/0x80
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM.
The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
entangled with memory reclaim, so this flag is appropriate.
Fixes: 0a226edd20 ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
MAYEXEC flag was mistakenly added in the commit cited in the fixes line.
Fixes: 4eb6ab13b9 ("RDMA: Remove rdma_user_mmap_page")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For DEVX users who have SYS_RAWIO capability, we set the internal device
resources capability when creating the UCTX. This will allow the device
to restrict the allocation of internal device resources such as SW ICM
memory to privileged DEVX users only.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds support for allocating, deallocating and registering a new
device memory type, STEERING_SW_ICM. This memory can be allocated and
used by a privileged user for direct rule insertion and management of the
device's steering tables.
The type is provided by the user via the dedicated attribute in the
alloc_dm ioctl command.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Adding a warning on allocated MEMIC buffers that weren't freed prior to
driver tear down.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch intoruduces a new mlx5_ib driver attribute to the DM allocation
method - the DM type.
In order to allow addition of new types in downstream patches this patch
also refactors the allocation, deallocation and registration handlers to
consider the requested type and perform the necessary actions according to
it.
Since not all future device memory types will be such that are mapped to
user memory, the mandatory page index output attribute is modified to be
optional.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5 misc updates:
1) Bodong Wang and Parav Pandit (6):
- Remove unused mlx5_query_nic_vport_vlans
- vport macros refactoring
- Fix vport access in E-Switch
- Use atomic rep state to serialize state change
2) Eli Britstein (2):
- prio tag mode support, added ACLs and replace TC vlan pop with
vlan 0 rewrite when prio tag mode is enabled.
3) Erez Alfasi (2):
- ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions
- mlx5e: ethtool, Add support for EEPROM high pages query
4) Masahiro Yamada (1):
- remove meaningless CFLAGS_tracepoint.o
5) Maxim Mikityanskiy (1):
- Put the common XDP code into a function
6) Tariq Toukan (2):
- Turn on HW tunnel offload in all TIRs
7) Vlad Buslov (1):
- Return error when trying to insert existing flower filter
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcyhIFAAoJEEg/ir3gV/o+LgsH/idNT42AQewm2gn1NAt/njRx
hA/ILH4ZmqYD8tgme5q3lByGrGRTweCPQ92+/tYP1i90PL8EJKNFbRPXuORp+hUk
m+ywoeyBHx0ZyDlAIGNDCFprY//jZV/3XQKuJhLUliGfN77lUSkVtIz2UY+cDr2U
XBn0B3Fy54+XP7EqVHXdxRkLiwDCsDwZBF6O9/1cw/rKsly6fIzw1b7UVjFaFA8f
1g5Ca/+v4X0Rsky1KOGLv8HVB4bxbiSZspAjKwVGJagPUNJMRR6xZyL+VNHWX71R
N68VMQQbwg7XDDFQNtYAFSpxOkAY+wilkRDe7+3A50cFE8ZYYskwVJunvb75fCA=
=oqb8
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-04-30
mlx5 misc updates:
1) Bodong Wang and Parav Pandit (6):
- Remove unused mlx5_query_nic_vport_vlans
- vport macros refactoring
- Fix vport access in E-Switch
- Use atomic rep state to serialize state change
2) Eli Britstein (2):
- prio tag mode support, added ACLs and replace TC vlan pop with
vlan 0 rewrite when prio tag mode is enabled.
3) Erez Alfasi (2):
- ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions
- mlx5e: ethtool, Add support for EEPROM high pages query
4) Masahiro Yamada (1):
- remove meaningless CFLAGS_tracepoint.o
5) Maxim Mikityanskiy (1):
- Put the common XDP code into a function
6) Tariq Toukan (2):
- Turn on HW tunnel offload in all TIRs
7) Vlad Buslov (1):
- Return error when trying to insert existing flower filter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of RoCE drivers figuring out vlan, smac fields while working on
QP/AH, provide a helper routine to read the L2 fields such as vlan_id and
source mac address.
This moves logic from mlx5 driver to core for wider usage for RoCE ports.
This is a preparation patch to allow detaching netdev in subsequent patch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Integrate iw_cm_verbs data members into ib_device_ops and ib_device
structs, this is done to achieve the following:
1) Avoid memory related bugs durring error unwind
2) Make the code more cleaner
3) Reduce code duplication
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The QP transition optional parameters for the various transition for XRC
QPs are identical to those for RC QPs.
Many of the XRC QP transition optional parameter bits are missing from the
QP optional mask table. These omissions caused failures when doing XRC QP
state transitions.
For example, when trying to change the response timer of an XRC receive QP
via the RTS2RTS transition, the new timer value was ignored because
MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask
for XRC qps for the RTS2RTS transition.
Fix this by adding the missing XRC optional parameters for all QP
transitions to the opt_mask table.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Fixes: a4774e9095 ("IB/mlx5: Fix opt param mask according to firmware spec")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.
1) From Aya: Enable general events on all physical link types and
restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma
driver to ethernet links only as it was intended.
2) From Eli: Introduce low level bits for prio tag mode
3) From Maor: Low level steering updates to support RDMA RX flow
steering and enables RoCE loopback traffic when switchdev is enabled.
4) From Vu and Parav: Two small mlx5 core cleanups
5) From Yevgeny add HW definitions of geneve offloads
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Subtype 'DELAY_DROP_TIMEOUT' (under 'GENERAL' event) is restricted to
Ethernet interfaces. This patch doesn't change functionality or breaks
current flow. In the downstream patch, non Ethernet (like IB) interfaces
will receive 'GENERAL' event.
Fixes: 5d3c537f90 ("net/mlx5: Handle event of power detection in the PCIE slot")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The mlx5 Sub-Function (SF) sub device will be introduced in
subsequent patches. It will be created as mediated device and
belong to mdev bus. It is necessary to treat dma operations on
PF, VF and SF in uniform way, hence reduce the dependency on
pdev pci dev struct and work directly out of newly introduced
'struct device' from previous patch.
This patch does not change any functionality.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.
Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:
@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)
@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the maximum send wr delivered by the user is zero, the qp does not
have a sq.
When allocating the sq db buffer to store the user sq pi pointer and map
it to the kernel mode, max_send_wr is used as the trigger condition, while
the kernel does not consider the max_send_wr trigger condition when
mapmping db. It will cause sq record doorbell map fail and create qp fail.
The failed print information as follows:
hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000
hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff
hns3 0000:7d:00.1: sq record doorbell map failed!
hns3 0000:7d:00.1: Create RC QP failed
Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ariel Levkovich says:
====================
The series exposes the ICM address of the receive transport
interface (TIR) of Raw Packet and RSS QPs to the user since they are
required to properly create and insert steering rules that direct flows to
these QPs.
====================
For dependencies this branch is based on mlx5-next from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* branch 'mlx5_tir_icm':
IB/mlx5: Expose TIR ICM address to user space
net/mlx5: Introduce new TIR creation core API
net/mlx5: Expose TIR ICM address in command outbox
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch exposes the TIR ICM address of raw packet and RSS
QPs to user space.
In order to pass the new field, the patch extends the mlx5
specific QP creation response structure and fills it with
the icm address returned by the FW command, if available.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe says:
====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
* BAR pages intended for read-only can be switched to writable via mprotect.
* Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
* Disassociate causes SIGBUS when touching the pages.
* CPU pages are being mapped through to the process via remap_pfn_range
instead of the more appropriate vm_insert_page, causing weird behaviors
during disassociation.
This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================
For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
* branch 'rdma_mmap':
RDMA: Remove rdma_user_mmap_page
RDMA/mlx5: Use get_zeroed_page() for clock_info
RDMA/ucontext: Fix regression with disassociate
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
RDMA/mlx5: Do not allow the user to write to the clock page
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Upon further research drivers that want this should simply call the core
function vm_insert_page(). The VMA holds a reference on the page and it
will be automatically freed when the last reference drops. No need for
disassociate to sequence the cleanup.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
get_zeroed_page() returns a virtual address for the page which is better
than allocating a struct page and doing a permanent kmap on it.
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.
Cc: stable@vger.kernel.org
Fixes: 5f9794dc94 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The intent of this VMA was to be read-only from user space, but the
VM_MAYWRITE masking was missed, so mprotect could make it writable.
Cc: stable@vger.kernel.org
Fixes: 5c99eaecb1 ("IB/mlx5: Mmap the HCA's clock info to user-space")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These counters are required for error analysis and debug.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The reference count adjustments on reference count completion
are open coded throughout.
Add a routine to do all reference count adjustments and use.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The currently include file ordering for rdmavt headers has an
ab/ba include issue the precludes using inlines from rdma_vt.h
in rdmavt_qp.h.
At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h.
Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h
and move qp related inlines to rdmavt_qp.h.
Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared
by rdmavt_cq.h and rdmavt_qp.h.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The opfn.h include file build-ablility depends on the including file
having the correct includes.
Fix by making opfn.h self sufficient.
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch fixes miscellaneous comment errors.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple
handles to PCI resource0. In order to continue to support expansion ROM
updates while the driver is loaded, the driver must now provide an
interface to control the expansion ROM write protection.
This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom
user tool to disable the expansion ROM write protection by opening the
file and writing a '1'. The write protection is released when writing a
'0' or automatically re-enabled when the file handle is closed. The
current implementation will only allow one handle to be opened at a time
across all hfi1 devices.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Verbs destroy callbacks are synchronous operations and can't be delayed.
The expectation is that after driver returned from destroy function, the
memory can be freed and user won't be able to access it again.
Ditch workqueue implementation used in HNS driver.
Fixes: d838c481e0 ("IB/hns: Fix the bug when destroy qp")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: oulijun <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
4+dwGBk=
=pyXR
-----END PGP SIGNATURE-----
Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next
Linux 5.1-rc1
We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Switchdev mode and mutiport RoCE mode aren't compatible at this point.
Don't create IB reps when a user switches to switchdev mode and the driver
operates in that mode.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When working in mutliport RoCE mode it is possible to attach a slave
before the master. In that case the slave is waiting for a master to be
attached. When the master is attached it goes over the list of waiting
slaves, finds a slave that is compatible and tries to bind it to itself.
The call stack is:
mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port()
In the bind function we will create a netdev notifier, but this is done
before we initialize the RoCE structure (this is done at a later stage by
the master in the ROCE stage).
Once events are delivered to that notifier we will use
mlx5_ib_get_native_port_mdev() to get the actual port and as the native
port is zero we will access an invalid index in the port structure.
Move the RoCE structure initialization to an earlier stage.
Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove the limitations that were in place and provide support for DEVX and
raw flow creation on reps.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed
only if HCA_CAP.eswitch_manager==1.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allow this only via mlx5 raw create flow API, legacy verbs are not
supported. To accommodate that, we add a new attribute to matcher creation
to indicate the type of flow table to be used.
MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE
With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer
needed, we keep it for compatibility but at most only a single attribute can
be passed of the two.
When inserting a flow rule to the FDB we require that a DEVX FT is
provided as a destination, no other configuration is allowed.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of failing the request, just use the supported number of flow
entries.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a specific prio inside the FDB namespace allow retrieving
it from the RDMA side.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a spelling mistake in a module parameter description. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command
since it tried to treat it as as QP create mailbox command instead of a
DCT create command.
The corrupted mailbox command causes userspace to malfunction as the
device doesn't create the QP as expected.
A new mlx5 capability is exposed to user-space which ensures that it will
not enable the feature on DCT without this fix in the kernel.
Fixes: 5d6ff1babe ("IB/mlx5: Support scatter to CQE for DC transport type")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.
Addresses-Coverity: ("Dereference null return")
Fixes: b38a0ad8ec ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently when the call to create_flow_rule_vport_sq fails, the error
check is being performed on err rather than on the return pointer
flow_rule. The return flow_rule maybe NULL (which is not considered an
error) or an error code, so check for the error on flow_rule.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: d5ed8ac34c ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Removing the use of IDR variable just to name the function ids. Using the
PCI_FUNC(pdev->devfn) instead to create the device name, associated
resources and to print driver into at various places.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.
This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.
Fixes: 70dcb2e3dc ("IB/hfi1: Add the TID second leg send packet builder")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a single IB device with multiple ports we can remove the
VF representor profile.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Move from IB device (representor) per virtual function to single IB device
with port per virtual function (port 1 represents the uplink). As number
of ports is a static property of an IB device, declare the IB device with
as many port as the possible according to the PCI bus.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
We store the SMI information in the core device's struct, make sure we set
that information only once (and not per port), while here make the for
loop based on the actual size of the array.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The design of representors is such that once an IB representor is created,
the netdev of representor already exists, we can use that fact to simplify
the netdev affinity code.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently the steering for SQs created on representors is done on
creation, once we move to representors as ports of an IB device we need
the port argument which is given only at the modify QP stage, adjust the
code appropriately.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In preparation of moving into a model of single IB device multiple ports
move rep to be part of the port structure. We mark a representor device by
setting is_rep, no functional change with this patch.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
On allocation we use the array size and on destruction num_ports, use the
array size of destruction as well, in this context the array corresponds
to the native/actual ports on the NIC so no need to adjust this logic for
representors.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In downstream patches we will need access to the ports before doing any
stages, in order to set net device per representor.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Simplify the code and move the deallocation of the IB device into the
remove function.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Netdev info is stored in a separate array and holds data relevant on a per
port basis, move it to be part of the port struct.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>