OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Mike Marciniszyn	9755f72496	IB/hfi1: Create inline to get extended headers This paves the way for another patch that reacts to a flush sdma completion for RC. Fixes: `81cd3891f0` ("IB/hfi1: Add support for 16B Management Packets") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-17 21:15:40 -04:00
Mike Marciniszyn	3230f4a8d4	IB/hfi1: Silence txreq allocation warnings The following warning can happen when a memory shortage occurs during txreq allocation: [10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) [10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016 [10220.939247] cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0 [10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1] [10220.939261] node 0: slabs: 1026568, objs: 43115856, free: 0 [10220.939262] Call Trace: [10220.939262] node 1: slabs: 820872, objs: 34476624, free: 0 [10220.939263] dump_stack+0x5a/0x73 [10220.939265] warn_alloc+0x103/0x190 [10220.939267] ? wake_all_kswapds+0x54/0x8b [10220.939268] __alloc_pages_slowpath+0x86c/0xa2e [10220.939270] ? __alloc_pages_nodemask+0x2fe/0x320 [10220.939271] __alloc_pages_nodemask+0x2fe/0x320 [10220.939273] new_slab+0x475/0x550 [10220.939275] ___slab_alloc+0x36c/0x520 [10220.939287] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939299] ? __get_txreq+0x54/0x160 [hfi1] [10220.939310] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939312] __slab_alloc+0x40/0x61 [10220.939323] ? hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939325] kmem_cache_alloc+0x181/0x1b0 [10220.939336] hfi1_make_rc_req+0x90/0x18b0 [hfi1] [10220.939348] ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1] [10220.939359] ? find_prev_entry+0xb0/0xb0 [hfi1] [10220.939371] hfi1_do_send+0x1d9/0x3f0 [hfi1] [10220.939372] process_one_work+0x171/0x380 [10220.939374] worker_thread+0x49/0x3f0 [10220.939375] kthread+0xf8/0x130 [10220.939377] ? max_active_store+0x80/0x80 [10220.939378] ? kthread_bind+0x10/0x10 [10220.939379] ret_from_fork+0x35/0x40 [10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) The shortage is handled properly so the message isn't needed. Silence by adding the no warn option to the slab allocation. 
Fixes: `45842abbb2` ("staging/rdma/hfi1: move txreq header code") Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-17 21:15:40 -04:00
Mike Marciniszyn	cf131a8196	IB/hfi1: Avoid hardlockup with flushlist_lock Heavy contention of the sde flushlist_lock can cause hard lockups at extreme scale when the flushing logic is under stress. Mitigate by replacing the item at a time copy to the local list with an O(1) list_splice_init() and using the high priority work queue to do the flushes. Fixes: `7724105686` ("IB/hfi1: add driver files") Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-17 21:15:40 -04:00
Ingo Molnar	23da766ab1	Linux 5.2-rc5 Merge tag 'v5.2-rc5' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-06-17 12:12:27 +02:00
Yuval Avnery	1f8a7bee27	net/mlx5: Add EQ enable/disable API Previously, EQ joined the chain notifier on creation. This forced the caller to be ready to handle events before creating the EQ through eq_create_generic interface. To help the caller control when the created EQ will be attached to the IRQ, add enable/disable API. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-13 10:59:49 -07:00
Ariel Levkovich	81bfa20603	net/mlx5: Use a single IRQ for all async EQs The patch modifies the IRQ allocation so that all async EQs are assigned to the same IRQ resulting in more available IRQs for completion EQs. The changes are using the support for IRQ sharing and EQ polling budget that was introduced in previous patches so when the shared interrupt is triggered, the kernel will serially call the handler of each of the sharing EQs with a certain budget of EQEs to poll in order to prevent starvation. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-13 10:59:49 -07:00
Yuval Avnery	24163189da	net/mlx5: Separate IRQ request/free from EQ life cycle Instead of requesting IRQ with eq creation, IRQs will be requested before EQ table creation. Instead of freeing the IRQs after EQ destroy, free IRQs after eq table destroy. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-13 10:59:49 -07:00
Yuval Avnery	ca390799c2	net/mlx5: Change interrupt handler to call chain notifier Multiple EQs may share the same IRQ in subsequent patches. Instead of calling the IRQ handler directly, the EQ will register to an atomic chain notifier. The Linux built-in shared IRQ is not used because it forces the caller to disable the IRQ and clear affinity before free_irq() can be called. This patch is the first step in the separation of IRQ and EQ logic. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-13 10:59:49 -07:00
Jason Gunthorpe	2d3c72ed50	rdma: Remove nes This driver was first merged over 10 years ago and has not seen major activity by the authors in the last 7 years. However, in that time it has been patched 150 times to adapt it to changing kernel APIs. Further, the hardware has several issues, like not supporting 64 bit DMA, that make it rather uninteresting for use with modern systems and RDMA. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-13 09:59:49 -04:00
Leon Romanovsky	e39afe3d6d	RDMA: Convert CQ allocations to be under core responsibility Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-11 16:39:49 -04:00
Leon Romanovsky	a52c8e2469	RDMA: Clean destroy CQ in drivers do not return errors Like all other destroy commands, the .destroy_cq() call is not supposed to fail. In all flows, attempting to return early caused memory leaks. This patch converts .destroy_cq() so that it does not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-11 16:17:10 -04:00
Leon Romanovsky	147b308e6a	RDMA/nes: Avoid memory allocation during CQ destroy The memory allocation call can fail and cause an early return from the nes_destroy_cq() function. This leaks the struct nes_cq. Rewrite the function to avoid memory allocation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-11 16:12:34 -04:00
Mike Marciniszyn	cc78076af1	IB/hfi1: Correct tid qp rcd to match verbs context The qp priv rcd pointer doesn't match the context being used for verbs causing issues when 9B and kdeth packets are processed by different receive contexts and hence different CPUs. When running on different CPUs the following panic can occur: WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230 CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1 Call Trace: <IRQ> [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b [<ffffffffb74916d8>] __warn+0xd8/0x100 [<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80 [<ffffffffb7768671>] __list_del_entry+0xa1/0xd0 [<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1] [<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1] [<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1] [<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0 [<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80 [<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60 [<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150 [<ffffffffb742d504>] handle_irq+0xe4/0x1a0 [<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0 [<ffffffffb7b16362>] common_interrupt+0x162/0x162 <EOI> [<ffffffffb775a326>] ? memcpy+0x6/0x110 [<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs] [<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs] [<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs] [<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs] [<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs] [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f [<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs] [<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl] [<ffffffffc04d1318>] ? 
taskq_member+0x18/0x30 [spl] [<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs] [<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl] [<ffffffffb74cee80>] ? wake_up_state+0x20/0x20 [<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs] [<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl] [<ffffffffb74bae31>] kthread+0xd1/0xe0 [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40 [<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21 [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40 Fix by reading the map entry in the same manner as the hardware so that the kdeth and verbs contexts match. Cc: <stable@vger.kernel.org> Fixes: `5190f052a3` ("IB/hfi1: Allow the driver to initialize QP priv struct") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-11 17:06:45 -03:00
Mike Marciniszyn	da9de5f852	IB/hfi1: Close PSM sdma_progress sleep window The call to sdma_progress() is called outside the wait lock. In this case, there is a race condition where sdma_progress() can return false and the sdma_engine can idle. If that happens, there will be no more sdma interrupts to cause the wakeup and the user_sdma xmit will hang. Fix by moving the lock to enclose the sdma_progress() call. Also, delete busycount. The need for this was removed by: commit `bcad29137a` ("IB/hfi1: Serve the most starved iowait entry first") Cc: <stable@vger.kernel.org> Fixes: `7724105686` ("IB/hfi1: add driver files") Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-11 17:06:45 -03:00
Kaike Wan	5f90677ed3	IB/hfi1: Validate fault injection opcode user input The opcode range for fault injection from user should be validated before it is applied to the fault->opcodes[] bitmap to avoid out-of-bound error. Cc: <stable@vger.kernel.org> Fixes: `a74d5307ca` ("IB/hfi1: Rework fault injection machinery") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-11 17:06:37 -03:00
Jason Gunthorpe	7a15414252	RDMA: Move owner into struct ib_device_ops This more closely follows how other subsystems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-10 16:56:03 -03:00
Jason Gunthorpe	72c6ec18eb	RDMA: Move uverbs_abi_ver into struct ib_device_ops No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-10 16:56:02 -03:00
Jason Gunthorpe	b9560a419b	RDMA: Move driver_id into struct ib_device_ops No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-10 16:56:02 -03:00
Lijun Ou	4f18904c78	RDMA/hns: Bugfix for filling the sge of srq When the user posts a receive to a srq with multiple sges, the hardware will get the last correct sge and count the sge numbers according to the specific identifier with lkey. For example, when the driver fills the sges with every wr less than the max sge that the user configured when creating srq, the hardware will stop getting the sge according to the specific lkey in the sge. However, it will always end with the first sge in the current post srq recv interface implementation. Fixes: `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-07 15:01:05 -03:00
David S. Miller	a6cdeeb16b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-07 11:00:14 -07:00
Colin Ian King	fa027328a1	RDMA/hns: fix inverted logic of readl read and shift A previous change incorrectly changed the inverted logic and logically negated the readl rather than the shifted readl result. Fix this by adding in missing parentheses around the expression that needs to be logically negated. Addresses-Coverity: ("Logically dead code") Fixes: `669cefb654` ("RDMA/hns: Remove jiffies operation in disable interrupt context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-07 14:56:26 -03:00
Linus Torvalds	6e38335dcc	5.2 First rc pull request Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Things are looking pretty quiet here in RDMA, not too many bug fixes rolling in right now. The usual driver bug fixes and fixes for a couple of regressions introduced in 5.2: - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to rename its internal sys files - Fix a memory leak in hns - Don't leak resources in efa on certain error unwinds - Don't panic in certain error unwinds in ib_register_device - Various small user visible bug fix patches for the hfi and efa drivers - Fix the 32 bit compilation break" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/efa: Remove MAYEXEC flag check from mmap flow mlx5: avoid 64-bit division IB/hfi1: Validate page aligned for a given virtual address IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value IB/hfi1: Insure freeze_work work_struct is canceled on shutdown IB/rdmavt: Fix alloc_qpn() WARN_ON() RDMA/core: Fix panic when port_data isn't initialized RDMA/uverbs: Pass udata on uverbs error unwind RDMA/core: Clear out the udata before error unwind RDMA/hns: Fix PD memory leak for internal allocation RDMA/srp: Rename SRP sysfs name after IB device rename trigger	2019-06-07 09:25:27 -07:00
David S. Miller	6c018b738a	mlx5-updates-2019-05-31 Merge tag 'mlx5-updates-2019-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-05-31 This series provides some updates to mlx5 core and netdevice driver. 1) use __netdev_tx_sent_queue() to improve performance under GSO workload 2) Allow matching only enc_key_id/enc_dst_port for decapsulation action 3) Geneve support: This patchset adds support for GENEVE tunnel encap/decap flows offload: encapsulating layer 2 Ethernet frames within layer 4 UDP datagrams. The driver supports 6081 destination UDP port number, which is the default IANA-assigned port. Encap: ConnectX-5 inserts the header (w/ or w/o Geneve TLV options) that is provided by the mlx5 driver to the outgoing packet. Decap: Geneve header is matched and the packet is decapsulated. Notes about decap flows with Geneve TLV Options: - Support offloading of 32-bit options data only - At any given time, only one combination of class/type parameters can be offloaded, but the same class/type combination can have many different flows offloaded with different 32-bit option data - Options with value of 0 can't be offloaded Managing Geneve TLV options: Matching (on receive) is done by ConnectX-5 flex parser. Geneve TLV options are managed using General Object of type “Geneve TLV Options”. When the first flow with a certain class/type values is requested to be offloaded, the driver creates a FW object with FW command (Geneve TLV Options general object) and starts counting the number of flows using this object. During this time, any request with a different class/type values will fail to be offloaded. Once the refcount reaches 0, the driver destroys the TLV options general object, and can now offload a flow with any class/type parameters. Geneve TLV Options object is added to core device. It is currently used to manage Geneve TLV options general object allocation in FW and its reference counting only. In the future it will also be used for managing geneve ports by registering callbacks for ndo_udp_tunnel_add/del. TC tunnel code refactoring: As a preparation for Geneve code, the TC tunnel code in mlx5 was rearranged in a modular way, so that it would be easier to add future tunnels: - Defined tc tunnel object with the fields and callbacks that any tunnel must implement. - Define tc UDP tunnel object for UDP tunnels, such as VXLAN - Move each tunnel code (GRE, VXLAN) to its own separate file - Rewrite tc tunnel implementation in a general way – using only the objects and their callbacks. 4) Termination tables: Actions in tables set with the termination flag are guaranteed to terminate the action list. Thus, potential looping functionality (e.g. hairpin) can safely be executed without potential loops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-03 13:42:56 -07:00
Sebastian Andrzej Siewior	3bd3706251	sched/core: Provide a pointer to the valid CPU mask In commit: `4b53a3412d` ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not much difference in !RT but in RT we used this to implement migrate_disable(). Within a migrate_disable() section the CPU mask is restricted to single CPU while the "normal" CPU mask remains untouched. As an alternative implementation Ingo suggested to use: struct task_struct { const cpumask_t *cpus_ptr; cpumask_t cpus_mask; }; with t->cpus_ptr = &t->cpus_mask; In -RT we then can switch the cpus_ptr to: t->cpus_ptr = &cpumask_of(task_cpu(p)); in a migration disabled region. The rules are simple: - Code that 'uses' ->cpus_allowed would use the pointer. - Code that 'modifies' ->cpus_allowed would use the direct mask. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-06-03 11:49:37 +02:00
Florian Westphal	2638eb8b50	net: ipv4: provide __rcu annotation for ifa_list ifa_list is protected by rcu, yet code doesn't reflect this. Add the __rcu annotations and fix up all places that are now reported by sparse. I've done this in the same commit to not add intermediate patches that result in new warnings. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-02 18:08:36 -07:00
Florian Westphal	cb8f1478ce	drivers: use in_dev_for_each_ifa_rtnl/rcu Like previous patches, use the new iterator macros to avoid sparse warnings once proper __rcu annotations are added. Compile tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-02 18:06:26 -07:00
Saeed Mahameed	7fe4d43ecc	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This series provides some low level updates for mlx5 driver needed for both rdma and netdev trees. 1) Termination flow steering table bits and hardware definitions. 2) Introduce the core dump HW access registers definitions. 3) Refactor and cleans-up VF representors functions handlers. 4) Renames host_params bits to function_changed bits and add the support for eswitch functions change event in the eswitch general case. (for both legacy and switchdev modes). 5) Potential error pointer dereference in error handling Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 13:04:06 -07:00
Parav Pandit	8693115af4	{IB,net}/mlx5: Constify rep ops functions pointers Currently, for every representor type and for every single vport, a copy of the representor function pointers is stored even though they don't change from one vport to another. Additionally, the priv data entry for the rep is not passed during registration, but is copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data, 1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops to match the _ops suffix used by other standard netdev and ibdev ops. 2. Constify the IB and Ethernet rep ops structure. 3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type. 4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Parav Pandit	c94ff74877	{IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev* Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv* as it is not needed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-31 12:28:14 -07:00
Lijun Ou	97545b1022	RDMA/hns: Bugfix for posting multiple srq work request When the user submits more than 32 work requests to a srq queue at a time, it needs to find the corresponding number of entries in the bitmap in the idx queue. However, the original lookup function, ffs, only processes 32 bits of the array element. When the number of srq wqes issued exceeds 32, ffs will only process the lower 32 bits of the elements and will not be able to get the correct wqe index for the srq wqe. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-31 16:11:02 -03:00
Gustavo A. R. Silva	6fe1a9b9b6	IB/hfi1: Use struct_size() helper Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace the following form: sizeof(struct opa_port_status_rsp) + num_vls * sizeof(struct _vls_pctrs) with: struct_size(rsp, vls, num_vls) and so on... Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-30 15:40:50 -03:00
Gustavo A. R. Silva	829ca44ecf	IB/qib: Use struct_size() helper Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, replace the following form: sizeof(*pkt) + sizeof(pkt->addr[0]) * n with: struct_size(pkt, addr, n) Also, notice that variable size is unnecessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-30 15:40:50 -03:00
Gal Pressman	2367d00e2c	RDMA/efa: Remove unused includes Remove leftover includes that are no longer used from the driver. Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:20:48 -03:00
Gal Pressman	4d50e084c5	RDMA/efa: Use rdma block iterator in chunk list creation When creating the chunks list the rdma_for_each_block() iterator is used in order to iterate over the payload in EFA_CHUNK_PAYLOAD_SIZE (device defined) strides. Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:20:48 -03:00
Gal Pressman	e0e3f39759	RDMA/efa: Remove unneeded admin commands abort flow The admin commands abort flow is buggy (use-after-free) and not really necessary as it is guaranteed that after ib_unregister_device() is called there are no user verbs threads running in parallel, delete it. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:14:14 -03:00
Gal Pressman	255efcaeb6	RDMA/efa: Use kvzalloc instead of kzalloc with fallback Use kvzalloc, which attempts to allocate a physically contiguous buffer and falls back to a virtually contiguous one on failure, instead of open coding it in the driver. The is_vmalloc_addr function is used to determine whether the buffer is physically contiguous or not (which determines direct vs indirect MR registration mode). Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:14:14 -03:00
Gal Pressman	4f240dfec6	RDMA/efa: Remove MAYEXEC flag check from mmap flow MAYEXEC test was mistakenly added, remove it. Checking MAYEXEC in the driver prevents it from working with userspace that uses things like EXEC STACK. (ie some Fortran and other runtimes) Fixes: `40909f664d` ("RDMA/efa: Add EFA verbs implementation") Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:13:03 -03:00
Michal Kubecek	37eb86c450	mlx5: avoid 64-bit division Commit `25c13324d0` ("IB/mlx5: Add steering SW ICM device memory type") breaks i386 build by introducing three 64-bit divisions. As the divisor is MLX5_SW_ICM_BLOCK_SIZE() which is always a power of 2, we can replace the division with bit operations. Fixes: `25c13324d0` ("IB/mlx5: Add steering SW ICM device memory type") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 13:03:21 -03:00
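The replacement of a 64-bit division by bit operations, as described in the mlx5 commit above, can be illustrated with a minimal userspace sketch. It is not the code from the patch: the helper names block_shift(), blocks_in() and offset_in_block() are hypothetical, and the only assumption taken from the commit message is that the divisor is always a power of 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns log2(block_size) for a power-of-two size. */
static unsigned int block_shift(uint64_t block_size)
{
    unsigned int shift = 0;

    /* The trick below is only valid for non-zero powers of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    while ((block_size >> shift) != 1)
        shift++;
    return shift;
}

/* Same result as length / block_size, but without a 64-bit division. */
static uint64_t blocks_in(uint64_t length, uint64_t block_size)
{
    return length >> block_shift(block_size);
}

/* Same result as length % block_size, but without a 64-bit modulo. */
static uint64_t offset_in_block(uint64_t length, uint64_t block_size)
{
    return length & (block_size - 1);
}
```

On a 32-bit target such as i386, a plain `/` or `%` on 64-bit operands is typically lowered to a call into a compiler support routine that the kernel does not link against, whereas shifts and masks compile to ordinary instructions.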
Dennis Dalessandro	5f5e4eb4fb	IB/hfi1: Remove extra brackets from an if A recent patch to hfi1 left behind a checkpatch error. Fixes: `fb24ea52f7` ("drivers: Remove explicit invocations of mmiowb()") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 12:59:48 -03:00
Kamenee Arumugam	97736f36db	IB/hfi1: Validate page aligned for a given virtual address User applications can register memory regions for TID buffers that are not aligned on page boundaries. Hfi1 is expected to pin those pages in memory and cache the pages with mmu_rb. The rb tree will fail to insert pages that are not aligned correctly. Validate whether a given virtual address is page aligned before pinning. Fixes: `7e7a436ecb` ("staging/hfi1: Add TID entry program function body") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 12:56:05 -03:00
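The page-alignment validation described in the hfi1 commit above can be sketched in userspace C as follows. This is an illustration, not the driver's code: EXAMPLE_PAGE_SIZE, is_page_aligned() and validate_vaddr() are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* The mask test below requires a power-of-two page size. */
static_assert((EXAMPLE_PAGE_SIZE & (EXAMPLE_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* True when vaddr sits exactly on a page boundary. */
static bool is_page_aligned(uintptr_t vaddr)
{
    return (vaddr & (EXAMPLE_PAGE_SIZE - 1)) == 0;
}

/* Hypothetical entry check: reject unaligned addresses before pinning. */
static int validate_vaddr(uintptr_t vaddr)
{
    return is_page_aligned(vaddr) ? 0 : -EINVAL;
}
```

Rejecting the request up front keeps the failure at the point where it can be reported to the caller, rather than later when inserting into the tree fails.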
Mike Marciniszyn	35164f5259	IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value The command 'ibv_devinfo -v' reports 0 for max_mr. Fix by assigning the query values after the mr lkey_table has been built rather than early on in the driver. Fixes: `7b1e2099ad` ("IB/rdmavt: Move memory registration into rdmavt") Reviewed-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 12:56:05 -03:00
Mike Marciniszyn	6d517353c7	IB/hfi1: Insure freeze_work work_struct is canceled on shutdown By code inspection, the freeze_work is never canceled. Fix by adding a cancel_work_sync in the shutdown path to insure it is no longer running. Fixes: `7724105686` ("IB/hfi1: add driver files") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-29 12:56:05 -03:00
John Hubbard	ea99697458	RDMA: Convert put_page() to put_user_page() For infiniband code that retains pages via get_user_pages(), release those pages via the new put_user_page(), or put_user_pages(), instead of put_page() This is a tiny part of the second step of fixing the problem described in [1]. The steps are: 1) Provide put_user_page() routines, intended to be used for releasing pages that were pinned via get_user_pages(). 2) Convert all of the call sites for get_user_pages(), to invoke put_user_page(), instead of put_page(). This involves dozens of call sites, and will take some time. 3) After (2) is complete, use get_user_pages() and put_user_page*() to implement tracking of these pages. This tracking will be separate from the existing struct page refcounting. 4) Use the tracking and identification of these pages, to implement special handling (especially in writeback paths) when the pages are backed by a filesystem. Again, [1] provides details as to why that is desirable. [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 20:11:11 -03:00
YueHaibing	cfcc048ca7	IB/hfi1: Remove set but not used variables 'offset' and 'fspsn' Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/hfi1/tid_rdma.c: In function tid_rdma_rcv_error: drivers/infiniband/hw/hfi1/tid_rdma.c:2029:7: warning: variable offset set but not used [-Wunused-but-set-variable] drivers/infiniband/hw/hfi1/tid_rdma.c: In function hfi1_rc_rcv_tid_rdma_ack: drivers/infiniband/hw/hfi1/tid_rdma.c:4555:35: warning: variable fspsn set but not used [-Wunused-but-set-variable] 'offset' is never used since introduction in commit `d0d564a1ca` ("IB/hfi1: Add functions to receive TID RDMA READ request") 'fspsn' is never used since introduction in commit `9e93e967f7` ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 20:09:16 -03:00
Lijun Ou	2a3d923f87	RDMA/hns: Replace magic numbers with #defines This patch makes the code more readable by removing magic numbers. Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 17:31:00 -03:00
Lang Cheng	669cefb654	RDMA/hns: Remove jiffies operation in disable interrupt context In some functions, the jiffies operation is unnecessary, and we can control delay using mdelay and udelay functions only. Especially, in hns_roce_v1_clear_hem, the function calls spin_lock_irqsave, the context disables interrupt, so we can not use jiffies and msleep functions. Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 17:28:39 -03:00
Lang Cheng	780f33962e	RDMA/hns: Move spin_lock_irqsave to the correct place When hip08 sets the gid, it calls spin_unlock_bh when sending the cmq. If main.ko calls spin_lock_irqsave first, and the kernel predates commit `f71b74bca6` ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled"), this causes a WARN_ON_ONCE because spin_unlock_bh is called in an interrupt-disabled context. In fact, the spin_lock_irqsave in main.ko is only used for hip06, and should be placed in hns_roce_hw_v1.c. hns_roce_hw_v2.c uses its own spin_unlock_bh and does not need main.ko to manage the spin lock. Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 17:27:59 -03:00
Lijun Ou	0502849d0b	RDMA/hns: Update CQE specifications According to hip08 UM, the maximum number of CQEs supported by each CQ is 4M. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 17:27:59 -03:00
Yixian Liu	8ffb813255	RDMA/hns: Remove unnecessary print message in aeq There is no need to print when communication is established, especially when the application uses lots of QPs. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 17:27:59 -03:00
Nirranjan Kirubaharan	f70baa7ee3	iw_cxgb4: Fix qpid leak Add a wait in destroy_qp() so that all references to the qp are dropped and the qp is freed in destroy_qp() itself. This ensures freeing of all QPs before invocation of dealloc_ucontext(), which prevents loss of in-use qpids stored in the ucontext. Signed-off-by: Nirranjan Kirubaharan <nirranjan@chelsio.com> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:58:24 -03:00
Leon Romanovsky	cae626b978	RDMA/cxgb4: Don't expose DMA addresses Change unconditional print of DMA address to be printed with special printk format type specifier. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:32:17 -03:00
Leon Romanovsky	34d568930b	RDMA/cxgb4: Use sizeof() notation Convert various sizeof call sites to be written in standard format sizeof(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:32:17 -03:00
Leon Romanovsky	a80287c813	RDMA/cxgb3: Delete and properly mark unimplemented resize CQ function Resize CQ implementation was guarded by undeclared "notyet" define while cxgb3 was added to the kernel. Twelve years later, this call is still unimplemented, so safely delete it and fix improper return error code when .resize_cq() is not implemented. Fixes: `b038ced7b3` ("RDMA/cxgb3: Add driver for Chelsio T3 RNIC") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:24:04 -03:00
Leon Romanovsky	0ddf8f6267	RDMA/cxgb3: Don't expose DMA addresses DMA addresses like all other kernel addresses should be printed with special %p* formatter. It is needed to allow control of exposure of such information through a dedicated knob. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:24:03 -03:00
Leon Romanovsky	d34d37d5a1	RDMA/cxgb3: Use sizeof() notation instead of plain sizeof sizeof(a), sizeof a and sizeof (a) are all valid notations, but the first is the more readable format recommended by checkpatch.pl. Let's canonicalize it in the cxgb3 driver, so later patches won't emit checkpatch warnings. As part of this change, a redundant memset() was removed. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-27 14:15:26 -03:00
Michal Kalderon	3576e99e08	qed*: Add iWARP 100g support Add iWARP engine affinity setting for supporting iWARP over 100g. iWARP cannot be distinguished by the LLH from L2, hence the engine division will affect L2 as well. For this reason we add a parameter to devlink to determine the engine division. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-26 13:04:12 -07:00
Michal Kalderon	443473d2f3	qedr: Change the MSI-X vectors selection to be based on affined engine Use the msix vectors of the affined hwfn and not the leading one. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-26 13:04:11 -07:00
Michal Kalderon	08eb1fb0f7	qed*: Change hwfn used for sb initialization When initializing status blocks use the affined hwfn instead of the leading one for RDMA / Storage Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-26 13:04:11 -07:00
Leon Romanovsky	62a38e704d	RDMA/efa: Remove check that prevents destroy of resources in error flows Drivers cannot check the udata for validity when doing destroy as there will be no way to report this error back to the uverbs. Since udata is new for destroy no driver should start to use it - instead drivers should opt for the ioctl interface and define it in a way where it cannot fail due to incorrect data. Remove the checks on udata construction so EFA is consistent with everything else. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 21:14:28 -03:00
Linus Torvalds	2c1212de6f	SPDX update for 5.2-rc2, round 1 Here are series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches. 
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) 
that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...	2019-05-21 12:33:38 -07:00
Leon Romanovsky	dab99af99c	RDMA/nes: Remove second wait queue initialization call The same wait queue is initialized a couple of lines above. Fixes: `3c2d774cad` ("RDMA/nes: Add a driver for NetEffect RNICs") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:50:53 -03:00
Leon Romanovsky	3bb58cfe07	RDMA/i40iw: Remove useless NULL checks There is no need to check existence of structures to be destroyed. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:50:53 -03:00
Leon Romanovsky	269c97fd48	RDMA/nes: Remove useless NULL checks The destroy functions are always called with relevant structs, there is no need to check their existence. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:50:53 -03:00
Yuval Shaia	8ce0048f76	IB/mlx4: Delete unused func arg The function argument virt_addr is not in use - delete it. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:27:25 -03:00
Leon Romanovsky	619122be3d	RDMA/hns: Fix PD memory leak for internal allocation free_pd is allocated internally by the driver hence needs to be freed internally too or it leaks. Fixes: `21a428a019` ("RDMA: Handle PD allocations by IB/core") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:25:37 -03:00
Jason Gunthorpe	d2183c6f19	RDMA/umem: Move page_shift from ib_umem to ib_odp_umem This value has always been set to PAGE_SHIFT in the core code, the only thing that does differently was the ODP path. Move the value into the ODP struct and still use it for ODP, but change all the non-ODP things to just use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-05-21 15:23:24 -03:00
Sagiv Ozeri	69054666df	RDMA/qedr: Fix incorrect device rate. Use the correct enum value introduced in commit `12113a35ad` ("IB/core: Add HDR speed enum") Prior to this change a 50Gbps port would show 40Gbps. This patch also cleaned up the redundant redefinition of ib speeds for qedr. Fixes: `12113a35ad` ("IB/core: Add HDR speed enum") Signed-off-by: Sagiv Ozeri <sagiv.ozeri@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-21 15:04:53 -03:00
Thomas Gleixner	ec8f24b7fa	treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:46 +02:00
Linus Torvalds	78e0365184	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Use after free in __dev_map_entry_free(), from Eric Dumazet. 2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung Cheng. 3) Orphan NFC, we'll take the patches directly into my tree. From Johannes Berg. 4) We can't recycle cloned TCP skbs, from Eric Dumazet. 5) Some flow dissector bpf test fixes, from Stanislav Fomichev. 6) Fix RCU marking and warnings in rhashtable, from Herbert Xu. 7) Fix some potential fib6 leaks, from Eric Dumazet. 8) Fix a _decode_session4 uninitialized memory read bug fix that got lost in a merge. From Florian Westphal. 9) Fix ipv6 source address routing wrt. exception route entries, from Wei Wang. 10) The netdev_xmit_more() conversion was not done 100% properly in mlx5 driver, fix from Tariq Toukan. 11) Clean up botched merge on netfilter kselftest, from Florian Westphal. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits) of_net: fix of_get_mac_address retval if compiled without CONFIG_OF net: fix kernel-doc warnings for socket.c net: Treat sock->sk_drops as an unsigned int when printing kselftests: netfilter: fix leftover net/net-next merge conflict mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM mlxsw: core: Prevent QSFP module initialization for old hardware vsock/virtio: Initialize core virtio vsock before registering the driver net/mlx5e: Fix possible modify header actions memory leak net/mlx5e: Fix no rewrite fields with the same match net/mlx5e: Additional check for flow destination comparison net/mlx5e: Add missing ethtool driver info for representors net/mlx5e: Fix number of vports for ingress ACL configuration net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled net/mlx5e: Fix wrong xmit_more application net/mlx5: Fix peer pf disable hca command net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index net/mlx5: Add meaningful return codes to status_to_err function net/mlx5: Imply MLXFW in mlx5_core Revert "tipc: fix modprobe tipc failed after switch order of device registration" vsock/virtio: free packets during the socket release ...	2019-05-20 08:21:07 -07:00
Parav Pandit	02f3afd975	net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index To avoid any ambiguity between vport index and vport number, rename functions that had vport, to vport_num or vport_index appropriately. vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int in vport array. Hence change input type of vport index in mlx5_eswitch_index_to_vport_num() to int. Correct multiple eswitch representor interfaces use type u16 of rep->vport as type int vport_index. Send vport FW commands with correct eswitch u16 vport_num instead host int vport_index. Fixes: `5ae5162066` ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-17 13:16:47 -07:00
Linus Torvalds	5ac9433224	5.2 Merge Window second pull request This is being sent to get a fix for the gcc 9.1 build warnings, and I've also pulled in some bug fix patches that were posted in the last two weeks. - Avoid the gcc 9.1 warning about overflowing a union member - Fix the wrong callback type for a single response netlink to doit - Bug fixes from more usage of the mlx5 devx interface Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull more rdma updates from Jason Gunthorpe: "This is being sent to get a fix for the gcc 9.1 build warnings, and I've also pulled in some bug fix patches that were posted in the last two weeks. 
- Avoid the gcc 9.1 warning about overflowing a union member - Fix the wrong callback type for a single response netlink to doit - Bug fixes from more usage of the mlx5 devx interface" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net/mlx5: Set completion EQs as shared resources IB/mlx5: Verify DEVX general object type correctly RDMA/core: Change system parameters callback from dumpit to doit RDMA: Directly cast the sockaddr union to sockaddr	2019-05-14 20:56:31 -07:00
Ira Weiny	f3b4fdb18c	IB/mthca: use the new FOLL_LONGTERM flag to get_user_pages_fast() Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. Link: http://lkml.kernel.org/r/20190328084422.29911-8-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-8-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:46 -07:00
Ira Weiny	664b21e717	IB/qib: use the new FOLL_LONGTERM flag to get_user_pages_fast() Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. Link: http://lkml.kernel.org/r/20190328084422.29911-7-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-7-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:46 -07:00
Ira Weiny	9fdf4aa156	IB/hfi1: use the new FOLL_LONGTERM flag to get_user_pages_fast() Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS DAX pages being mapped. [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:46 -07:00
Ira Weiny	73b0140bf0	mm/gup: change GUP fast to use flags rather than a write 'bool' To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:46 -07:00
Ira Weiny	932f4a630a	mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM Patch series "Add FOLL_LONGTERM to GUP fast and use it". HFI1, qib, and mthca, use get_user_pages_fast() due to its performance advantages. These pages can be held for a significant time. But get_user_pages_fast() does not protect against mapping FS DAX pages. Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which retains the performance while also adding the FS DAX checks. XDP has also shown interest in using this functionality.[1] In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and remove the specialized get_user_pages_longterm call. [1] https://lkml.org/lkml/2019/3/19/939 "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Secondly, it depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use _fast. There are patches submitted to the RDMA list which would allow the use of _fast (they are reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use _fast. As an aside, Jason pointed out in my previous submission that _fast and _unlocked look very much the same. I agree and I think further cleanup will be coming. 
But I'm focused on getting the final solution for DAX at the moment. This patch (of 7): This patch starts a series which aims to support FOLL_LONGTERM in get_user_pages_fast(). Some callers would like to do a longterm (user controlled pin) of pages with the fast variant of GUP for performance purposes. Rather than have a separate get_user_pages_longterm() call, introduce FOLL_LONGTERM and change the longterm callers to use it. This patch does not change any functionality. In the short term "longterm" or user controlled pins are unsafe for Filesystems and FS DAX in particular has been blocked. However, callers of get_user_pages_fast() were not "protected". FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it requires vmas to determine if DAX is in use. NOTE: In merging with the CMA changes we opt to change the get_user_pages() call in check_and_migrate_cma_pages() to a call of __get_user_pages_locked() on the newly migrated pages. This makes the code read better in that we are calling __get_user_pages_locked() on the pages before and after a potential migration. As a side effect some of the interfaces are cleaned up but this is not the primary purpose of the series. In review[1] it was asked: <quote> > This I don't get - if you do lock down long term mappings performance > of the actual get_user_pages call shouldn't matter to start with. > > What do I miss? A couple of points. First "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Second, it depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. 
I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use _fast. There are patches submitted to the RDMA list which would allow the use of _fast (they are reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use _fast. As an aside, Jason pointed out in my previous submission that _fast and _unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. </quote> [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965 [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:45 -07:00
Yishai Hadas	cd5d20f13f	IB/mlx5: Verify DEVX general object type correctly As the obj_id in the firmware is not globally unique in general_object, the object type must be considered upon checking for a valid object id. Fixes: `2351776e87` ("IB/mlx5: Verify DEVX object type") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-14 10:22:09 -03:00
Jason Gunthorpe	641114d2af	RDMA: Directly cast the sockaddr union to sockaddr gcc 9 now does allocation size tracking and thinks that passing the member of a union and then accessing beyond that member's bounds is an overflow. Instead of using the union member, use the entire union with a cast to get to the sockaddr. gcc will now know that the memory extends the full size of the union. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-13 22:16:38 -03:00
Linus Torvalds	dce45af5c2	5.2 Merge Window pull request This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. 
One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...	2019-05-09 09:02:46 -07:00
Linus Torvalds	80f232121b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, yours truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic feature detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 
24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...	2019-05-07 22:03:58 -07:00
Gal Pressman	f23afd75fc	RDMA/efa: Add driver to Kconfig/Makefile Add EFA Makefile and Kconfig. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-07 12:47:47 -03:00
Gal Pressman	b7f5e880f3	RDMA/efa: Add the efa module Add the main EFA module file which takes care of device probe/initialization/registration/etc. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-07 12:47:47 -03:00
Gal Pressman	40909f664d	RDMA/efa: Add EFA verbs implementation Add a file that implements the EFA verbs. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-07 12:47:47 -03:00
Linus Torvalds	dd4e5d6106	Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Remove mmiowb() from the kernel memory barrier API and instead, for architectures that need it, hide the barrier inside spin_unlock() when MMIO has been performed inside the critical section. Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull mmiowb removal from Will Deacon: "Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb()) Remove mmiowb() from the kernel memory barrier API and instead, for architectures that need it, hide the barrier inside spin_unlock() when MMIO has been performed inside the critical section. The only relatively recent changes have been addressing review comments on the documentation, which is in a much better shape thanks to the efforts of Ben and Ingo. 
I was initially planning to split this into two pull requests so that you could run the coccinelle script yourself, however it's been plain sailing in linux-next so I've just included the whole lot here to keep things simple" * tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits) docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section arch: Remove dummy mmiowb() definitions from arch code net/ethernet/silan/sc92031: Remove stale comment about mmiowb() i40iw: Redefine i40iw_mmiowb() to do nothing scsi/qla1280: Remove stale comment about mmiowb() drivers: Remove explicit invocations of mmiowb() drivers: Remove useless trailing comments from mmiowb() invocations Documentation: Kill all references to mmiowb() riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock() m68k/io: Remove useless definition of mmiowb() nds32/io: Remove useless definition of mmiowb() x86/io: Remove useless definition of mmiowb() arm64/io: Remove useless definition of mmiowb() ARM/io: Remove useless definition of mmiowb() mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors ...	2019-05-06 16:57:52 -07:00
Gal Pressman	e9c6c53730	RDMA/efa: Add common command handlers Add the EFA common commands implementation. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 15:18:18 -03:00
Gal Pressman	0420e54256	RDMA/efa: Implement functions that submit and complete admin commands Add admin commands submissions/completions implementation. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 15:18:18 -03:00
Gal Pressman	cd9b3d5970	RDMA/efa: Add the com service API definitions Header file for the various commands that can be sent through admin queue. This includes queue create/modify/destroy, setting up and removing protection domains, address handlers, and memory registration, etc. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 15:18:17 -03:00
Gal Pressman	43eaa49d51	RDMA/efa: Add the efa_com.h file A helper header file for EFA admin queue, admin queue completion, asynchronous notification queue, and various hardware configuration data structures and functions. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 15:18:17 -03:00
Gal Pressman	853f565235	RDMA/efa: Add the efa.h header file Add EFA driver generic header file defining driver's device independent internal data structures and definitions. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 15:18:17 -03:00
Gal Pressman	01edac3aa2	RDMA/efa: Add EFA device definitions EFA PCIe device implements a single Admin Queue (AQ) and Admin Completion Queue (ACQ) pair to initialize and communicate configuration with the device. Through this pair, we run set/get commands for querying and configuring the device, create/modify/destroy queues, and IB specific commands like Address Handler (AH), Memory Registration (MR) and Protection Domains (PD). In addition to admin (AQ/ACQ), we have data path queues that get classified as Queue Pairs (QP) and Completion Queues (CQ). Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 13:47:50 -03:00
Shiraz Saleem	d85582517e	RDMA/bnxt_re: Use core helpers to get aligned DMA address Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported bnxt_re page size. Remove checking the umem->hugetlb flag as it is no longer required. The new DMA block iterator will return the 2M aligned address if the MR is backed by 2M huge pages. Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 13:08:11 -03:00
Shiraz Saleem	eb52c0333f	RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported i40iw page size. Remove code in i40iw to determine when MR is backed by 2M huge pages which involves checking the umem->hugetlb flag and VMA inspection. The new DMA iterator will return the 2M aligned address if the MR is backed by 2M pages. Fixes: `f26c7c8339` ("i40iw: Add 2MB page support") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 13:08:11 -03:00
Mike Marciniszyn	4c4b1996b5	IB/hfi1: Fix WQ_MEM_RECLAIM warning The work_item cancels that occur when a QP is destroyed can elicit the following trace: workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100 Call Trace: __flush_work.isra.29+0x8c/0x1a0 ? __switch_to_asm+0x40/0x70 __cancel_work_timer+0x103/0x190 ? schedule+0x32/0x80 iowait_cancel_work+0x15/0x30 [hfi1] rvt_reset_qp+0x1f8/0x3e0 [rdmavt] rvt_destroy_qp+0x65/0x1f0 [rdmavt] ? _cond_resched+0x15/0x30 ib_destroy_qp+0xe9/0x230 [ib_core] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ? max_active_store+0x80/0x80 ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM. The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become entangled with memory reclaim, so this flag is appropriate. Fixes: `0a226edd20` ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:57:45 -03:00
Leon Romanovsky	10bf13c334	RDMA/mlx5: Remove MAYEXEC flag MAYEXEC flag was mistakenly added in the commit cited in the fixes line. Fixes: `4eb6ab13b9` ("RDMA: Remove rdma_user_mmap_page") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:56:55 -03:00
Ariel Levkovich	33cde96fb5	IB/mlx5: Device resource control for privileged DEVX user For DEVX users who have SYS_RAWIO capability, we set the internal device resources capability when creating the UCTX. This will allow the device to restrict the allocation of internal device resources such as SW ICM memory to privileged DEVX users only. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:51:51 -03:00
Ariel Levkovich	25c13324d0	IB/mlx5: Add steering SW ICM device memory type This patch adds support for allocating, deallocating and registering a new device memory type, STEERING_SW_ICM. This memory can be allocated and used by a privileged user for direct rule insertion and management of the device's steering tables. The type is provided by the user via the dedicated attribute in the alloc_dm ioctl command. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:51:51 -03:00
Ariel Levkovich	4056b12efd	IB/mlx5: Warn on allocated MEMIC buffers during cleanup Adding a warning on allocated MEMIC buffers that weren't freed prior to driver tear down. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:51:51 -03:00
Ariel Levkovich	3b113a1ec3	IB/mlx5: Support device memory type attribute This patch introduces a new mlx5_ib driver attribute to the DM allocation method - the DM type. In order to allow addition of new types in downstream patches this patch also refactors the allocation, deallocation and registration handlers to consider the requested type and perform the necessary actions according to it. Since not all future device memory types will be such that are mapped to user memory, the mandatory page index output attribute is modified to be optional. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-06 12:51:50 -03:00
David S. Miller	f3f050a4df	mlx5-updates-2019-04-30 mlx5 misc updates: 1) Bodong Wang and Parav Pandit (6): - Remove unused mlx5_query_nic_vport_vlans - vport macros refactoring - Fix vport access in E-Switch - Use atomic rep state to serialize state change 2) Eli Britstein (2): - prio tag mode support, added ACLs and replace TC vlan pop with vlan 0 rewrite when prio tag mode is enabled. 3) Erez Alfasi (2): - ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions - mlx5e: ethtool, Add support for EEPROM high pages query 4) Masahiro Yamada (1): - remove meaningless CFLAGS_tracepoint.o 5) Maxim Mikityanskiy (1): - Put the common XDP code into a function 6) Tariq Toukan (2): - Turn on HW tunnel offload in all TIRs 7) Vlad Buslov (1): - Return error when trying to insert existing flower filter Merge tag 'mlx5-updates-2019-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-04-30 mlx5 misc updates: 1) Bodong Wang and Parav Pandit (6): - Remove unused mlx5_query_nic_vport_vlans - vport macros refactoring - Fix vport access in E-Switch - Use atomic rep state to serialize state change 2) Eli Britstein (2): - prio tag mode support, added ACLs and replace TC vlan pop with vlan 0 rewrite when prio tag mode is enabled. 
3) Erez Alfasi (2): - ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitions - mlx5e: ethtool, Add support for EEPROM high pages query 4) Masahiro Yamada (1): - remove meaningless CFLAGS_tracepoint.o 5) Maxim Mikityanskiy (1): - Put the common XDP code into a function 6) Tariq Toukan (2): - Turn on HW tunnel offload in all TIRs 7) Vlad Buslov (1): - Return error when trying to insert existing flower filter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-04 00:25:02 -04:00
Parav Pandit	a70c07397f	RDMA: Introduce and use GID attr helper to read RoCE L2 fields Instead of RoCE drivers figuring out vlan, smac fields while working on QP/AH, provide a helper routine to read the L2 fields such as vlan_id and source mac address. This moves logic from mlx5 driver to core for wider usage for RoCE ports. This is a preparation patch to allow detaching netdev in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-03 11:10:02 -03:00
Kamal Heib	dd05cb828d	RDMA: Get rid of iw_cm_verbs Integrate iw_cm_verbs data members into ib_device_ops and ib_device structs, this is done to achieve the following: 1) Avoid memory related bugs during error unwind 2) Make the code cleaner 3) Reduce code duplication Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-03 10:56:56 -03:00
Jack Morgenstein	8f4426aa19	IB/mlx5: Add missing XRC options to QP optional params mask The QP transition optional parameters for the various transitions for XRC QPs are identical to those for RC QPs. Many of the XRC QP transition optional parameter bits are missing from the QP optional mask table. These omissions caused failures when doing XRC QP state transitions. For example, when trying to change the response timer of an XRC receive QP via the RTS2RTS transition, the new timer value was ignored because MLX5_QP_OPTPAR_RNR_TIMEOUT bit was missing from the optional params mask for XRC qps for the RTS2RTS transition. Fix this by adding the missing XRC optional parameters for all QP transitions to the opt_mask table. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Fixes: `a4774e9095` ("IB/mlx5: Fix opt param mask according to firmware spec") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-03 10:15:13 -03:00
David S. Miller	ff24e4980a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-02 22:14:21 -04:00
Saeed Mahameed	c515e70d67	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge commit includes some misc shared code updates from mlx5-next branch needed for net-next. 1) From Aya: Enable general events on all physical link types and restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma driver to ethernet links only as it was intended. 2) From Eli: Introduce low level bits for prio tag mode 3) From Maor: Low level steering updates to support RDMA RX flow steering and enables RoCE loopback traffic when switchdev is enabled. 4) From Vu and Parav: Two small mlx5 core cleanups 5) From Yevgeny add HW definitions of geneve offloads Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-01 13:57:48 -07:00
Aya Levin	6cfdc7e468	IB/mlx5: Restrict 'DELAY_DROP_TIMEOUT' subtype to Ethernet interfaces Subtype 'DELAY_DROP_TIMEOUT' (under 'GENERAL' event) is restricted to Ethernet interfaces. This patch doesn't change functionality or break the current flow. In the downstream patch, non Ethernet (like IB) interfaces will receive 'GENERAL' event. Fixes: `5d3c537f90` ("net/mlx5: Handle event of power detection in the PCIE slot") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-29 16:55:05 -07:00
Vu Pham	c42260f195	net/mlx5: Separate and generalize dma device from pci device The mlx5 Sub-Function (SF) sub device will be introduced in subsequent patches. It will be created as mediated device and belong to mdev bus. It is necessary to treat dma operations on PF, VF and SF in uniform way, hence reduce the dependency on pdev pci dev struct and work directly out of newly introduced 'struct device' from previous patch. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-29 16:55:05 -07:00
Michal Kubecek	ae0be8de9a	netlink: make nla_nest_start() add NLA_F_NESTED flag Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 \| NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
Matthew Wilcox	a7b36d5fa8	ib/bnxt: Remove mention of idr_alloc from comment Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 12:02:18 -03:00
Lijun Ou	2557fabd6e	RDMA/hns: Bugfix for mapping user db When the maximum send wr delivered by the user is zero, the qp does not have a sq. When allocating the sq db buffer to store the user sq pi pointer and map it to the kernel mode, max_send_wr is used as the trigger condition, while the kernel does not consider the max_send_wr trigger condition when mapping db. This causes the sq record doorbell map to fail and the qp creation to fail. The failure is printed as follows: hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000 hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff hns3 0000:7d:00.1: sq record doorbell map failed! hns3 0000:7d:00.1: Create RC QP failed Fixes: `0425e3e6e0` ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:40:04 -03:00
Jason Gunthorpe	1d045aa76f	Merge branch 'mlx5_tir_icm' into rdma.git for-next Ariel Levkovich says: ==================== The series exposes the ICM address of the receive transport interface (TIR) of Raw Packet and RSS QPs to the user since they are required to properly create and insert steering rules that direct flows to these QPs. ==================== For dependencies this branch is based on mlx5-next from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'mlx5_tir_icm': IB/mlx5: Expose TIR ICM address to user space net/mlx5: Introduce new TIR creation core API net/mlx5: Expose TIR ICM address in command outbox Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:33:00 -03:00
Ariel Levkovich	1f1d6abbf0	IB/mlx5: Expose TIR ICM address to user space This patch exposes the TIR ICM address of raw packet and RSS QPs to user space. In order to pass the new field, the patch extends the mlx5 specific QP creation response structure and fills it with the icm address returned by the FW command, if available. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:03:19 -03:00
Jason Gunthorpe	449a224c10	Merge branch 'rdma_mmap' into rdma.git for-next Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 16:20:34 -03:00
Jason Gunthorpe	4eb6ab13b9	RDMA: Remove rdma_user_mmap_page Upon further research drivers that want this should simply call the core function vm_insert_page(). The VMA holds a reference on the page and it will be automatically freed when the last reference drops. No need for disassociate to sequence the cleanup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 16:18:36 -03:00
Jason Gunthorpe	ddcdc368b1	RDMA/mlx5: Use get_zeroed_page() for clock_info get_zeroed_page() returns a virtual address for the page which is better than allocating a struct page and doing a permanent kmap on it. Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 13:40:50 -03:00
Jason Gunthorpe	d5e560d3f7	RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages Since mlx5 supports device disassociate it must use this API for all BAR page mmaps, otherwise the pages can remain mapped after the device is unplugged causing a system crash. Cc: stable@vger.kernel.org Fixes: `5f9794dc94` ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-04-24 13:06:40 -03:00
Jason Gunthorpe	c660133c33	RDMA/mlx5: Do not allow the user to write to the clock page The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: `5c99eaecb1` ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-04-24 13:06:24 -03:00
John Fleck	3c176c9d72	IB/hfi1: Remove reference to RHF.VCRCErr The bit VCRCErr in the receive header flag is actually a reserved field. Remove bit operations on this field. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:48:11 -03:00
Mike Marciniszyn	a9c62e0078	IB/hfi1: Add selected Rcv counters These counters are required for error analysis and debug. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:48:10 -03:00
Mike Marciniszyn	d40f69c9b9	IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts The reference count adjustments on reference count completion are open coded throughout. Add a routine to do all reference count adjustments and use. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Mike Marciniszyn	715ab1a862	IB/rdmavt: Fix ab/ba include issues The current include file ordering for rdmavt headers has an ab/ba include issue that precludes using inlines from rdma_vt.h in rdmavt_qp.h. At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h. Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h and move qp related inlines to rdmavt_qp.h. Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared by rdmavt_cq.h and rdmavt_qp.h. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Mike Marciniszyn	62644c1d2b	IB/hfi1: Make opfn.h self sufficient The opfn.h include file buildability depends on the including file having the correct includes. Fix by making opfn.h self sufficient. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Kaike Wan	ea752bc5e5	IB/{rdmavt, hfi1}: Miscellaneous comment fixes This patch fixes miscellaneous comment errors. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:48 -03:00
Josh Collier	07c5ba9124	IB/hfi1: Add debugfs to control expansion ROM write protect Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple handles to PCI resource0. In order to continue to support expansion ROM updates while the driver is loaded, the driver must now provide an interface to control the expansion ROM write protection. This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom user tool to disable the expansion ROM write protection by opening the file and writing a '1'. The write protection is released when writing a '0' or automatically re-enabled when the file handle is closed. The current implementation will only allow one handle to be opened at a time across all hfi1 devices. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:26:41 -03:00
Leon Romanovsky	5742582222	RDMA/hns: Remove asynchronic QP destroy Verbs destroy callbacks are synchronous operations and can't be delayed. The expectation is that after driver returned from destroy function, the memory can be freed and user won't be able to access it again. Ditch workqueue implementation used in HNS driver. Fixes: `d838c481e0` ("IB/hns: Fix the bug when destroy qp") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: oulijun <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 10:55:31 -03:00
Saeed Mahameed	c3bdd5e651	Linux 5.1-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z 4+dwGBk= =pyXR -----END PGP SIGNATURE----- Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next Linux 5.1-rc1 We forgot to reset the branch last merge window thus mlx5-next is outdated and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next branch with 5.1-rc1. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-22 15:25:39 -07:00
Mark Bloch	5fb58c9e2f	RDMA/mlx5: Don't create IB representors when in multiport RoCE mode Switchdev mode and multiport RoCE mode aren't compatible at this point. Don't create IB reps when a user switches to switchdev mode and the driver operates in that mode. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	d3b5cc1cd9	RDMA/mlx5: Initialize roce port info before multiport master init When working in multiport RoCE mode it is possible to attach a slave before the master. In that case the slave is waiting for a master to be attached. When the master is attached it goes over the list of waiting slaves, finds a slave that is compatible and tries to bind it to itself. The call stack is: mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port() In the bind function we will create a netdev notifier, but this is done before we initialize the RoCE structure (this is done at a later stage by the master in the ROCE stage). Once events are delivered to that notifier we will use mlx5_ib_get_native_port_mdev() to get the actual port and as the native port is zero we will access an invalid index in the port structure. Move the RoCE structure initialization to an earlier stage. Fixes: `32f69e4be2` ("{net, IB}/mlx5: Manage port association for multiport RoCE") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	7f575103b0	RDMA/mlx5: Allow DEVX and raw creation flow on reps Remove the limitations that were in place and provide support for DEVX and raw flow creation on reps. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Maor Gottlieb	56e5acd405	RDMA/mlx5: Add query e-switch vport context to devx white list Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed only if HCA_CAP.eswitch_manager==1. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	52438be441	RDMA/mlx5: Allow inserting a steering rule to the FDB Allow this only via mlx5 raw create flow API, legacy verbs are not supported. To accommodate that, we add a new attribute to matcher creation to indicate the type of flow table to be used. MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer needed, we keep it for compatibility but at most only a single attribute can be passed of the two. When inserting a flow rule to the FDB we require that a DEVX FT is provided as a destination, no other configuration is allowed. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	3b70508a6b	RDMA/mlx5: Create flow table with max size supported Instead of failing the request, just use the supported number of flow entries. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	13a4376568	RDMA/mlx5: Access the prio bypass inside the FDB flow table namespace Now that we have a specific prio inside the FDB namespace allow retrieving it from the RDMA side. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Chengguang Xu	2d95984977	infiniband/qib: Fix typo in comment Fix typo 'faspath' -> 'fastpath'. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 14:08:15 -03:00
Colin Ian King	ff5eefe6d3	RDMA/cxgb4: Fix spelling mistake "immedate" -> "immediate" There is a spelling mistake in a module parameter description. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-18 03:17:42 -03:00
Guy Levi	7249c8ea22	IB/mlx5: Fix scatter to CQE in DCT QP creation When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as a QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: `5d6ff1babe` ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-18 03:13:41 -03:00
David S. Miller	6b0a7f84ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-17 11:26:25 -07:00
Colin Ian King	a6d2a5a92e	RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure Currently if alloc_skb fails to allocate the skb a null skb is passed to t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid the NULL pointer dereference by checking for a NULL skb and returning early. Addresses-Coverity: ("Dereference null return") Fixes: `b38a0ad8ec` ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-16 08:03:17 -03:00
Colin Ian King	1db86318c4	RDMA/mlx5: Check for error return in flow_rule rather than err Currently when the call to create_flow_rule_vport_sq fails, the error check is being performed on err rather than on the return pointer flow_rule. The return flow_rule maybe NULL (which is not considered an error) or an error code, so check for the error on flow_rule. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `d5ed8ac34c` ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-12 11:19:49 -03:00
Devesh Sharma	1c00d7bc96	RDMA/ocrdma: Remove use of idr use pci bdf instead Removing the use of IDR variable just to name the function ids. Using the PCI_FUNC(pdev->devfn) instead to create the device name, associated resources and to print driver info at various places. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-12 10:59:02 -03:00
Kaike Wan	d737b25b1a	IB/hfi1: Do not flush send queue in the TID RDMA second leg When a QP is put into error state, the send queue will be flushed. This mechanism is implemented in both the first and the second leg of the send engine. Since the second leg is only responsible for data transactions in the KDETH space for the TID RDMA WRITE request, it should not perform the flushing of the send queue. This patch removes the flushing function of the second leg, but still keeps the bailing out of the QP if it is put into error state. Fixes: `70dcb2e3dc` ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:09:30 -03:00
Mark Bloch	fb652d3299	RDMA/mlx5: Remove VF representor profile Now that we have a single IB device with multiple ports we can remove the VF representor profile. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:40 -03:00
Mark Bloch	26628e2d58	RDMA/mlx5: Move to single device multiport ports in switchdev mode Move from IB device (representor) per virtual function to single IB device with port per virtual function (port 1 represents the uplink). As number of ports is a static property of an IB device, declare the IB device with as many ports as possible according to the PCI bus. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	a989ea01cb	RDMA/mlx5: Move SMI caps logic We store the SMI information in the core device's struct, make sure we set that information only once (and not per port), while here make the for loop based on the actual size of the array. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	35b0aa67b2	RDMA/mlx5: Refactor netdev affinity code The design of representors is such that once an IB representor is created, the netdev of representor already exists, we can use that fact to simplify the netdev affinity code. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	d5ed8ac34c	RDMA/mlx5: Move default representors SQ steering to rule to modify QP Currently the steering for SQs created on representors is done on creation, once we move to representors as ports of an IB device we need the port argument which is given only at the modify QP stage, adjust the code appropriately. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	6a4d00be08	RDMA/mlx5: Move rep into port struct In preparation of moving into a model of single IB device multiple ports move rep to be part of the port structure. We mark a representor device by setting is_rep, no functional change with this patch. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	5d8f6a0e92	RDMA/mlx5: Use correct size for device resources On allocation we use the array size and on destruction num_ports, use the array size on destruction as well, in this context the array corresponds to the native/actual ports on the NIC so no need to adjust this logic for representors. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	da796ccb3e	RDMA/mlx5: Move ports allocation to outside of INIT stage In downstream patches we will need access to the ports before doing any stages, in order to set net device per representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	4a6dc8552a	RDMA/mlx5: Free IB device on remove Simplify the code and move the deallocation of the IB device into the remove function. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	95579e785a	RDMA/mlx5: Move netdev info into the port struct Netdev info is stored in a separate array and holds data relevant on a per port basis, move it to be part of the port struct. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00

6254 Commits