OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Steve Wise	97df1c6736	RDMA/cxgb4: Use uninitialized_var() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:10 -07:00
Steve Wise	98a3e87990	RDMA/cxgb4: Add missing debug stats Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:09 -07:00
Steve Wise	c3f98fa291	RDMA/cxgb4: Initialize reserved fields in a FW work request Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:09 -07:00
Hariprasad Shenai	aec844df10	RDMA/cxgb4: Use pr_warn_ratelimited Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:08 -07:00
Steve Wise	a03d9f94cc	RDMA/cxgb4: Max fastreg depth depends on DSGL support The max depth of a fastreg mr depends on whether the device supports DSGL or not. So compute it dynamically based on the device support and the module use_dsgl option. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:08 -07:00
Steve Wise	b4e2901c52	RDMA/cxgb4: SQ flush fix There is a race when moving a QP from RTS->CLOSING where a SQ work request could be posted after the FW receives the RDMA_RI/FINI WR. The SQ work request will never get processed, and should be completed with FLUSHED status. Function c4iw_flush_sq(), however, was dropping the oldest SQ work request when in CLOSING or IDLE states, instead of completing the pending work request. If that oldest pending work request was actually complete and has a CQE in the CQ, then when that CQE is processed in poll_cq, we'll BUG_ON() due to the inconsistent SQ/CQ state. This is a very small timing hole and has only been hit once so far. The fix is two-fold: 1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED status regardless of the QP state. 2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue before moving the state out of RTS. This ensures that the state transition will not happen while another thread is in post_rc_send(), because set_state() and post_rc_send() both acquire the qp spinlock. Also, once we transition the state out of RTS, subsequent calls to post_rc_send() will fail because the "in error" bit is set. I don't think this fully closes the race where the FW can get a FINI followed by a SQ work request being posted (because they are posted to different EQs), but the #1 fix will handle the issue by flushing the SQ work request. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:08 -07:00
Steve Wise	def4771f4b	RDMA/cxgb4: rmb() after reading valid gen bit Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:07 -07:00
Steve Wise	b33bd0cbfa	RDMA/cxgb4: Endpoint timeout fixes 1) timedout endpoint processing can be starved. If there are continual CPL messages flowing into the driver, the endpoint timeout processing can be starved. This condition exposed the other bugs below. Solution: In process_work(), call process_timedout_eps() after each CPL is processed. 2) Connection events can be processed even though the endpoint is on the timeout list. If the endpoint is scheduled for timeout processing, then we must ignore MPA Start Requests and Replies. Solution: Change stop_ep_timer() to return 1 if the ep has already been queued for timeout processing. All the callers of stop_ep_timer() need to check this and act accordingly. There are just a few cases where the caller needs to do something different if stop_ep_timer() returns 1: 1) in process_mpa_reply(), ignore the reply and process_timeout() will abort the connection. 2) in process_mpa_request, ignore the request and process_timeout() will abort the connection. It is ok for callers of stop_ep_timer() to abort the connection since that will leave the state in ABORTING or DEAD, and process_timeout() now ignores timeouts when the ep is in these states. 3) Double insertion on the timeout list. Since the endpoint timers are used for connection setup and teardown, we need to guard against the possibility that an endpoint is already on the timeout list. This is a rare condition and only seen under heavy load and in the presence of the above 2 bugs. Solution: In ep_timeout(), don't queue the endpoint if it is already on the queue. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:07 -07:00
Steve Wise	fa658a98a2	RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fix cast from u64* to integer. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-11 11:36:01 -07:00
Eli Cohen	f360d88a2e	IB/mlx5: Add block multicast loopback support Add support for the block multicast loopback QP creation flag along the proper firmware API for that. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-10 18:43:32 -07:00
Alexander Gordeev	9684c2ea6d	IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix() As result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-10 18:41:34 -07:00
Alexander Gordeev	bf3f043e7b	IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix() As result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() and pci_enable_msix_range() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-10 18:39:13 -07:00
Linus Torvalds	877f075aac	Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "Main batch of InfiniBand/RDMA changes for 3.15: - The biggest change is core API extensions and mlx5 low-level driver support for handling DIF/DIX-style protection information, and the addition of PI support to the iSER initiator. Target support will be arriving shortly through the SCSI target tree. - A nice simplification to the "umem" memory pinning library now that we have chained sg lists. Kudos to Yishai Hadas for realizing our code didn't have to be so crazy. - Another nice simplification to the sg wrappers used by qib, ipath and ehca to handle their mapping of memory to adapter. - The usual batch of fixes to bugs found by static checkers etc. from intrepid people like Dan Carpenter and Yann Droneaud. - A large batch of cxgb4, ocrdma, qib driver updates" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits) RDMA/ocrdma: Unregister inet notifier when unloading ocrdma RDMA/ocrdma: Fix warnings about pointer <-> integer casts RDMA/ocrdma: Code clean-up RDMA/ocrdma: Display FW version RDMA/ocrdma: Query controller information RDMA/ocrdma: Support non-embedded mailbox commands RDMA/ocrdma: Handle CQ overrun error RDMA/ocrdma: Display proper value for max_mw RDMA/ocrdma: Use non-zero tag in SRQ posting RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr() RDMA/ocrdma: Increment abi version count RDMA/ocrdma: Update version string be2net: Add abi version between be2net and ocrdma RDMA/ocrdma: ABI versioning between ocrdma and be2net RDMA/ocrdma: Allow DPP QP creation RDMA/ocrdma: Read ASIC_ID register to select asic_gen RDMA/ocrdma: SQ and RQ doorbell offset clean up RDMA/ocrdma: EQ full catastrophe avoidance RDMA/cxgb4: Disable DSGL use by default RDMA/cxgb4: rx_data() needs to hold the ep mutex ...	2014-04-03 16:57:19 -07:00
Roland Dreier	f7eaa7ed8f	Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next	2014-04-03 08:30:17 -07:00
Selvin Xavier	2d8f57d56f	RDMA/ocrdma: Unregister inet notifier when unloading ocrdma Unregister the inet notifier during ocrdma unload to avoid a panic after driver unload. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:07 -07:00
Roland Dreier	7a1e89d8b7	RDMA/ocrdma: Fix warnings about pointer <-> integer casts We should cast pointers to and from unsigned long to turn them into ints. Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:07 -07:00
Devesh Sharma	fad51b7d36	RDMA/ocrdma: Code clean-up Clean up code. Also modifying GSI QP to error during ocrdma_close is fixed. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:06 -07:00
Selvin Xavier	334b8db3a6	RDMA/ocrdma: Display FW version Adding a sysfs file for getting the FW version. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:06 -07:00
Selvin Xavier	a51f06e167	RDMA/ocrdma: Query controller information Issue mailbox commands to query ocrdma controller information and phy information and print them while adding ocrdma device. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:05 -07:00
Selvin Xavier	bbc5ec524e	RDMA/ocrdma: Support non-embedded mailbox commands Added a routine to issue non-embedded mailbox commands for handling large mailbox request/response data. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:05 -07:00
Selvin Xavier	1228056bcf	RDMA/ocrdma: Handle CQ overrun error Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:05 -07:00
Selvin Xavier	ac578aef8b	RDMA/ocrdma: Display proper value for max_mw Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:04 -07:00
Selvin Xavier	cf5788ade7	RDMA/ocrdma: Use non-zero tag in SRQ posting As part of SRQ receive buffers posting we populate a non-zero tag which will be returned in SRQ receive completions. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:04 -07:00
Selvin Xavier	9d1878a369	RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr() Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:03 -07:00
Devesh Sharma	2e6e9f2bb8	RDMA/ocrdma: Increment abi version count Increment the ABI version count for driver/library interface. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:30:02 -07:00
Devesh Sharma	0154410bd4	RDMA/ocrdma: Update version string Update the driver version string and node description string. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:59 -07:00
Devesh Sharma	b6b87d2e69	RDMA/ocrdma: ABI versioning between ocrdma and be2net While loading RoCE driver be2net driver should check for ABI version to catch functional incompatibilities. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:51 -07:00
Devesh Sharma	1eebbb6ec3	RDMA/ocrdma: Allow DPP QP creation Allow creating DPP QP even if inline-data is not requested. This is an optimization to lower latency. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:44 -07:00
Devesh Sharma	21c3391a9a	RDMA/ocrdma: Read ASIC_ID register to select asic_gen ocrdma driver selects execution path based on sli_family and asic generation number. This introduces code to read the asic gen number from pci register instead of obtaining it from the Emulex NIC driver. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:40 -07:00
Devesh Sharma	2df84fa87f	RDMA/ocrdma: SQ and RQ doorbell offset clean up Introducing new macros to define SQ and RQ doorbell offset. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:36 -07:00
Devesh Sharma	ea61762679	RDMA/ocrdma: EQ full catastrophe avoidance Stale entries in the CQ being destroyed cause hardware to generate EQEs indefinitely for a given CQ, thus causing uncontrolled execution of irq_handler. This patch fixes this using the following semantics: * irq_handler will ring EQ doorbell at least once and implement budgeting scheme. * cq_destroy will count number of valid entries during destroy and ring cq-db so that hardware does not generate uncontrolled EQE. * cq_destroy will synchronize with last running irq_handler instance. * arm_cq will always defer arming CQ till poll_cq, except for the first arm_cq call. * poll_cq will always ring cq-db with arm=SET if arm_cq was called prior to enter poll_cq. * poll_cq will always ring cq-db with arm=UNSET if arm_cq was not called prior to enter poll_cq. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-03 08:29:34 -07:00
Steve Wise	96bb2706c8	RDMA/cxgb4: Disable DSGL use by default Current hardware doesn't correctly support DSGL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-02 08:53:54 -07:00
Steve Wise	c529fb5046	RDMA/cxgb4: rx_data() needs to hold the ep mutex To avoid racing with other threads doing close/flush/whatever, rx_data() should hold the endpoint mutex. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-02 08:53:54 -07:00
Steve Wise	977116c698	RDMA/cxgb4: Drop RX_DATA packets if the endpoint is gone Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-02 08:53:53 -07:00
Steve Wise	a7db89eb89	RDMA/cxgb4: Lock around accept/reject downcalls There is a race between ULP threads doing an accept/reject, and the ingress processing thread handling close/abort for the same connection. The accept/reject path needs to hold the lock to serialize these paths. Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-02 08:52:45 -07:00
Mike Marciniszyn	f3585a6ae3	IB/ehca: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads These methods appear to only mimic the sg_dma_address() and sg_dma_len() behavior. They can be safely removed. Suggested-by: Bart Van Assche <bvanassche@acm.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 11:16:31 -07:00
Mike Marciniszyn	49c5c27e05	IB/ipath: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads The removal of these methods is compensated for by code changes to .map_sg to insure that the vanilla sg_dma_address() and sg_dma_len() will do the same thing as the equivalent former ib_sg_dma_address() and ib_sg_dma_len() calls into the drivers. The introduction of this patch required that the struct ipath_dma_mapping_ops be converted to a C99 initializer. Suggested-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 11:16:31 -07:00
Mike Marciniszyn	446bf432a9	IB/qib: Remove ib_sg_dma_address() and ib_sg_dma_len() overloads Remove the overload for .dma_len and .dma_address The removal of these methods is compensated for by code changes to .map_sg to insure that the vanilla sg_dma_address() and sg_dma_len() will do the same thing as the equivalent former ib_sg_dma_address() and ib_sg_dma_len() calls into the drivers. Suggested-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Vinod Kumar <vinod.kumar@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 11:16:31 -07:00
Dan Carpenter	4661bd798f	mlx4_core: Make buffer larger to avoid overflow warning My static checker complains that the sprintf() here can overflow. drivers/infiniband/hw/mlx4/main.c:1836 mlx4_ib_alloc_eqs() error: format string overflow. buf_size: 32 length: 69 This seems like a valid complaint. The "dev->pdev->bus->name" string can be 48 characters long. I just made the buffer 80 characters instead of 69 and I changed the sprintf() to snprintf(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 10:53:29 -07:00
Dan Carpenter	3839d8ac1b	mlx4_core: Fix some indenting in mlx4_ib_add() The code was indented too far and also kernel style says we should have curly braces. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 10:52:18 -07:00
Yann Droneaud	5bdb0f02ad	IB/ehca: Returns an error on ib_copy_to_udata() failure In case of error when writing to userspace, function ehca_create_cq() does not set an error code before following its error path. This patch sets the error code to -EFAULT when ib_copy_to_udata() fails. This was caught when using spatch (aka. coccinelle) to rewrite call to ib_copy_{from,to}_udata(). Link: `75ebf2c103`:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 10:36:07 -07:00
Yann Droneaud	08e74c4b00	IB/mthca: Return an error on ib_copy_to_udata() failure In case of error when writing to userspace, the function mthca_create_cq() does not set an error code before following its error path. This patch sets the error code to -EFAULT when ib_copy_to_udata() fails. This was caught when using spatch (aka. coccinelle) to rewrite call to ib_copy_{from,to}_udata(). Link: `75ebf2c103`:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-04-01 10:35:42 -07:00
Yann Droneaud	bfd2793c95	RDMA/cxgb4: set error code on kmalloc() failure If kmalloc() fails in c4iw_alloc_ucontext(), the function leaves but does not set an error code in ret variable: it will return 0 to the caller. This patch set ret to -ENOMEM in such case. Cc: Steve Wise <swise@opengridcomputing.com> Cc: Steve Wise <swise@chelsio.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 14:55:21 -04:00
Matan Barak	e471b40321	mlx4: Use actual number of PCI functions (PF + VFs) for alias GUID logic The code which is dealing with SRIOV alias GUIDs in the mlx4 IB driver has some logic which operated according to the maximal possible active functions (PF + VFs). After the single port VFs code integration this resulted in a flow of false-positive warnings going to the kernel log after the PF driver started the alias GUID work. Fix it by referring to the actual number of functions. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-25 20:48:05 -04:00
Steve Wise	9c88aa003d	RDMA/cxgb4: Update snd_seq when sending MPA messages Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-24 10:07:35 -07:00
Steve Wise	be13b2dff8	RDMA/cxgb4: Connect_request_upcall fixes When processing an MPA Start Request, if the listening endpoint is DEAD, then abort the connection. If the IWCM returns an error, then we must abort the connection and release resources. Also abort_connection() should not post a CLOSE event, so clean that up too. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-24 10:07:35 -07:00
Steve Wise	70b9c66053	RDMA/cxgb4: Ignore read reponse type 1 CQEs These are generated by HW in some error cases and need to be silently discarded. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-24 10:07:35 -07:00
Steve Wise	1ce1d471ac	RDMA/cxgb4: Fix possible memory leak in RX_PKT processing If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must free the request skb and the saved skb with the tcp header. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-24 10:07:35 -07:00
Steve Wise	dbb084cc5f	RDMA/cxgb4: Don't leak skb in c4iw_uld_rx_handler() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-24 10:07:35 -07:00
Matan Barak	449fc48866	net/mlx4: Adapt code for N-Port VF Adds support for N-Port VFs, this includes: 1. Adding support in the wrapped FW command In wrapped commands, we need to verify and convert the slave's port into the real physical port. Furthermore, when sending the response back to the slave, a reverse conversion should be made. 2. Adjusting sqpn for QP1 para-virtualization The slave assumes that sqpn is used for QP1 communication. If the slave is assigned to a port != (first port), we need to adjust the sqpn that will direct its QP1 packets into the correct endpoint. 3. Adjusting gid[5] to modify the port for raw ethernet In B0 steering, gid[5] contains the port. It needs to be adjusted into the physical port. 4. Adjusting number of ports in the query / ports caps in the FW commands When a slave queries the hardware, it needs to view only the physical ports it's assigned to. 5. Adjusting the sched_qp according to the port number The QP port is encoded in the sched_qp, thus in modify_qp we need to encode the correct port in sched_qp. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-20 16:18:30 -04:00
Matan Barak	82373701be	IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constant Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports. Instead, we should only loop until the number of actual ports in the device, which is stored in dev->caps.num_ports. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-20 16:18:29 -04:00
Dan Carpenter	186f8ba062	IB/qib: Cleanup qib_register_observer() Returning directly is easier to read than do-nothing gotos. Remove the duplicative check on "olp" and pull the code in one indent level. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:19:18 -07:00
CQ Tang	49c0e2414b	IB/qib: Change SDMA progression mode depending on single- or multi-rail Improve performance by changing the behavior of the driver when all SDMA descriptors are in use, and the processes adding new descriptors are single- or multi-rail. For single-rail processes, the driver will block the call and finish posting all SDMA descriptors onto the hardware queue before returning back to PSM. Repeated kernel calls are slower than blocking. For multi-rail processes, the driver will return to PSM as quickly as possible so PSM can feed packets to other rails. If all hardware queues are full, PSM will buffer the remaining SDMA descriptors until notified by interrupt that space is available. This patch builds a red-black tree to track the number of rails opened by a particular PID. If the number is more than one, it is a multi-rail PSM process, otherwise, it is a single-rail process. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: John A Gregor <john.a.gregor@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:19:12 -07:00
Steve Wise	eda6d1d1b7	RDMA/cxgb4: Save the correct map length for fast_reg_page_lists We cannot save the mapped length using the rdma max_page_list_len field of the ib_fast_reg_page_list struct because the core code uses it. This results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl(). I found this with dma mapping debugging enabled in the kernel. The fix is to save the length in the c4iw_fr_page_list struct. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:01:30 -07:00
Steve Wise	df2d5130ec	RDMA/cxgb4: Default peer2peer mode to 1 Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:01:30 -07:00
Steve Wise	ba32de9d8d	RDMA/cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:01:30 -07:00
Steve Wise	8a9c399eee	RDMA/cxgb4: Fix incorrect BUG_ON conditions Based on original work from Jay Hernandez <jay@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 10:01:30 -07:00
Steve Wise	ebf00060c3	RDMA/cxgb4: Always release neigh entry Always release the neigh entry in rx_pkt(). Based on original work by Santosh Rastapur <santosh@chelsio.com>. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 09:59:04 -07:00
Steve Wise	f8e819081f	RDMA/cxgb4: Allow loopback connections find_route() must treat loopback as a valid egress interface. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 09:59:04 -07:00
Steve Wise	ffd435924c	RDMA/cxgb4: Cap CQ size at T4_MAX_IQ_SIZE Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 09:59:04 -07:00
Dan Carpenter	e24a72a330	RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq() There is a four byte hole at the end of the "uresp" struct after the ->qid_mask member. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 09:59:04 -07:00
Dan Carpenter	ff1706f4fe	RDMA/cxgb4: Fix underflows in c4iw_create_qp() These sizes should be unsigned so we don't allow negative values and have underflow bugs. These can come from the user so there may be security implications, but I have not tested this. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-20 09:59:04 -07:00
Bart Van Assche	0e9855dbf4	IB/mlx4: Fix a sparse endianness warning Fix the following warning for the mlx4 driver: $ make M=drivers/infiniband C=2 CF=-D__CHECK_ENDIAN__ drivers/infiniband/hw/mlx4/qp.c:1885:31: warning: restricted __be16 degrades to integer Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 22:23:52 -07:00
Prarit Bhargava	bc1b04ab34	RDMA/ocrdma: Fix compiler warning drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function ‘_ocrdma_modify_qp’: drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1299:31: error: ‘old_qps’ may be used uninitialized in this function [-Werror=maybe-uninitialized] status = ocrdma_mbx_modify_qp(dev, qp, attr, attr_mask, old_qps); ocrdma_mbx_modify_qp() (and subsequent calls) doesn't appear to use old_qps so it doesn't need to be passed on. Removing the variable results in the warning going away. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Devesh Sharma (Devesh.sharma@emulex.com) Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:34:13 -07:00
Dan Carpenter	349850f0a9	RDMA/nes: Clean up a condition We don't need to test "ret" twice and also the white space is messed up. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:29:37 -07:00
Dan Carpenter	db498827ff	IB/qib: Remove duplicate check in get_a_ctxt() We already know "pusable" is non-zero, no need to check again. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:28:09 -07:00
Fabio Estevam	970918b32b	IB/usnic: Remove '0x' when using %pa format %pa format already prints in hexadecimal format, so remove the '0x' annotation to avoid a double '0x0x' pattern. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:26:38 -07:00
Yann Droneaud	9d194d1025	IB/nes: Return an error on ib_copy_from_udata() failure instead of NULL In case of an error while accessing userspace memory, function nes_create_qp() returns NULL instead of an error code wrapped through ERR_PTR(). But NULL is not expected by ib_uverbs_create_qp(), as it checks for errors with IS_ERR(). As page 0 is likely not mapped, it is going to trigger an Oops when the kernel tries to dereference the NULL pointer to access struct ib_qp's fields. In some rare cases, page 0 could be mapped by userspace, which could turn this bug into a vulnerability that could be exploited: the function pointers in struct ib_device will be under userspace total control. This was caught when using spatch (aka. coccinelle) to rewrite calls to ib_copy_{from,to}_udata(). Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null Link: `75ebf2c103`:ib_copy_udata.cocci Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:20:28 -07:00
Dennis Dalessandro	06064a103f	IB/qib: Fix memory leak of recv context when driver fails to initialize. In qib_create_ctxts() we allocate an array to hold recv contexts. Then attempt to create data for those recv contexts. If that call to qib_create_ctxtdata() fails then an error is returned but the previously allocated memory is not freed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Yann Droneaud	8572de9732	IB/qib: fixup indentation in qib_ib_rcv() Commit `af061a644a` added some code in qib_ib_rcv() which triggers a warning from coccicheck (coccinelle/spatch): $ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/ CHECK drivers/infiniband/hw/qib/qib_verbs.c drivers/infiniband/hw/qib/qib_verbs.c:679:5-32: code aligned with following code on line 681 CC [M] drivers/infiniband/hw/qib/qib_verbs.o In fact, according to similar code in qib_kreceive(), qib_ib_rcv() code is correct but improperly indented. This patch fixes the indentation for the misaligned portion. Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: infinipath@intel.com Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: cocci@systeme.lip6.fr Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Yann Droneaud	37a967651c	IB/qib: add missing braces in do_qib_user_sdma_queue_create() Commit `c804f07248` moved qib_assign_ctxt() to do_qib_user_sdma_queue_create() but dropped the braces around the statements. This was spotted by coccicheck (coccinelle/spatch): $ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/ CHECK drivers/infiniband/hw/qib/qib_file_ops.c drivers/infiniband/hw/qib/qib_file_ops.c:1583:2-23: code aligned with following code on line 1587 This patch adds braces back. Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: infinipath@intel.com Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: cocci@systeme.lip6.fr Cc: stable@vger.kernel.org Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Mike Marciniszyn	7d7632add8	IB/qib: Modify software pma counters to use percpu variables The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv are now maintained as percpu variables. The mad code is modified to add a z_ latch so that the percpu counters monotonically increase with appropriate adjustments in the reset, read logic to maintain the z_ latch. This patch also corrects the fact that unicast_xmit wasn't handled at all for UC and RC QPs. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Mike Marciniszyn	1ed88dd7d0	IB/qib: Add percpu counter replacing qib_devdata int_counter This patch replaces the dd->int_counter with a percpu counter. The maintenance of qib_stats.sps_ints and int_counter is combined into the new counter. There are two new functions added to read the counter: - qib_int_counter (for a particular qib_devdata) - qib_sps_ints (for all HCAs) A z_int_counter is added to allow the interrupt detection logic to determine if interrupts have occurred since z_int_counter was "reset". Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
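The "z_ latch" mentioned in the two percpu counter commits above can be illustrated with a small model. This is only a sketch in Python, not the driver's C code: the class and method names are hypothetical, and it assumes the latch simply records the running total at reset time so that the per-CPU slots never have to be zeroed.

```python
class LatchedCounter:
    """Hypothetical model of a per-CPU counter with a 'zero' latch."""

    def __init__(self, num_cpus):
        # Each CPU only ever increments its own slot, so the sum is monotonic.
        self.per_cpu = [0] * num_cpus
        # Latch: the total that was observed at the last "reset".
        self.z = 0

    def inc(self, cpu, n=1):
        self.per_cpu[cpu] += n

    def total(self):
        return sum(self.per_cpu)

    def read(self):
        # Value reported to the caller: growth since the last reset.
        return self.total() - self.z

    def reset(self):
        # Do not clear the per-CPU slots; only move the latch forward.
        self.z = self.total()
```

The point of the design is that a reset does not have to touch every CPU's slot; it only snapshots the current sum.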
Mike Marciniszyn	f8b6c47a44	IB/qib: Fix debugfs ordering issue with multiple HCAs The debugfs init code was incorrectly called before the idr mechanism is used to get the unit number, so the dd->unit hasn't been initialized. This caused the unit relative directory creation to fail after the first. This patch moves the init for the debugfs stuff until after all of the failures and after the unit number has been determined. A bug in unwind code in qib_alloc_devdata() is also fixed. Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Dennis Dalessandro	a2cb0eb8a6	IB/ipath: Fix potential buffer overrun in sending diag packet routine Guard against a potential buffer overrun. The size to read from the user is passed in, and due to the padding that needs to be taken into account, as well as the place holder for the ICRC it is possible to overflow the 32bit value which would cause more data to be copied from user space than is allocated in the buffer. Cc: <stable@vger.kernel.org> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
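The overflow described in the ipath commit above can be sketched with plain arithmetic. The Python below is an illustration only: the function names, the `extra` parameter (standing in for padding plus the ICRC placeholder) and the `max_len` limit are assumptions, not the driver's actual variables.

```python
U32_MASK = 0xFFFFFFFF

def padded_len_u32(user_len, extra):
    """Length as a 32-bit unsigned addition would compute it (wraps on overflow)."""
    return (user_len + extra) & U32_MASK

def checked_len(user_len, extra, max_len):
    """Return the padded length, rejecting values that would wrap or exceed max_len."""
    # Compare against max_len - user_len so the check itself cannot overflow.
    if user_len > max_len or extra > max_len - user_len:
        raise ValueError("length too large")
    return user_len + extra
```

With a user-supplied length close to 2^32, the wrapped sum is tiny, so a buffer sized from it would be far smaller than the amount later copied using the original length. The checked variant refuses such input up front.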
Dennis Dalessandro	1c20c81909	IB/qib: Fix potential buffer overrun in sending diag packet routine Guard against a potential buffer overrun. Right now the qib driver is protected by the fact that the data structure in question is only 16 bits. Should that ever change the problem will be exposed. There is a similar defect in the ipath driver and this brings the two code paths into sync. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 16:16:51 -07:00
Tatyana Nikolova	43adff3979	RDMA/nes: Fix for passing a valid QP pointer to the user space library Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 10:04:16 -07:00
Tatyana Nikolova	4ac79a7003	RDMA/nes: Fixes for IRD/ORD negotiation with MPA v2 Fixes to enable the negotiation of the supported IRD/ORD sizes with the peer when exchanging MPA v2 messages in connection establishment. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-17 10:03:17 -07:00
Steve Wise	05eb23893c	cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and implements a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB fifo. Design: cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then once DB usage is reenabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB fifo when reenabling. iw_cxgb4: Instead of marking each qp to indicate DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to only set this single bit to disable all DB writes for all user QPs vs traversing the idr of all the active QPs. If the libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus we allow the new driver to work with an older libcxgb4. When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is in STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow controlled QP, the amount of writes are accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list and the number of writes they request are accumulated. 
When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq context, we change the DB state to FLOW_CONTROL, and begin resuming all the QPs that are on the flow control list. This logic runs on until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64. So 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming too quickly and overflowing the FIFO. Once the flow control list is empty, the db state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register. The algorithm is designed such that if the DB write load is high enough, then all the DB writes get submitted by the kernel using this flow controlled approach to avoid DB drops. As the load lightens though, we resume to normal DB writes directly by user applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:44:11 -04:00
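The chunked resume loop described in the commit above can be modelled roughly as follows. This is a Python sketch under stated assumptions: `resume_flow_controlled`, `ring_db` and `wait_for_drain` are hypothetical names, and the early exit on a DB DROP upcall is left out. Only the chunk size of 64 comes from the commit message.

```python
from collections import OrderedDict

CHUNK_SIZE = 64  # chunk size stated in the commit message

def resume_flow_controlled(pending, ring_db, wait_for_drain, chunk_size=CHUNK_SIZE):
    """Drain the flow control list in chunks.

    pending: OrderedDict mapping a QP id to its accumulated doorbell write count.
    ring_db(qid, count): issues one doorbell write carrying the accumulated count.
    wait_for_drain(): blocks until the doorbell FIFO has drained.
    """
    while pending:
        for _ in range(min(chunk_size, len(pending))):
            qid, count = pending.popitem(last=False)  # oldest entry first
            ring_db(qid, count)
        if pending:
            # More QPs remain: let the FIFO drain before the next chunk.
            wait_for_drain()
```

For 130 queued QPs this issues 130 doorbell writes in chunks of 64, 64 and 2, pausing twice in between.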
Steve Wise	7a2cea2aaa	cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice Based on original work by Anand Priyadarshee <anandp@chelsio.com>. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:44:11 -04:00
Jack Morgenstein	aa9a2d51a3	mlx4: Activate RoCE/SRIOV To activate RoCE/SRIOV, need to remove the following: 1. In mlx4_ib_add, need to remove the error return preventing initialization of a RoCE port under SRIOV. 2. In update_vport_qp_params (in resource_tracker.c) need to remove the error return when a RoCE RC or UD qp is detected. This error return causes the INIT-to-RTR qp transition to fail in the wrapper function under RoCE/SRIOV. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:16 -04:00
Shani Michaelli	ceb5433b3a	mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCE * Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID in multiplex_cm_handler and demux_cm_handler. * Handle Service ID Resolution messages and REQ messages separately, for their formats are different. Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:16 -04:00
Jack Morgenstein	5ea8bbfc49	mlx4: Implement IP based gids support for RoCE/SRIOV Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave. This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header 3. vlan_id: obtained from the tunnel header. Incoming MADs 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:16 -04:00
Jack Morgenstein	2f5bb47368	mlx4: Add ref counting to port MAC table for RoCE The IB side of RoCE requires the MAC table index of the MAC address used by its QPs. To obtain the real MAC index, the IB side registers the MAC (increasing its ref count, and also returning the real MAC index) during the modify-qp sequence. This protects against the ETH side deleting or modifying that MAC table entry while the QP is active. Note that until the modify-qp command returns success, the MAC and VLAN information only has "candidate" status. If the modify-qp succeeds, the "candidate" info is promoted to the operational MAC/VLAN info for the qp. If the modify fails, the candidate MAC/VLAN is unregistered, and the old qp info is preserved. The patch is a bit complex, because there are multiple qp transitions where the primary-path information may be modified: INIT-to-RTR, and SQD-to-SQD. Similarly for the alternate path information. Therefore the code must handle cases where path information has already been entered into the QP context by previous qp transitions. For the MAC address, the success logic is as follows: 1. If there was no previous MAC, simply move the candidate MAC information to the operational information, and reset the candidate MAC info. 2. If there was a previous MAC, unregister it. Then move the MAC information from candidate to operational, and reset the candidate info (as in 1. above). The MAC address failure logic is the same for all cases: - Unregister the candidate MAC, and reset the candidate MAC info. For Vlan registration, the logic is similar. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:15 -04:00
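The candidate/operational MAC handling in the commit above can be sketched as a small state model. The Python below is illustrative only: `MacTable`, `QpMac` and their methods are hypothetical names, and the real table tracks indices as well as reference counts.

```python
class MacTable:
    """Hypothetical ref-counted port MAC table."""

    def __init__(self):
        self.refs = {}

    def register(self, mac):
        self.refs[mac] = self.refs.get(mac, 0) + 1

    def unregister(self, mac):
        self.refs[mac] -= 1
        if self.refs[mac] == 0:
            del self.refs[mac]


class QpMac:
    """Tracks the operational and candidate MAC of one QP."""

    def __init__(self, table):
        self.table = table
        self.operational = None
        self.candidate = None

    def begin_modify(self, mac):
        # Register up front; the MAC only has "candidate" status for now.
        self.table.register(mac)
        self.candidate = mac

    def modify_succeeded(self):
        # Drop the previous MAC (if any), then promote the candidate.
        if self.operational is not None:
            self.table.unregister(self.operational)
        self.operational = self.candidate
        self.candidate = None

    def modify_failed(self):
        # Undo the registration; the old operational MAC is preserved.
        self.table.unregister(self.candidate)
        self.candidate = None
```

A failed modify leaves both the table and the QP exactly as they were before `begin_modify`, which is the property the commit message describes.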
Jack Morgenstein	b6ffaeffae	mlx4: In RoCE allow guests to have multiple GIDS The GIDs are statically distributed, as follows: PF: gets 16 GIDs VFs: Remaining GIDS are divided evenly between VFs activated by the driver. If the division is not even, lower-numbered VFs get an extra GID. For an IB interface, the number of gids per guest remains as before: one gid per guest. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:14 -04:00
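The static GID split described in the commit above amounts to a division with remainder. The following Python sketch shows the arithmetic; the function name is hypothetical and the table size of 128 entries is an assumption made for illustration, while the 16 GIDs for the PF come from the commit message.

```python
def gids_per_function(num_vfs, total_gids=128, pf_gids=16):
    """Return [pf_count, vf0_count, vf1_count, ...].

    total_gids=128 is an assumed table size, not taken from the commit message.
    """
    if num_vfs == 0:
        return [pf_gids]
    remaining = total_gids - pf_gids
    base, extra = divmod(remaining, num_vfs)
    # The first `extra` (lowest-numbered) VFs each get one additional GID.
    return [pf_gids] + [base + (1 if vf < extra else 0) for vf in range(num_vfs)]
```

With five VFs under these assumptions, the 112 remaining GIDs split as 23, 23, 22, 22, 22.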
Jack Morgenstein	6ee51a4e86	mlx4: Adjust QP1 multiplexing for RoCE/SRIOV This requires the following modifications: 1. Fix build_mlx4_header to properly fill in the ETH fields 2. Adjust mux and demux QP1 flow to support RoCE. This commit still assumes only one GID per slave for RoCE. The commit enabling multiple GIDs is a subsequent commit, and is done separately because of its complexity. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:57:12 -04:00
Sagi Grimberg	2dea909444	IB/mlx5: Expose support for signature MR feature Currently support only T10-DIF types of signature handover operations (types 1|2|3). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:40:04 -08:00
Sagi Grimberg	d5436ba010	IB/mlx5: Collect signature error completion This commit takes care of the generated signature error CQE generated by the HW (if happened). The underlying mlx5 driver will handle signature error completions and will mark the relevant memory region as dirty. Once the consumer gets the completion for the transaction, it must check for signature errors on signature memory region using a new lightweight verb ib_check_mr_status(). In case the user doesn't check for signature error (i.e. doesn't call ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the memory region cannot be used for another signature operation (REG_SIG_MR work request will fail). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:40:04 -08:00
Sagi Grimberg	e6631814fb	IB/mlx5: Support IB_WR_REG_SIG_MR This patch implements IB_WR_REG_SIG_MR posted by the user. Basically this WR involves 3 WQEs in order to prepare and properly register the signature layout: 1. post UMR WR to register the sig_mr in one of two possible ways: * In case the user registered a single MR for data so the UMR data segment consists of: - single klm (data MR) passed by the user - BSF with signature attributes requested by the user. * In case the user registered 2 MRs, one for data and one for protection, the UMR consists of: - strided block format which includes data and protection MRs and their repetitive block format. - BSF with signature attributes requested by the user. 2. post SET_PSV in order to set the memory domain initial signature parameters passed by the user. SET_PSV is not signaled and solicited CQE. 3. post SET_PSV in order to set the wire domain initial signature parameters passed by the user. SET_PSV is not signaled and solicited CQE. * After this compound WR we place a small fence for next WR to come. This patch also introduces some helper functions to set the BSF correctly and determine the signature format selectors. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:39:51 -08:00
Sagi Grimberg	2ac45934f8	IB/mlx5: Remove MTT access mode from umr flags helper function get_umr_flags helper function might be used for types of access modes other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM. So remove it from helper, and callers will add their own access mode flag. This commit does not add/change functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Sagi Grimberg	6e5eadace1	IB/mlx5: Break up wqe handling into begin & finish routines As a preliminary step for signature feature which will require posting multiple (3) WQEs for a single WR, we break post_send routine WQE indexing into begin and finish routines. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Sagi Grimberg	e1e66cc264	IB/mlx5: Initialize mlx5_ib_qp signature-related members If user requested signature enable we initialize relevant mlx5_ib_qp members. We mark the qp as sig_enable and we increase the effective SQ size, but still limit the user max_send_wr to original size computed. We also allow the create_qp routine to accept sig_enable create flag. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Sagi Grimberg	3121e3c441	mlx5: Implement create_mr and destroy_mr Support create_mr and destroy_mr verbs. Creating ib_mr may be done for either ib_mr that will register regular page lists like alloc_fast_reg_mr routine, or indirect ib_mrs that can register other (pre-registered) ib_mrs in an indirect manner. In addition user may request signature enable, that will mean that the created ib_mr may be attached with signature attributes (BSF, PSVs). Currently we only allow direct/indirect registration modes. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-07 11:26:49 -08:00
Yishai Hadas	eeb8461e36	IB: Refactor umem to use linear SG table This patch refactors the IB core umem code and vendor drivers to use a linear (chained) SG table instead of chunk list. With this change the relevant code becomes clearer—no need for nested loops to build and use umem. Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-03-04 10:34:28 -08:00
Amir Vadai	169a1d85d0	net,IB/mlx: Bump all Mellanox driver versions Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-25 17:34:44 -05:00
Roland Dreier	c9459388d8	Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next	2014-02-14 09:49:12 -08:00
Devesh Sharma	09de3f1313	RDMA/ocrdma: Fix load time panic during GID table init We should use rdma_vlan_dev_real_dev() instead of using vlan_dev_real_dev() when building the GID table for a vlan interface. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-14 09:49:04 -08:00
Devesh Sharma	a61d93d92f	RDMA/ocrdma: Fix traffic class shift Use correct value for obtaining traffic class from device response for Query QP request. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-14 09:49:00 -08:00
Upinder Malhi	f809309a25	IB/usnic: Fix smatch endianness error Error reported at http://marc.info/?l=linux-rdma&m=138995755801039&w=2 Fix short to int cast for big endian systems. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-14 09:47:29 -08:00
Eli Cohen	0861565f50	IB/mlx5: Remove dependency on X86 Remove Kconfig dependency of mlx5_ib/mlx5_core on X86, since there is no such dependency in reality. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 20:48:02 -08:00
Mike Marciniszyn	2f75e12c44	IB/qib: Add missing serdes init sequence Research has shown that commit `a77fcf8950` ("IB/qib: Use a single txselect module parameter for serdes tuning") missed a key serdes init sequence. This patch adds that sequence. Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:52:44 -08:00
Kumar Sanghvi	0f0132001f	RDMA/cxgb4: Add missing neigh_release in LE-Workaround path Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:46:40 -08:00
Moni Shoua	b4a26a2728	IB: Report using RoCE IP based gids in port caps For userspace RoCE UD QPs we need to know the GID format that the kernel uses, e.g when working over older kernels. For that end, add a new port capability IB_PORT_IP_BASED_GIDS and report it when query port is issued. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:46:03 -08:00
Moni Shoua	ad4885d279	IB/mlx4: Build the port IBoE GID table properly under bonding When scanning netdevices we need to check a few more conditions and cases to build the IBoE GID table properly. For example, under bonding we must make sure that when a port is down, the bond IP address isn't programmed as a GID, since doing so will cause failure with IB core flows that selects ports by GID. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:31:09 -08:00
Moni Shoua	5071456fe2	IB/mlx4: Do IBoE GID table resets per-port The IBoE code used to reset the GID table did it for all Ethernet ports of the device. Since the whole architecture of generating GIDs and responding to events is port-based, this is inefficient and can lead to wrong content in the GID table. Change the reset flow to be per-port. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:31:08 -08:00
Moni Shoua	ddf8bd3491	IB/mlx4: Do IBoE locking earlier when initializing the GID table Updating the GID table under IBoE requires read/write from/to shared data structures. These data structures are protected with the device iboe lock. The flows that modify the GID table start from 1. Initializing the GID table 2. NETDEV events 3. INET or INET6 events This patch makes sure that the flow of initializing the GID table is consistent with the other two flows w.r.t on what step the lock is taken. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:31:08 -08:00
Moni Shoua	4ce5a5744a	IB/mlx4: Move rtnl locking to the right place On the one hand, the invocation of netdev_master_upper_dev_get() within mlx4_ib_scan_netdevs() must be done with rtnl lock held. On the other hand, it's wrong to call rtnl_lock() from within this function since it's also called by our netdev notifier callback. Therefore move the locking to mlx4_ib_add() so that both cases are covered. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:31:08 -08:00
Moni Shoua	acc4fccf4e	IB/mlx4: Make sure GID index 0 is always occupied Make sure that for Ethernet ports, the port GID table index 0 is always occupied with a default GID of the relevant IPv6 link-local address. This provides better user experience for legacy applications that don't use the RDMA CM and were working on index 0 prior to the IP addressing change. Also, as GIDs are generated from IP addresses of the network devices that are associated with the port, it's basically possible that the GID table will be empty if no IP address was assigned. This doesn't comply with the IB spec section 4.1.1 "GID usage and properties". Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 14:31:08 -08:00
Matan Barak	4196670be7	IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device When the device has only Ethernet ports, don't try to allocate range of steerable UD QPs since they aren't needed. This fixes an issue where mlx4 VFs tried to allocate a range of UD steerable QPs, but failed to do so. Fixes: `c1c9850112` ("IB/mlx4: Add support for steerable IB UD QPs") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-13 09:00:18 -08:00
Julia Lawall	ab576627c8	RDMA/amso1100: Fix error return code Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-12 11:11:46 -08:00
Julia Lawall	d07875bd0d	RDMA/nes: Fix error return code Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-12 11:11:09 -08:00
Eli Cohen	1a4c3a3dc5	IB/mlx5: Don't set "block multicast loopback" capability Currently Connect-IB does not support blocking multicast loopback, so don't set IB_DEVICE_BLOCK_MULTICAST_LOOPBACK in the device caps. Reported by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-06 23:09:48 -08:00
Eli Cohen	78c0f98cc9	IB/mlx5: Fix binary compatibility with libmlx5 Commit `c1be5232d2` ("Fix micro UAR allocator") broke binary compatibility between libmlx5 and mlx5_ib since it defines a different value to the number of micro UARs per page, leading to wrong calculation in libmlx5. This patch defines struct mlx5_ib_alloc_ucontext_req_v2 as an extension to struct mlx5_ib_alloc_ucontext_req. The extended size is determined in mlx5_ib_alloc_ucontext() and in case of old library we use uuarn 0 which works fine -- this is achieved due to create_user_qp() falling back from high to medium then to low class where low class will return 0. For new libraries we use the more sophisticated allocation algorithm. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-06 23:00:48 -08:00
Eli Cohen	9e65dc371b	IB/mlx5: Fix RC transport send queue overhead computation Fix the RC QPs send queue overhead computation to take into account two additional segments in the WQE which are needed for registration operations. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-02-06 23:00:48 -08:00
Roland Dreier	fb1b5034e4	Merge branch 'ip-roce' into for-next Conflicts: drivers/infiniband/hw/mlx4/main.c	2014-01-22 23:24:21 -08:00
Roland Dreier	8f399921ea	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next	2014-01-22 23:24:13 -08:00
Eli Cohen	57761d8df8	IB/mlx5: Verify reserved fields are cleared Verify that reserved fields in struct mlx5_ib_resize_cq are cleared before continuing execution of the verb. This is required to allow making use of this area in future revisions. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:54 -08:00
Eli Cohen	9e9c47d07d	IB/mlx5: Allow creation of QPs with zero-length work queues The current code attempts to call ib_umem_get() even if the length is zero, which causes a failure. Since the spec allows zero-length work queues, change the code so we don't call ib_umem_get() in those cases. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:53 -08:00
Eli Cohen	bde51583f4	IB/mlx5: Add support for resize CQ Implement resize CQ which is a mandatory verb in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:50 -08:00
Eli Cohen	3bdb31f688	IB/mlx5: Implement modify CQ Modify CQ is used by ULPs like IPoIB to change moderation parameters. This patch adds support in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:49 -08:00
Eli Cohen	ada388f7af	IB/mlx5: Make sure doorbell record is visible before doorbell Put a wmb() to make sure the doorbell record is visible to the HCA before we hit doorbell. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:49 -08:00
Ding Tianhong	79adc5321e	RDMA/nes: Slight optimization of Ethernet address compare Use the possibly more efficient ether_addr_equal() instead of memcmp(). Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:22:26 -08:00
Ira Weiny	6e0ea9e6cb	IB/qib: Fix QP check when looping back to/from QP1 The GSI QP type is compatible with and should be allowed to send data to/from any UD QP. This was found when testing ibacm on the same node as an SA. Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:16:47 -08:00
Paul Bolle	298589b1cb	RDMA/cxgb4: Fix gcc warning on 32-bit arch Building mem.o for 32 bits x86 triggers a GCC warning: drivers/infiniband/hw/cxgb4/mem.c: In function '_c4iw_write_mem_dma_aligned': drivers/infiniband/hw/cxgb4/mem.c:79:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Silence that warning by casting "&wr_wait" to unsigned long before casting it to __be64. That's what _c4iw_write_mem_inline() already does. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:07:09 -08:00
Wei Yongjun	a384b20e41	IB/usnic: Remove unused includes of <linux/version.h> Remove includes of <linux/version.h> from files that don't need it. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:05:51 -08:00
Svetlana Mavrina	d9d5713ca6	RDMA/amso1100: Add check if cache memory was allocated before freeing it There is a path in handle_vq() where kmem_cache_free() can be called with pointer to a local variable. It can happen if vq_repbuf_alloc() failed to allocate memory from cache and req is NULL. The patch adds check if cache memory was allocated before freeing it. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Svetlana Mavrina <another.karnil@gmail.com> Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:03:59 -08:00
Dan Carpenter	8ce96afa82	IB/usnic: Use GFP_ATOMIC under spinlock This is called from qp_grp_and_vf_bind() and we are holding the vf->lock so the allocation can't sleep. Fixes: `e3cf00d0a8` ('IB/usnic: Add Cisco VIC low-level hardware driver') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-21 10:47:56 -08:00
Roland Dreier	27cdef637c	IB/mlx4: Use IS_ENABLED(CONFIG_IPV6) ...instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:18:49 -08:00
Roland Dreier	9392fa0641	RDMA/ocrdma: Add dependency on INET Now that ocrdma supports IP-based addressing, we need to depend on INET, since ocrdma registers itself for net device events. Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:16:23 -08:00
Roland Dreier	31ab8acbf6	RDMA/ocrdma: Move ocrdma_inetaddr_event outside of "#if CONFIG_IPV6" This fixes the build if IPV6 isn't enabled. Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:14:05 -08:00
Matan Barak	f282651de6	IB/mlx4: Add dependency INET Since mlx4_ib supports IP based addressing, a dependency on INET needs to be added, since mlx4_ib registers itself for net device events. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:14:05 -08:00
Moni Shoua	37721d8501	RDMA/ocrdma: Populate GID table with IP based gids This patch is similar in spirit to the "IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table" patch. Changes to inet4 and inet6 addresses for the host are monitored, and if the address is associated with an ocrdma device then a gid is added to or deleted from the device's gid table. The gid format will be an IPv4-to-IPv6 mapped address or the IPv6 address. Cc: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:14:01 -08:00
Moni Shoua	40aca6ffca	RDMA/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing This patch is similar in spirit to the "IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing". It handles the fact that IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. When building an address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Cc: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-19 15:13:58 -08:00
Moni Shoua	297e0dad72	IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Therefore, we need to extract them from the CQE and place them in struct ib_wc (to be used for cases where they were taken from the gid). Also, when modifying a QP or building an address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 14:12:53 -08:00
Moni Shoua	d487ee7774	IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table Currently, the mlx4 driver sets IBoE (RoCE) gids to encode the related Ethernet netdevice interface MAC address and possibly VLAN id. Change this scheme such that gids encode interface IP addresses (both IPv4 and IPv6). This requires learning the IP addresses which are in use by a netdevice associated with the HCA port, formatting them to gids and adding them to the port gid table. Furthermore, address add and delete events are caught to maintain the gid table accordingly. Associated IP addresses may belong to a master of an Ethernet netdevice on top of that port, so this should be considered when building and maintaining the gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 14:12:52 -08:00
Julia Lawall	af2e2e35a2	IB/mlx4: Fix error return code Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 ($ret < 0\\|ret != 0$) { ... return ret; } \| ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:51:33 -08:00
Wei Yongjun	d1db47c5ee	IB/usnic: Remove unused variable in usnic_debugfs_exit() The variable qp_grp is initialized but never used otherwise, so remove the unused variable. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:50:14 -08:00
Upinder Malhi	6dcebe614c	IB/usnic: Set userspace/kernel ABI ver to 4 usNIC userspace/kernel ABI should be set to 4 instead of 3. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:48:54 -08:00
Upinder Malhi	61f7826893	IB/usnic: Advertise usNIC devices as RDMA_NODE_USNIC_UDP usNIC default transport is UDP. Hence, advertise RDMA_NODE_USNIC_UDP by default for usNIC devices. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:48:54 -08:00
Upinder Malhi	2d97436f5b	IB/usnic: Add dependency on CONFIG_INET usNIC needs inet notifiers to function correctly, so add a Kconfig dependency on CONFIG_INET. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:48:54 -08:00
Upinder Malhi	4942c0b4b6	IB/usnic: Fix endianness-related warnings Fix sparse endianness related warnings. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-18 13:48:54 -08:00
Matan Barak	dd5f03beb4	IB/core: Ethernet L2 attributes in verbs/cm structures This patch adds support for Ethernet L2 attributes in the verbs/cm/cma structures. When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority in a similar manner that the IB L2 (and the L4 PKEY) attributes are used. Thus, those attributes were added to the following structures: * ib_ah_attr - added dmac * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority) * ib_wc - added smac, vlan_id * ib_sa_path_rec - added smac, dmac, vlan_id * cm_av - added smac and vlan_id For the path record structure, extra care was taken to avoid the new fields when packing it into wire format, so we don't break the IB CM and SA wire protocol. On the active side, the CM fills its internal structures from the path provided by the ULP. We add there taking the ETH L2 attributes and placing them into the CM Address Handle (struct cm_av). On the passive side, the CM fills its internal structures from the WC associated with the REQ message. We add there taking the ETH L2 attributes from the WC. When the HW driver provides the required ETH L2 attributes in the WC, they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks for the presence of these flags, and in their absence does address resolution from the ib_init_ah_from_wc() helper function. ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters are mandatory for Ethernet link layer, while they are irrelevant for IB. Vendor drivers are modified to support the new function signature. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:20:54 -08:00
Matan Barak	c1c9850112	IB/mlx4: Add support for steerable IB UD QPs This patch adds support for steerable (NETIF) QP creation. When we create the device, we allocate a range of steerable QPs. Afterward, when a QP is created with the NETIF flag, it's allocated from this range. Allocation is managed by a bitmap allocator. Internal steering rules for those QPs are automatically generated on their creation. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:06:50 -08:00
Matan Barak	a37a1a4284	IB/mlx4: Add mechanism to support flow steering over IB links The mlx4 device requires adding IB flow spec to rules that apply over infiniband link layer. This patch adds a mechanism to add such a rule. If higher levels e.g. IP/UDP/TCP flow specs are provided, the device requires us to add an empty wild-carded IB rule. Furthermore, the device requires the QPN to be put in the rule. Add here specific parsing support for IB empty rules and the ability to self-generate missing specs based on existing ones. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:06:50 -08:00
Matan Barak	0a9b7d59d5	IB/mlx4: Enable device-managed steering support for IB ports too Up until now, flow steering wasn't supported when using IB ports. This patch enables support for flow steering if all hardware ports support that, for example the new MLX4_DEV_CAP_FLAG2_DMFS_IPOIB mlx4 device capability. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 14:06:50 -08:00
Eli Cohen	c1be5232d2	IB/mlx5: Fix micro UAR allocator The micro UAR (uuar) allocator had a bug which resulted from the fact that in each UAR we only have two micro UARs available, those at index 0 and 1. This patch defines iterators to aid in traversing the list of available micro UARs when allocating a uuar. In addition, change the logic in create_user_qp() so that if high class allocation fails (high class means lower latency), we revert to medium class and not to the low class. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 13:54:23 -08:00
Eli Cohen	d9fe409163	IB/mlx5: Remove unused code in mr.c The variable start in struct mlx5_ib_mr is never used. Remove it. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 13:54:23 -08:00
Upinder Malhi	3108bccb3d	IB/usnic: Append documentation to usnic_transport.h and cleanup Add comment describing usnic_transport_rsrv port and remove extraneous space from usnic_transport.c. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:51:00 -08:00
Roland Dreier	c30392ab5b	IB/usnic: Fix typo "Ignorning" -> "Ignoring" Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:46 -08:00
Upinder Malhi	9f637f7936	IB/usnic: Expose flows via debugfs Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:46 -08:00
Upinder Malhi	c5f855e08a	IB/usnic: Use for_each_sg instead of a for-loop Use for_each_sg() instead of an explicit for-loop to iterate over scatter-gather list. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:46 -08:00
Upinder Malhi	6a54d9f9a0	IB/usnic: Remove superflous parentheses Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:45 -08:00
Upinder Malhi	e45e614e40	IB/usnic: Add UDP support in usnic_ib_qp_grp.[hc] UDP support for qp_grps/qps. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:44 -08:00
Upinder Malhi	c7845bcafe	IB/usnic: Add UDP support in uverbs.c, umain.c and u*util.h Add support for: 1) Parsing the socket file descriptor passed down from userspace. 2) IP notifiers 3) Encoding the IP in the GID 4) Other aux. changes to support UDP Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:44 -08:00
Upinder Malhi	6214105460	IB:usnic: Add UDP support to usnic_transport.[hc] This patch provides an API for the rest of the usNIC code to increment or decrement a socket's reference count. Auxiliary socket APIs are also provided. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:44 -08:00
Upinder Malhi	3f92bed3d6	IB/usnic: Add UDP support to usnic_fwd.[hc] Add ip field to struct usnic_fwd_dev as well as new functions to manipulate the ip field. Furthermore, add new functions for programming UDP flows in the forwarding device. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:43 -08:00
Upinder Malhi	b85caf479b	IB/usnic: Update ABI and Version file for UDP support Expand the kernel/userspace interface so userspace may push down a socket file descriptor to usNIC. Also, bump up the abi and version numbers. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:43 -08:00
Upinder Malhi	60b215e8b2	IB/usnic: Port over sysfs to new usnic_fwd.h This patch ports usnic_ib_sysfs.c to the new interface of usnic_fwd.h. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:42 -08:00
Upinder Malhi	256d6a6ac5	IB/usnic: Port over usnic_ib_qp_grp.[hc] to new usnic_fwd.h This patch ports usnic_ib_qp_grp.[hc] to the new interface of usnic_fwd.h. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:42 -08:00
Upinder Malhi	8af94ac66a	IB/usnic: Port over main.c and verbs.c to the usnic_fwd.h This patch ports usnic_ib_main.c, usnic_ib_verbs.c and usnic_ib.h to the new interface of usnic_fwd.h. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:42 -08:00
Upinder Malhi	2183b990b6	IB/usnic: Push all forwarding state to usnic_fwd.[hc] Push all of the usnic device forwarding state - such as mtu, mac - to usnic_fwd_dev. Furthermore, usnic_fwd.h exposes an improved interface for the rest of the usnic code. The primary improvement is that usnic_fwd.h's flow management interface now takes in high-level filter and action structures, instead of low-level parameters such as vnic_idx, rq_idx. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:41 -08:00
Upinder Malhi	301a0dd68e	IB/usnic: Add struct usnic_transport_spec Add struct usnic_transport_spec for passing around transport specifications. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:41 -08:00
Upinder Malhi	8192d4acb5	IB/usnic: Change WARN_ON to lockdep_assert_held usNIC calls WARN_ON(spin_is_locked..) in a few places. In some of these instances, the call is made while holding a spinlock. Change all WARN_ON(spin_is_locked...) calls in usNIC to lockdep_assert_held to make it fool-proof, because the latter can be called while holding a spinlock and, unlike spin_is_locked, lockdep_assert_held also works correctly on UP. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:40 -08:00
Upinder Malhi	e3cf00d0a8	IB/usnic: Add Cisco VIC low-level hardware driver This adds a driver that allows userspace to use UD-like QPs over a proprietary Cisco transport with Cisco's Virtual Interface Cards (VICs), including VIC 1240 and 1280 cards. Signed-off-by: Upinder Malhi <umalhi@cisco.com> Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-14 00:44:28 -08:00
Devesh Sharma	be8348df6e	RDMA/ocrdma: Fix OCRDMA_GEN2_FAMILY macro definition OCRDMA_GEN2_FAMILY is wrongly defined as 0x02 -- it should be 0x0F. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-13 13:14:33 -08:00
Devesh Sharma	fe5e8a1acc	RDMA/ocrdma: Fix AV_VALID bit position Fix ah->av->valid bit position and big endian portability. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-13 13:14:33 -08:00
Linus Torvalds	67e0c1b037	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Some holiday bug fixes for 3.13... There is still one bug I'd like to get fixed before 3.13-final. The vlan code erroneously assignes the header ops of the underlying real device to the VLAN device above it when the real device can hardware offload VLAN handling. That's completely bogus because header ops are tied to the device type, so they only expect to see a 'dev' argument compatible with their ops. The fix is the have the VLAN code use a special set of header ops that does the pass-thru correctly, by calling the underlying real device's header ops but _also_ passing in the real device instead of the VLAN device. That fix is currently waiting some testing. Anyways, of note here: 1) Fix bitmap edge case in radiotap, from Johannes Berg. 2) Fix oops on driver unload in rtlwifi, from Larry Finger. 3) Bonding doesn't do locking correctly during speed/duplex/link changes, from Ding Tianhong. 4) Fix header parsing in GRE code, this bug has been around for a few releases. From Timo Teräs. 5) SIT tunnel driver MTU check needs to take GSO into account, from Eric Dumazet. 6) Minor info leak in inet_diag, from Daniel Borkmann. 7) Info leak in YAM hamradio driver, from Salva Peiró. 8) Fix route expiration state handling in ipv6 routing code, from Li RongQing. 9) DCCP probe module does not check request_module()'s return value, from Wang Weidong. 10) cpsw driver passes NULL device names to request_irq(), from Mugunthan V N. 11) Prevent a NULL splat in RDS binding code, from Sasha Levin. 12) Fix 4G overflow test in tg3 driver, from Nithin Sujir. 13) Cure use after free in arc_emac and fec driver's software timestamp handling, from Eric Dumazet. 14) SIT driver can fail to release the route when iptunnel_handle_offloads() throws an error. From Li RongQing. 15) Several batman-adv fixes from Simon Wunderlich and Antonio Quartulli. 
16) Fix deadlock during TIPC socket release, from Ying Xue. 17) Fix regression in ROSE protocol recvmsg() msg_name handling, from Florian Westphal. 18) stmmac PTP support releases wrong spinlock, from Vince Bridgers" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits) stmmac: Fix incorrect spinlock release and PTP cap detection. phy: IRQ cannot be shared net: rose: restore old recvmsg behavior xen-netback: fix guest-receive-side array sizes fec: Do not assume that PHY reset is active low tipc: fix deadlock during socket release netfilter: nf_tables: fix wrong datatype in nft_validate_data_load() batman-adv: fix vlan header access batman-adv: clean nf state when removing protocol header batman-adv: fix alignment for batadv_tvlv_tt_change batman-adv: fix size of batadv_bla_claim_dst batman-adv: fix size of batadv_icmp_header batman-adv: fix header alignment by unrolling batadv_header batman-adv: fix alignment for batadv_coded_packet netfilter: nf_tables: fix oops when updating table with user chains netfilter: nf_tables: fix dumping with large number of sets ipv6: release dst properly in ipip6_tunnel_xmit netxen: Correct off-by-one errors in bounds checks net: Add some clarification to skb_tx_timestamp() comment. arc_emac: fix potential use after free ...	2013-12-30 09:33:30 -08:00
Kumar Sanghvi	41b4f86c13	RDMA/cxgb4: Use cxgb4_select_ntuple to correctly calculate ntuple fields Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-22 18:09:08 -05:00
Kumar Sanghvi	8c04469057	RDMA/cxgb4: Server filters are supported only for IPv4 Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-22 18:09:08 -05:00
Kumar Sanghvi	a4ea025fc2	RDMA/cxgb4: Calculate the filter server TID properly Based on original work by Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-22 18:09:08 -05:00
Rashika	c00850dd6c	RDMA/cxgb4: Make _c4iw_write_mem_dma() static This patch marks the function _c4iw_write_mem_dma() as static because it is not used outside this file, which fixes the warning: drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-12-15 08:04:15 -08:00
Linus Torvalds	1ea406c0e0	Main batch of InfiniBand/RDMA changes for 3.13: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz VDvT9DEy6zjSMCLpMcdo =xxC+ -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow 
structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...	2013-11-18 15:36:04 -08:00
Roland Dreier	b4fdf52b3f	Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next	2013-11-17 08:22:19 -08:00
Matan Barak	69ad5da41b	IB/core: Re-enable create_flow/destroy_flow uverbs This commit reverts commit `7afbddfae9` ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
Yann Droneaud	f21519b23c	IB/core: extended command: an improved infrastructure for uverbs commands Commit `400dbc9658` ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit `436f2ad05a` ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit `400dbc9658`, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). 
So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. 
The proposed scheme will format input (command) and output (response) buffers this way:

- command: legacy header + extended header + command data (core + hw):

    +----------------------------------------+
    | flags |      00      00      | command |
    |      in_words      |     out_words     |
    +----------------------------------------+
    |                response                |
    |                response                |
    | provider_in_words | provider_out_words |
    |                padding                 |
    +----------------------------------------+
    |                                        |
    .             <uverbs input>             .
    .             (in_words * 8)             .
    |                                        |
    +----------------------------------------+
    |                                        |
    .            <provider input>            .
    .        (provider_in_words * 8)         .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .         <uverbs output space>          .
    .            (out_words * 8)             .
    |                                        |
    +----------------------------------------+
    |                                        |
    .        <provider output space>         .
    .        (provider_out_words * 8)        .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is itself extensible while being more reliable with more input and bound checking. Note: The unused field in the extended header would be a perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But the "comp_mask" field is likely to be present in the uverbs input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-17 08:22:09 -08:00
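The command word layout described in the f21519b23c message can be sketched in C as follows. This is only an illustration under stated assumptions: the `SKETCH_*` constants and `sketch_*` helpers are hypothetical names, and the 8-bit flag and 8-bit command widths are read off the header diagram in that message rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch: flags in the top byte,
 * command identifier in the low byte, middle 16 bits left at zero. */
#define SKETCH_CMD_MASK       0x000000ffu
#define SKETCH_FLAGS_MASK     0xff000000u
#define SKETCH_FLAGS_SHIFT    24
#define SKETCH_FLAG_EXTENDED  0x80u

/* Pack flags and a command identifier into the 32-bit command word. */
static inline uint32_t sketch_pack_command(uint32_t flags, uint32_t command)
{
	assert(flags <= 0xffu);
	assert(command <= SKETCH_CMD_MASK);
	return (flags << SKETCH_FLAGS_SHIFT) | command;
}

/* Extract the command identifier from the low bits. */
static inline uint32_t sketch_command_id(uint32_t word)
{
	return word & SKETCH_CMD_MASK;
}

/* Extract the flags from the high-order bits. */
static inline uint32_t sketch_command_flags(uint32_t word)
{
	return (word & SKETCH_FLAGS_MASK) >> SKETCH_FLAGS_SHIFT;
}

/* Nonzero when the bits shown as "00 00" in the diagram are clear. */
static inline int sketch_reserved_clear(uint32_t word)
{
	return (word & ~(SKETCH_CMD_MASK | SKETCH_FLAGS_MASK)) == 0;
}

/* Nonzero when the word asks for extended processing. */
static inline int sketch_is_extended(uint32_t word)
{
	return (sketch_command_flags(word) & SKETCH_FLAG_EXTENDED) != 0;
}

/* Extended buffers are sized in 8-byte words (in_words * 8). */
static inline size_t sketch_words_to_bytes(uint16_t words)
{
	return (size_t)words * 8;
}
```

With a 16-bit word count, `sketch_words_to_bytes(65535)` gives 524280 bytes, just under the 512KBytes per buffer quoted in the message. Under the same assumptions, a word with any flag set is numerically far above every legacy command identifier, which fits the message's remark that an older kernel will reject extended commands.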
Eli Cohen	cf1c5e1f1c	IB/mlx5: Fix page shift in create CQ for userspace When creating a CQ, we must use mlx5 adapter page shift. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:36 -08:00
Eli Cohen	79d3da9c51	IB/mlx4: Fix device max capabilities check Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:36 -08:00
Eli Cohen	7e2e19210a	IB/mlx5: Remove dead code The value of the local variable index is never used in reg_mr_callback(). Signed-off-by: Eli Cohen <eli@mellanox.com> [ Remove now-unused variable delta too. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 14:36:14 -08:00
Eli Cohen	93b80ac297	IB/mlx4: Fix endless loop in resize CQ When calling get_sw_cqe() we need to pass the consumer_index and not the masked value. Failure to do so will cause an incorrect result from get_sw_cqe(), possibly leading to an endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-15 10:24:17 -08:00
Linus Torvalds	2f466d33f5	Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Resource management - Fix host bridge window coalescing (Alexey Neyman) - Pass type, width, and prefetchability for window alignment (Wei Yang) PCI device hotplug - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu) Power management - Remove pci_pm_complete() (Liu Chuansheng) MSI - Fail initialization if device is not in PCI_D0 (Yijing Wang) MPS (Max Payload Size) - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang) - Use pcie_set_readrq() to simplify code (Yijing Wang) - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang) SR-IOV - Enable upstream bridges even for VFs 
on virtual buses (Bjorn Helgaas) - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang) Virtualization - Add x86 MSI masking ops (Konrad Rzeszutek Wilk) Freescale i.MX6 - Support i.MX6 PCIe controller (Sean Cross) - Increase link startup timeout (Marek Vasut) - Probe PCIe in fs_initcall() (Marek Vasut) - Fix imprecise abort handler (Tim Harvey) - Remove redundant of_match_ptr (Sachin Kamat) Renesas R-Car - Support Gen2 internal PCIe controller (Valentine Barshak) Samsung Exynos - Add MSI support (Jingoo Han) - Turn off power when link fails (Jingoo Han) - Add Jingoo Han as maintainer (Jingoo Han) - Add clk_disable_unprepare() on error path (Wei Yongjun) - Remove redundant of_match_ptr (Sachin Kamat) Synopsys DesignWare - Add irq_create_mapping() (Pratyush Anand) - Add header guards (Seungwon Jeon) Miscellaneous - Enable native PCIe services by default on non-ACPI (Andrew Murray) - Cleanup _OSC usage and messages (Bjorn Helgaas) - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas) - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman) - Remove unused pci_mem_start (Myron Stowe) - Make sysfs functions static (Sachin Kamat) - Warn on invalid return from driver probe (Stephen M. 
Cameron) - Remove Intel Haswell D3 delays (Todd E Brandt) - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu) - Use pci_is_pcie() to simplify code (Yijing Wang) - Use PCIe capability accessors to simplify code (Yijing Wang) - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang) - Removed unused "is_pcie" from struct pci_dev (Yijing Wang) - Simplify sysfs CPU affinity implementation (Yijing Wang)" * tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits) PCI: Enable upstream bridges even for VFs on virtual buses PCI: Add pci_upstream_bridge() PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() PCI: Warn on driver probe return value greater than zero PCI: Drop warning about drivers that don't use pci_set_master() PCI: Workaround missing pci_set_master in pci drivers powerpc/pci: Use pci_is_pcie() to simplify code [fix] PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms PCI: imx6: Probe the PCIe in fs_initcall() PCI: Add R-Car Gen2 internal PCI support PCI: imx6: Remove redundant of_match_ptr PCI: Report pci_pme_active() kmalloc failure mn10300/PCI: Remove useless pcibios_last_bus frv/PCI: Remove pcibios_last_bus PCI: imx6: Increase link startup timeout PCI: exynos: Remove redundant of_match_ptr PCI: imx6: Fix imprecise abort handler PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0 PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe() x86/PCI: Coalesce multiple overlapping host bridge windows ...	2013-11-14 14:02:00 +09:00
Al Viro	441a9d0e1e	qib_fs: fix (some) dcache abuses * lookup_one_len() really wants i_mutex held on directory. * leaks galore - just mount ipathfs, then cd /sys/bus/pci/drivers/qib_ib; echo ::. >unbind on a box with that card present and try to umount ipathfs... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 08:08:19 -05:00
Dave Jones	4127c365c9	RDMA/nes: Remove self-assignment from nes_query_qp() Assigning a value to itself is pointless. Spotted with coverity, no hardware to test. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-09 02:34:27 -08:00
Mike Marciniszyn	2fadd83184	IB/qib: Fix txselect regression Commit 7fac33014f54 ("IB/qib: checkpatch fixes") was overzealous in removing a simple_strtoul for a parse routine, setup_txselect(). That routine is required to handle a multi-value string. Unwind that aspect of the fix. Cc: <stable@vger.kernel.org> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Mike Marciniszyn	78a5886472	IB/qib: Fix checkpatch __packed warnings Convert __attribute__ ((packed)) to __packed. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:12 -08:00
Jan Kara	603e772992	IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Jan Kara	4adcf7fb67	IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast() ipath_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function ipath_user_sdma_queue_pkts() (and also ipath_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Merged in fix for call to get_user_pages_fast from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:11 -08:00
Naresh Gottumukkala	d5e3f37833	RDMA/ocrdma: Remove redundant check in ocrdma_build_fr() Remove the redundant check of comparing if a 32-bit value is greater than 0xffffffffULL. Reported by Dan Carpenter. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Naresh Gottumukkala	1852d1da3b	RDMA/ocrdma: Fix a crash in rmmod 1) ocrdma_remove_free() is called from a call_rcu callback function context, which can be a bottom-half context. So the code in ocrdma_remove_free() should not sleep. But ocrdma_cleanup_hw() can sleep, so move it to ocrdma_remove() instead of ocrdma_remove_free(). 2) Fix a couple of kbuild test robot warnings. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:06 -08:00
Dan Carpenter	6ebacdfc07	RDMA/ocrdma: Silence an integer underflow warning We recently added a cap on "max_wqe_allocated" in `43a6b4025c` ('RDMA/ocrdma: Create IRD queue fix'). My static checker complains that the cap has a problem because it casts large values to negative. "attrs->cap.max_send_wr" is a u32. It comes from the user, but it's capped in ocrdma_check_qp_params() so it can't wrap here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:05 -08:00
Eli Cohen	1b77d2bd75	mlx5: Use enum to indicate adapter page size The Connect-IB adapter has an inherent page size which equals 4K. Define a new enum that equals the page shift and use it instead of using the value 12 throughout the code. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	c2a3431e61	IB/mlx5: Update opt param mask for RTS2RTS RTS to RTS transition should allow update of alternate path. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	07c9113fe8	IB/mlx5: Remove "Always false" comparison mlx5_cur and mlx5_new cannot have negative values so remove the redundant condition. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:01 -08:00
Eli Cohen	2d036fad94	IB/mlx5: Remove dead code in mr.c In mlx5_mr_cache_init() the size variable is not used so remove it to avoid compiler warnings when running with make W=1. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
Eli Cohen	bf0bf77f65	mlx5: Support communicating arbitrary host page size to firmware Connect-IB firmware requires 4K pages to be communicated with the driver. This patch breaks larger pages to 4K units to enable support for architectures utilizing larger page size, such as PowerPC. This patch also fixes several places that referred to PAGE_SHIFT instead of explicit 12 which is the inherent page shift on Connect-IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:43:00 -08:00
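The splitting of larger host pages into 4K units described in bf0bf77f65 can be illustrated with a short C sketch. It is not the driver's code: `SKETCH_ADAPTER_PAGE_SHIFT` and both helper functions are hypothetical names, and the only facts taken from the commit messages are that the adapter page shift is 12 and that host pages may be larger (as on PowerPC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the value 12 (4K pages) comes from the message. */
#define SKETCH_ADAPTER_PAGE_SHIFT 12u
#define SKETCH_ADAPTER_PAGE_SIZE  ((uint64_t)1 << SKETCH_ADAPTER_PAGE_SHIFT)

/* Number of 4K adapter pages contained in one host page. */
static size_t sketch_chunks_per_host_page(unsigned int host_page_shift)
{
	assert(host_page_shift >= SKETCH_ADAPTER_PAGE_SHIFT);
	return (size_t)1 << (host_page_shift - SKETCH_ADAPTER_PAGE_SHIFT);
}

/*
 * Write the address of every 4K unit inside one host page into out[].
 * Returns the number of entries written, or 0 if out[] is too small.
 */
static size_t sketch_split_host_page(uint64_t host_page_addr,
				     unsigned int host_page_shift,
				     uint64_t *out, size_t out_len)
{
	size_t n = sketch_chunks_per_host_page(host_page_shift);
	size_t i;

	if (n > out_len)
		return 0;

	for (i = 0; i < n; i++)
		out[i] = host_page_addr + i * SKETCH_ADAPTER_PAGE_SIZE;

	return n;
}
```

For a 64K host page (shift 16) this yields 16 addresses spaced 4096 bytes apart; for a 4K host page it degenerates to a single entry, so the same path would serve both cases.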
Moshe Lazer	cfd8f1d49b	IB/mlx5: Fix srq free in destroy qp On destroy QP the driver walks over the relevant CQ and removes CQEs reported for the destroyed QP. It also frees the related SRQ entry without checking that this is actually an SRQ-related CQE. In case of a CQ used for both send and receive QP, we could free SRQ entries for send CQEs. This patch resolves this issue by verifying that this is a SRQ related CQE by checking the SRQ number in the CQE is not zero. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	1faacf82df	IB/mlx5: Simplify mlx5_ib_destroy_srq Make use of destroy_srq_kernel() to clear SRQ resources. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	9641b74ebe	IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR Make sure not to overflow when reading the page list from struct ib_fast_reg_page_list. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	746b5583c1	IB/mlx5: Multithreaded create MR Use asynchronous commands to execute up to eight concurrent create MR commands. This is to fill memory caches faster so we keep consuming from there. Also, increase timeout for shrinking caches to five minutes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:59 -08:00
Eli Cohen	51ee86a4af	IB/mlx5: Fix check of number of entries in create CQ Verify that the value is non negative before rounding up to power of 2. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:58 -08:00
Ben Hutchings	649fb5ec0e	IB/cxgb4: Fix formatting of physical address Physical addresses may be wider than virtual addresses (e.g. on i386 with PAE) and must not be formatted with %p. Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-11-08 14:42:30 -08:00
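The point made in 649fb5ec0e, that a physical address can be wider than a pointer, can be shown with a small userspace C sketch. The helper name `sketch_format_phys` is hypothetical, and formatting through `unsigned long long` with `%llx` is one common approach assumed here; the message itself does not say which format specifier the fix uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a physical address held in a 64-bit integer. Casting to
 * unsigned long long keeps all 64 bits even where pointers are 32 bits
 * wide, which is the case %p cannot cover.
 */
static int sketch_format_phys(char *buf, size_t len, uint64_t phys)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%llx", (unsigned long long)phys);
}
```

An address such as 0x123456789 needs 33 bits, so on a 32-bit build it would not survive a conversion to a pointer before printing.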
Jack Morgenstein	571b8b92c7	net/mlx4_core: Initialize all mailbox buffers to zero before use To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-07 19:22:47 -05:00
Jack Morgenstein	5a0d0a6161	mlx4: Structures and init/teardown for VF resource quotas This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. 
In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 16:19:07 -05:00
Yann Droneaud	7afbddfae9	IB/core: Temporarily disable create_flow/destroy_flow uverbs The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So that userspace is not exposed to experimental features and an unstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-21 09:44:17 -07:00
Roland Dreier	59b5b28d1a	Merge branch 'misc' into for-next	2013-10-14 10:10:46 -07:00
Joe Perches	2b50176d11	IB: Remove unnecessary semicolons These aren't necessary after switch blocks. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-14 10:10:00 -07:00
Eli Cohen	5431390707	IB/mlx5: Ensure proper synchronization accessing memory Call mlx5_ib_populate_pas() before mapping the DMA buffer to ensure the hardware reads the values written by the CPU. Found by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:24:00 -07:00
Eli Cohen	fe45f82704	IB/mlx5: Fix alignment of reg umr gather buffers The hardware requires that gather buffers for UMR work requests be aligned to 2K. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:59 -07:00
Sagi Grimberg	ada9f5d007	IB/mlx5: Fix eq names to display nicely in /proc/interrupts It's helpful for a driver to put the pci slot name in its interrupt names, so /proc/interrupts will show the pci slot of the device. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:59 -07:00
Eli Cohen	a4774e9095	IB/mlx5: Fix opt param mask according to firmware spec Failed to configure opt mask to configure rre from init to rtr. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	75959f56fe	mlx5: Fix opt param mask for sq err to rts transition Add missing entry in the table for UC transport. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	81bea28ffd	IB/mlx5: Disable atomic operations Currently Atomic operations don't work properly. Disable them for the time being. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:58 -07:00
Eli Cohen	a0c84c326f	IB/mlx5: Avoid async events on invalid port number On a single ported Connect-IB, it's possible for the firmware to issue events on the non-existing 2nd port. Make sure to ignore events generated for such ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:57 -07:00
Eli Cohen	203099fd73	IB/mlx5: Decrease memory consumption of mr caches Change the logic so we do not allocate memory nor map the device before actually posting to the REG_UMR QP. In addition, unmap and free the memory after we get completion. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:56 -07:00
Moshe Lazer	56e1ab0f13	IB/mlx5: Fix memory leak in mlx5_ib_create_srq The patch fixes the rollback in case of failure in creating SRQ. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Moshe Lazer	3c4619114c	IB/mlx5: Flush cache workqueue before destroying it Destroying the workqueue without flushing it first can lead to a case in which the kernel tries to push a delayed work to the workqueue which does not exist anymore. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Eli Cohen	b125a54bfd	IB/mlx5: Fix send work queue size calculation 1. Make sure wqe_cnt does not exceed the limit published by firmware. 2. There is no requirement that the number of outstanding work requests will be a power of two. Remove the ilog2 in the calculation of sq.max_post to fix that. 3. Add case for IB_QPT_XRC_TGT in sq_overhead and return 0 as XRC target QPs do not have a send queue. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-10-10 09:23:55 -07:00
Bjorn Helgaas	03078633a6	IB/qib: Drop qib_tune_pcie_caps() and qib_tune_pcie_coalesce() return values The callers of qib_tune_pcie_caps() and qib_tune_pcie_coalesce() don't check the return values, so this patch drops the return values altogether. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-10-04 14:30:19 -06:00
Yijing Wang	0ce0e62f1f	IB/qib: Use pcie_set_mps() and pcie_get_mps() to simplify code Refactor qib_tune_pcie_caps(). Use pcie_get_mps(), pcie_set_mps(), pcie_get_readrq(), and pcie_set_readrq() to simplify the code. The PCI core caches the "PCIe Max Payload Size Supported" in pci_dev->pcie_mpss, so use that instead of pcie_capability_read_word(). Remove the unused val2fld() and fld2val(). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-09-24 14:17:06 -06:00
Yijing Wang	dcaa73dc34	IB/qib: Use pci_is_root_bus() to check whether it is a root bus Use pci_is_root_bus() instead of "if (bus->parent)" statement for better readability. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2013-09-24 12:13:03 -06:00
Martin Schwidefsky	0244ad004a	Remove GENERIC_HARDIRQ config option After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-09-13 15:09:52 +02:00
Linus Torvalds	2e515bf096	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "The usual trivial updates all over the tree -- mostly typo fixes and documentation updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits) doc: Documentation/cputopology.txt fix typo treewide: Convert retrun typos to return Fix comment typo for init_cma_reserved_pageblock Documentation/trace: Correcting and extending tracepoint documentation mm/hotplug: fix a typo in Documentation/memory-hotplug.txt power: Documentation: Update s2ram link doc: fix a typo in Documentation/00-INDEX Documentation/printk-formats.txt: No casts needed for u64/s64 doc: Fix typo "is is" in Documentations treewide: Fix printks with 0x%# zram: doc fixes Documentation/kmemcheck: update kmemcheck documentation doc: documentation/hwspinlock.txt fix typo PM / Hibernate: add section for resume options doc: filesystems : Fix typo in Documentations/filesystems scsi/megaraid fixed several typos in comments ppc: init_32: Fix error typo "CONFIG_START_KERNEL" treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks page_isolation: Fix a comment typo in test_pages_isolated() doc: fix a typo about irq affinity ...	2013-09-06 09:36:28 -07:00
Roland Dreier	82af24ac6f	Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	2013-09-03 09:01:08 -07:00
Roland Dreier	33ccbd858f	RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch Fix: drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function 'ocrdma_build_fr': >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1832:7: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] mr = (struct ocrdma_mr *)qp->dev->stag_arr[(hdr->lkey >> 8) & ^ drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function 'ocrdma_alloc_frmr': >> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:2661:64: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] = (u64) mr; Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-03 09:00:08 -07:00
Ira Weiny	0318f68521	IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards Commit `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") caused statements to leak past the header guard. Fix this. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:22:20 -07:00
Naresh Gottumukkala	d7e19c0ad9	RDMA/ocrdma: Fix passing wrong opcode to modify_srq Fix passing wrong opcode to ocrdma_modify_srq and query SRQ. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:46 -07:00
Naresh Gottumukkala	84b105db59	RDMA/ocrdma: Fill PVID in UMC case In UMC case, driver needs to fill PVID in the address vector template for UD traffic. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:45 -07:00
Naresh Gottumukkala	3875439715	RDMA/ocrdma: Add ABI versioning support Add ABI versioning support between driver and userspace library. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:45 -07:00
Naresh Gottumukkala	117e6dd1c5	RDMA/ocrdma: Consider multiple SGES in case of DPP While posting inline DPP data, we are not considering multiple sges. Fix this. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:44 -07:00
Naresh Gottumukkala	f24ceba6b6	RDMA/ocrdma: Fix for displaying proper link speed Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:43 -07:00
Naresh Gottumukkala	c43e9ab84d	RDMA/ocrdma: Increase STAG array size 1) Increase STAG Array size. 2) Max inline data size should be set to the same value used during QP creation 3) Set max_sge_rd to zero since we don't support RD transport in our adapters. 4) Max cqes reported in ibv_devinfo should be from QUERY_CONFIG. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:42 -07:00
Naresh Gottumukkala	cffce99051	RDMA/ocrdma: Dont use PD 0 for userpace CQ DB Create_CQ verb doesn't provide a PD pointer. So, until now we have been creating all (both userspace and kernel) CQ DB regions from PD0. This results in mmapping PD0 to applications. A rogue userspace application can mess things up. A more serious issue is that even the be2net NIC uses PD0. This patch addresses this problem by: 1) Create a PD page for every userspace application when the alloc_ucontext is called. This will be destroyed in dealloc_ucontext. 2) All CQs for that context will use the PD allocated in ucontext. 3) The first create_PD call from application will result in returning the PD address from its ucontext (no new PD will be created). 4) For subsequent create_pd calls from application, we create new PDs for the application. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:18:32 -07:00
Naresh Gottumukkala	2b51a9b9eb	RDMA/ocrdma: FRMA code cleanup 1) Fixed setting FR_MR bit for FRWR stag allocation 2) Access rights are passed during FRWR stage and not during STAG allocation stage 3) FRWR WQE structure cleanup 4) Add QP level signaled bit. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:17:56 -07:00
Naresh Gottumukkala	f11220ee69	RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24 1) All RQ doorbells are handled by ERX2 and doorbell->num_posted offset is constant to bit offset 24 for ERX2 irrespective of Q id. 2) Fixed RESET to INIT state change (from ERR->RST->INIT->RTR case). Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:17:55 -07:00
Naresh Gottumukkala	c88bd03ffc	RDMA/ocrdma: Fix to work with even a single MSI-X vector There are cases like SRIOV where we can get only one MSI-X vector allocated for RoCE. In that case we need to use the vector for both data plane and control plane. We need to use EQ create version V2. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:17:54 -07:00
Naresh Gottumukkala	d3cb6c0b2a	RDMA/ocrdma: Remove the MTU check based on Ethernet MTU Also increase MAX AH to 512. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:17:53 -07:00
Naresh Gottumukkala	7c33880c3c	RDMA/ocrdma: Add support for fast register work requests (FRWR) Also get the max_srq value from query_config mailbox response. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:17:48 -07:00
Naresh Gottumukkala	43a6b4025c	RDMA/ocrdma: Create IRD queue fix 1) Fix ocrdma_get_num_posted_shift for up to 128 QPs. 2) Create for min of dev->max_wqe and requested wqe in create_qp. 3) As part of creating ird queue, populate with basic header templates. 4) Make sure all the DB memory allocated to userspace is page aligned. 5) Fix issue in checking the mmap local cache. 6) Some code cleanup. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-09-02 21:16:21 -07:00
Hadar Hen Zion	f77c0162a3	IB/mlx4: Add receive flow steering support Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is returned to the caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-28 09:53:56 -07:00
Joe Perches	8be04b9374	treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks Don't emit OOM warnings when k.alloc calls fail when there is a v.alloc immediately afterwards. Converted a kmalloc/vmalloc with memset to kzalloc/vzalloc. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-08-20 13:06:40 +02:00
Steve Wise	09992579bc	RDMA/cxgb4: Issue RI.FINI before closing when entering TERM Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:49 -07:00
Steve Wise	a2de1499b3	RDMA/cxgb4: Advertise ~0ULL as max MR size Lustre uses an advertised max MR size of ~0ULL to indicate it should use a dma_mr. Hence advertise max MR size as ~0ULL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:48 -07:00
Steve Wise	b298881fcf	RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASK When polling, we do a GTS update if the accumulated cidx_inc == the CQ depth / 16. However, if the CQ is large enough, CQ depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK to avoid overflowing the field. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:47 -07:00
Steve Wise	b38a0ad8ec	RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages accept_cr() failed to set the arp error handler on a reused skb. This results in a kernel crash if the arp does indeed time out. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:47 -07:00
Steve Wise	27ca34f54a	RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:46 -07:00
Steve Wise	1cf24dcef4	RDMA/cxgb4: Fix QP flush logic This patch makes the following fixes in QP flush logic: - correctly flushes unsignaled WRs followed by a signaled WR - supports flushing a CQ bound to multiple QPs - resets cidx_flush if an active queue starts getting HW CQEs again - marks WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously - eats unsignaled read resp CQEs. HW always inserts CQEs so we must silently discard them if the read work request was unsignaled. - handles QP flushes with pending SW CQEs. The flush and out of order completion logic has a bug where if out of order completions are flushed but not yet polled by the consumer and the qp is then flushed then we end up inserting duplicate completions. - c4iw_flush_sq() should only flush wrs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fixed sparse warning due to htonl/ntohl confusion. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:45 -07:00
Steve Wise	97d7ec0c41	RDMA/cxgb4: Handle newer firmware changes Move QP to TERMINATE instead to allow the peer to get the TERM message. This bug wasn't detectable until newer FW that moves connections out of RDMA mode as soon as an error is detected. QP can exit RTS before the last AE arrives. This was introduced by changes in the FW to kick connections out of RDMA mode as soon as an error is detected. A side effect of this is that the driver can move the QP out of RTS before the AE causing the connection to get kicked out of RDMA mode is processed. Fix for this is to always post async errors even if the QP is out of RTS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:44 -07:00
Steve Wise	68074bb1ab	RDMA/cxgb4: Use correct bit shift macros for vlan filter tuples Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:43 -07:00
Vipul Pandya	830662f6f0	RDMA/cxgb4: Add support for active and passive open connection with IPv6 address Add new cpl messages, cpl_act_open_req6 and cpl_t5_act_open_req6, for initiating active open connections. Use LLD api cxgb4_create_server and cxgb4_create_server6 for initiating passive open connections. Similarly use cxgb4_remove_server to remove the passive open connections in place of listen_stop. Add support for iWARP over VLAN device and enable IPv6 support on VLAN device. Make use of import_ep in c4iw_reconnect. Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fix build when IPv6 is disabled and make sure iw_cxgb4 is not built-in when ipv6 is a module. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:55:06 -07:00
Yijing Wang	b29b076394	IB/qib: Clean up unnecessary MSI/MSI-X capability find PCI core will initialize device MSI/MSI-X capability in pci_msi_init_pci_dev(). So device drivers should use pci_dev->msi_cap/msix_cap to determine whether a device supports MSI/MSI-X instead of using pci_find_capability(pci_dev, PCI_CAP_ID_MSI/MSIX). Access to PCIe device config space again will consume more time. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:17:23 -07:00
Paul Bolle	bea25e82c6	IB/qib: Make qib_driver static struct pci_driver qib_driver is only used in qib_init.c. Remove it from qib.h and make it static in qib_init.c. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:14:51 -07:00
CQ Tang	4668e4b527	IB/qib: Improve SDMA performance 1. The code accepts chunks of messages, and splits the chunk into packets when converting packets into sdma queue entries. Adjacent packets will use user buffer pages smartly to avoid pinning the same page multiple times. 2. Instead of discarding all the work when SDMA queue is full, the work is saved in a pending queue. Whenever there are enough SDMA queue free entries, pending queue is directly put onto SDMA queue. 3. An interrupt handler is used to progress this pending queue. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Fixed up sparse warnings. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-13 11:14:34 -07:00
Steve Wise	24d44a391f	RDMA/cma: Add IPv6 support for iWARP Modify the type of local_addr and remote_addr fields in struct iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold IPv6 and IPv4 addresses uniformly. Change the references of local_addr and remote_addr in cxgb4, cxgb3, nes and amso drivers to match this. However, to be able to actually run traffic over IPv6, low-level drivers have to add code to support this. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> [ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 12:32:31 -07:00
Naresh Gottumukkala	45e86b33ec	RDMA/ocrdma: Cache recv DB until QP moved to RTR 1) In post recv, don't ring the DB doorbell if the QP is in RTR state. Cache the DB calls, until the QP is moved to RTS state. 2) Add max_rd_sge support to dev->attr. 3) Code cleanup in alloc_pd path. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 11:00:51 -07:00
Naresh Gottumukkala	7b9b1a596e	RDMA/ocrdma: Remove __packed 1) Remove __packed for structures. 2) Align and pad all ABI structure to 64 bit boundaries instead of using __packed. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:59:44 -07:00
Naresh Gottumukkala	057729cb23	RDMA/ocrdma: Remove driver QP state machine Remove QP state machine in ocrdma low-level driver and use the core IB stack's instead. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	9c58726ba9	RDMA/ocrdma: Don't allow zero/invalid sgid usage Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	1afc0454b6	RDMA/ocrdma: Remove redundant dev reference Remove redundant dev reference from structures: 1) ocrdma_cq. 2) ocrdma_ah. 3) ocrdma_hw_mr. 4) ocrdma_mw. 5) ocrdma_srq. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:38 -07:00
Naresh Gottumukkala	f99b1649db	RDMA/ocrdma: Style and redundant code cleanup Code cleanup and remove redundant code: 1) redundant initialization removed 2) braces changed as per CodingStyle. 3) redundant checks removed 4) extra braces in return statements removed. 5) removed unused pd pointer from mr. 6) reorganized get_dma_mr() 7) fixed set_av() to return error on invalid sgid index. 8) reference to ocrdma_dev removed from struct ocrdma_pd. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-08-12 10:58:37 -07:00
Roland Dreier	569935db80	Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma' and 'qib' into for-next	2013-07-31 14:24:06 -07:00
Andi Shyti	618af3846b	mlx5_core: Variable may be used uninitialized In the sq_overhead() function, if qp_typ is equal to IB_QPT_RC, size will be used uninitialized. Signed-off-by: Andi Shyti <andi@etezian.org> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:33 -07:00
Dan Carpenter	92b0ca7cb1	IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext() We don't set "resp.reserved". Since it's at the end of the struct that means we don't have to copy it to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:08 -07:00
Wei Yongjun	281d1a9211	IB/mlx5: Fix error return code in init_one() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 14:12:07 -07:00
Jack Morgenstein	3eac103f83	IB/mlx4: Use default pkey when creating tunnel QPs When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. It's possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-31 12:22:12 -07:00
Roland Dreier	3c93f039d2	Revert "RDMA/nes: Fix compilation error when nes_debug is enabled" This reverts commit `bca1935ccd`, which removes variables nes_tcp_state_str and nes_iwarp_state_str, assuming that they aren't defined. However, they are defined within a #ifdef NES_DEBUG statement, which if enabled causes "defined but not used" compiler warning, when the variables are removed. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 15:48:35 -07:00
Mike Marciniszyn	b268e4db3d	IB/qib: Add err_decode() call for ring dump Commit `0b3ddf380c` ("Log all SDMA errors unconditionally") missed part of the patch. This also corrects a format warning when dma_addr_t is 32 bits on a 64 bit system. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:13:00 -07:00
Dan Carpenter	246fcdbc9d	RDMA/cxgb3: Fix stack info leak in iwch_create_cq() The "uresp.reserved" field isn't initialized on this path so it could leak uninitialized stack information to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:11:33 -07:00
Dan Carpenter	604296303f	RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq() We pass a few bytes of uninitialized stack memory to the user here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:10:35 -07:00
Dan Carpenter	63ea374957	RDMA/ocrdma: Fix several stack info leaks A grab bag of places which don't properly initialize stack data. I removed one place which cleared ".rsvd" because it's not needed now that I have added a memset() earlier in the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:09:16 -07:00
Dan Carpenter	ae1fe07f3f	RDMA/cxgb4: Fix stack info leak in c4iw_create_qp() "uresp.ma_sync_key" doesn't get set on this path so we leak 8 bytes of data. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-30 10:07:56 -07:00
Roland Dreier	3606b99971	RDMA/ocrdma: Remove unused include I'd like to remove rdma/ib_cache.h some day, so let's avoid proliferating uses of it unnecessarily. Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-26 10:03:04 -07:00
Linus Torvalds	c552441373	Main batch of InfiniBand/RDMA changes for 3.11 merge window: - AF_IB (native IB addressing) for CMA from Sean Hefty - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - Resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - Other small changes and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q GBYjHWMUpZXIFHQaH7am =nZh8 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - AF_IB (native IB addressing) for CMA from Sean Hefty - new mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - other small changes and fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) mlx5: Return -EFAULT instead of -EPERM IB/qib: Log all SDMA errors unconditionally 
IB/qib: Fix module-level leak mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() mlx5_core: Fixes for sparse warnings IB/mlx5: Make profile[] static in main.c mlx5: Fix parameter type of health_handler_t mlx5: Add driver for Mellanox Connect-IB adapters IB/core: Add reserved values to enums for low-level driver use IB/srp: Bump driver version and release date IB/srp: Make HCA completion vector configurable IB/srp: Maintain a single connection per I_T nexus IB/srp: Fail I/O fast if target offline IB/srp: Skip host settle delay IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Fix remove_one crash due to resource exhaustion IB/qib: New transmitter tunning settings for Dell 1.1 backplane IB/core: Fix error return code in add_port() ...	2013-07-13 12:57:21 -07:00
Roland Dreier	e04abfa243	Merge branches 'mlx5', 'qib' and 'srp' into for-next	2013-07-11 16:49:30 -07:00
Dan Carpenter	5e631a03af	mlx5: Return -EFAULT instead of -EPERM For copy_to/from_user() failure, the correct error code is -EFAULT not -EPERM. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:48:45 -07:00
Dean Luick	0b3ddf380c	IB/qib: Log all SDMA errors unconditionally This patch adds code to log SDMA errors for supportability purposes. Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:47:06 -07:00
Mike Marciniszyn	308c813b19	IB/qib: Fix module-level leak The vzalloc()'ed field physshadow is leaked on module unload. This patch adds vfree after the sibling page shadow is freed. Reported-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-11 16:46:44 -07:00
Linus Torvalds	496322bc91	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit `0a4db187a9` ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particularly in the area of debugging and cookie lifetime handling, to the SCTP protocol code. From Daniel Borkmann. 
11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overrunning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in udp_v6_push_pending_frames(). 
From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...	2013-07-09 18:24:39 -07:00
Roland Dreier	0eba551148	Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' into for-next	2013-07-08 11:22:11 -07:00
Roland Dreier	ad32b95f82	IB/mlx5: Make profile[] static in main.c Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 10:32:32 -07:00
Eli Cohen	e126ba97db	mlx5: Add driver for Mellanox Connect-IB adapters The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4, except that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core is essentially a library that provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-07-08 10:32:24 -07:00
Linus Torvalds	74b9272bbe	Device tree updates for v3.11 Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux Pull device tree updates from Grant Likely: "This branch contains the following changes: - Removal of CONFIG_OF_DEVICE, it is always enabled by CONFIG_OF - Remove #ifdef from linux/of_platform.h to increase compiler syntax coverage - Bug fix for address decoding on Bimini and js2x powerpc platforms. - miscellaneous binding changes One note on the above. The binding changes going in from all kinds of different trees has gotten rather out of hand. I picked up some during this cycle, but even going through my tree isn't a great fit. Ian Campbell has prototyped splitting the bindings and .dtb files into a separate repository. The plan is to migrate to using that sometime in the next few kernel releases which should get rid of a lot of the churn on binding docs and .dts files" * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: of: Fix address decoding on Bimini and js2x machines of: remove CONFIG_OF_DEVICE usb: chipidea: depend on CONFIG_OF instead of CONFIG_OF_DEVICE of: remove of_platform_driver ibmebus: convert of_platform_driver to platform_driver driver core: move to_platform_driver to platform_device.h mfd: DT bindings for the palmas family MFD ARM: dts: omap3-devkit8000: fix NAND memory binding of/base: fix typos of: remove #ifdef from linux/of_platform.h	2013-07-04 15:51:45 -07:00
Kees Cook	02aa2a3763	drivers: avoid format string in dev_set_name Calling dev_set_name with a single parameter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents, including wrappers like device_create*() and bdi_register(). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-07-03 16:07:41 -07:00
Mitko Haralanov	22baa407f9	IB/qib: New transmitter tuning settings for Dell 1.1 backplane The Dell blade chassis got an updated backplane which requires new transmitter tuning settings. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-26 09:27:50 -07:00
Wei Yongjun	c94e15c5cb	RDMA/ocrdma: Fix error return code in ocrdma_set_create_qp_rq_cmd() Fix to return -ENOMEM in the alloc dma coherent error case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-24 13:50:45 -07:00
Mike Marciniszyn	1dd173b01f	IB/qib: Add qp_stats debug file This adds a seq_file iterator for reporting the QP hash table when the qp_stats file is read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:52 -07:00
Mike Marciniszyn	17db3a92c1	IB/qib: Add per-context stats interface This patch adds a debugfs stats interface for per kernel contexts packet counts. The code uses the opcode stats count and eliminates the counter in the context. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:51 -07:00
Mike Marciniszyn	ddb8876589	IB/qib: Convert opcode counters to per-context This fix changes the opcode relative counters for receive to per context. Profiling has shown that when multiple contexts are being used there is a lot of cache activity associated with these counters. The code formerly kept these counters per port, but only provided the interface to read per HCA. This patch converts the read of counters to per HCA and adds the debugfs hooks to be able to read the file as a sequence of opcodes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:50 -07:00
Mike Marciniszyn	85caafe307	IB/qib: Optimize CQ callbacks The current workqueue implementation has the following performance deficiencies on QDR HCAs: - The CQ callbacks tend to run on the CPUs processing the receive queues - The single thread queue isn't optimal for multiple HCAs This patch adds a dedicated per HCA bound thread to process CQ callbacks. Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:49 -07:00
Ramkrishna Vepa	c804f07248	IB/qib: Add dual-rail NUMA awareness for PSM processes The driver currently selects a HCA based on the algorithm that PSM chooses, contexts within a HCA or across. The HCA can also be chosen by the user. Either way, this patch assigns a CPU on the NUMA node local to the selected HCA. This patch also tries to select the HCA closest to the NUMA node of the CPU assigned via taskset to PSM process. If this HCA is unusable then another unit is selected based on the algorithm that is currently enforced or selected by PSM - round robin context selection 'within' or 'across' HCA's. Fixed a bug wherein contexts are setup on the NUMA node on which the processes are opened (setup_ctxt()) and not on the NUMA node that the driver recommends the CPU on. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:49 -07:00
Ramkrishna Vepa	e0f30baca1	IB/qib: Add optional NUMA affinity This patch adds context relative numa affinity conditioned on the module parameter numa_aware. The qib_ctxtdata has an additional node_id member and qib_create_ctxtdata() has an additional node_id parameter. The allocations within the hdr queue and eager queue setup routines now take this additional member and adjust allocations as necessary. PSM will pass either the current numa node or the node closest to the HCA depending on numa_aware. Verbs will always use the node closest to the HCA. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:48 -07:00
Vinit Agnihotri	ab4a13d69b	IB/qib: Update minor version number External PSM repositories have advanced the minor number for a variety of reasons. The driver needs to increase to avoid warnings. Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:47 -07:00
Mike Marciniszyn	f7cf9a618b	IB/qib: Remove atomic_inc_not_zero() from QP RCU Follow Documentation/RCU/rcuref.txt guidance in removing atomic_inc_not_zero() from QP RCU implementation. This patch also removes an unneeded synchronize_rcu() in the add path. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:46 -07:00
Mike Marciniszyn	8469ba39a6	IB/qib: Add DCA support This patch adds DCA cache warming for systems that support DCA. The code uses cpu affinity notification to react to an affinity change from a user mode program like irqbalance and (re-)program the chip accordingly. This notification avoids reading the current cpu on every interrupt. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> [ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to build due to undefined struct irq_affinity_notify. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-21 17:19:38 -07:00
Naresh Gottumukkala	9884bcdca3	RDMA/ocrdma: Reorg structures to avoid padding Reorg structures to better packing to avoid cacheline padding. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:15 -07:00
Naresh Gottumukkala	df176ea074	RDMA/ocrdma: Change macros to inline functions Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:14 -07:00
Naresh Gottumukkala	f6ddcf7107	RDMA/ocrdma: Set bad_wr in error case Fix post_send to set the bad_wr in error case. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:14 -07:00
Naresh Gottumukkala	ef99c4c2ed	RDMA/ocrdma: Replace ocrdma_err with pr_err Remove private macro ocrdma_err and replace with standard pr_err. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:14 -07:00
Naresh Gottumukkala	b1d58b9919	RDMA/ocrdma: Use MCC_CREATE_EXT_V1 for MCC create Use MCC_CREATE_EXT_V1 to create MCC_queue to receive RoCE events. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:14 -07:00
Gottumukkala, Naresh	27159f5087	RDMA/ocrdma: Remove use_cnt for queues Remove use_cnt. Rely on IB midlayer to keep track of the use count. Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:13 -07:00
Wei Yongjun	f29fa1cf34	IB/ehca: Fix error return code in ehca_create_slab_caches() Fix to return -ENOMEM in the kmem_cache_create() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:52:04 -07:00
Dan Carpenter	21bfd47062	RDMA/cxgb3: Timeout condition is never true This is a static checker fix. "count" is unsigned so it's never -1. Since "count" is 16 bits and the addition operation is implicitly casted to int then there is no wrapping here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-06-20 04:51:52 -07:00

2706 Commits