OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Logan Gunthorpe	2b0569b3b7	NTB: Add MSI interrupt support to ntb_transport Introduce the module parameter 'use_msi' which, when set, uses MSI interrupts instead of doorbells for each queue pair (QP). The parameter is only available if NTB MSI support is configured into the kernel. We also require there to be more than one memory window (MW) so that an extra one is available to forward the APIC region. To use MSIs, we request one interrupt per QP and forward the MSI address and data to the peer using scratch pad registers (SPADS) above the MW SPADS. (If there are not enough SPADS the MSI interrupt will not be used.) Once registered, we simply use ntb_msi_peer_trigger and the receiving ISR simply queues up the rxc_db_work for the queue. This addition can significantly improve performance of ntb_transport. In a simple, untuned, apples-to-apples comparision using ntb_netdev and iperf with switchtec hardware, I see 3.88Gb/s without MSI interrupts and 14.1Gb/s wit MSI, which is a more than 3x improvement. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2019-06-13 09:03:04 -04:00
Logan Gunthorpe	51cb8dbf13	NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed Dan Carpenter's static checker reported: drivers/ntb/ntb_transport.c:1926 ntb_transport_create_queue() error: we previously assumed 'qp->tx_dma_chan' could be null (see line 1872) This is because the tx_mw_dma_addr is uninitialized in this function and may be incorrectly released using a NULL DMA channel. In practice this bug will not likely be seen. I'd guess you could hit this if you loaded ntb_netdev with use_dma=True, then unloaded it and loaded it again after setting the module parameter to use_dma=False. To fix this, we simply ensure that tx_mw_dma_addr is always initialized to zero. This is the safest in case any other part of the code operates on it if it is non-zero. Fixes: `c59666bb32` ("NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2019-06-13 08:58:12 -04:00
Logan Gunthorpe	c59666bb32	NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA Presently, when ntb_transport is used with DMA and the IOMMU turned on, it fails with errors from the IOMMU such as: DMAR: DRHD: handling fault status reg 202 DMAR: [DMA Write] Request device [00:04.0] fault addr 381fc0340000 [fault reason 05] PTE Write access is not set This is because ntb_transport does not map the BAR space with the IOMMU. To fix this, we map the entire MW region for each QP after we assign the DMA channel. This prevents needing an extra DMA map in the fast path. Link: https://lore.kernel.org/linux-pci/499934e7-3734-1aee-37dd-b42a5d2a2608@intel.com/ Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2019-02-11 09:26:30 -05:00
Joey Zhang	9143595a7e	NTB: ntb_transport: Free MWs in ntb_transport_link_cleanup() If NTB peer host crashes or reboots, the NTB transport link will be down and the MWs of NTB transport will be invalid. But the ntb_transport_link_cleanup() does not free these invalid MWs. When the NTB peer host is recovered later, NTB transport link will be up and the ntb_set_mw() will not reset up MWs. Because the MWs of NTB transport are invalid, the NTB transport will not work. We can fix it by freeing MWs when NTB transport link is down, then the ntb_set_mw() will reset up MWs when NTB transport link is up. Signed-off-by: Joey Zhang <joey.zhang@microchip.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2019-02-11 09:26:05 -05:00
Aaron Sierra	fc5d1829f9	NTB: transport: Try harder to alloc an aligned MW buffer Be a little wasteful if the (likely CMA) message window buffer is not suitably aligned after our first attempt; allocate a buffer twice as big as we need and manually align our MW buffer within it. This was needed on Intel Broadwell DE platforms with intel_iommu=off Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2018-10-31 21:20:05 -04:00
Gustavo A. R. Silva	846429bc99	ntb: ntb_transport: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1373888 ("Missing break in switch") Addresses-Coverity-ID: 1373889 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Allen Hubbe <allenbh@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2018-10-31 21:20:05 -04:00
Linus Torvalds	b08fc5277a	- Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees) -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7 oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2 zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8 tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3 BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3 LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh AtfvEeaPHaOyD8/h2Q== =zUUp -----END PGP SIGNATURE----- Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...	2018-06-12 18:28:00 -07:00
Kees Cook	590b5b7d86	treewide: kzalloc_node() -> kcalloc_node() The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This patch replaces cases of: kzalloc_node(a * b, gfp, node) with: kcalloc_node(a * b, gfp, node) as well as handling cases of: kzalloc_node(a * b * c, gfp, node) with: kzalloc_node(array3_size(a, b, c), gfp, node) as it's slightly less ugly than: kcalloc_node(array_size(a, b), c, gfp, node) This does, however, attempt to ignore constant size factors like: kzalloc_node(4 * 1024, gfp, node) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc_node( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc_node( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc_node( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc_node( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc_node( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc_node + kcalloc_node ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc_node( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc_node( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc_node( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc_node( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc_node( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc_node( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc_node( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc_node(C1 * C2 * C3, ...) \| kzalloc_node( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc_node( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc_node(sizeof(THING) * C2, ...) \| kzalloc_node(sizeof(TYPE) * C2, ...) \| kzalloc_node(C1 * C2 * C3, ...) \| kzalloc_node(C1 * C2, ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc_node + kcalloc_node ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc_node + kcalloc_node ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc_node + kcalloc_node ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Jia-Ju Bai	c9160b6925	ntb: ntb_transport: Replace GFP_ATOMIC with GFP_KERNEL in ntb_transport_create_queue ntb_transport_create_queue() is never called in atomic context. ntb_transport_create_queue() is only called by ntb_netdev_probe(), which is set as ".probe" in struct ntb_transport_client. Despite never getting called from atomic context, ntb_transport_create_queue() calls kzalloc_node() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2018-06-11 15:20:59 -04:00
Jia-Ju Bai	82edcc758f	ntb: ntb_transport: Replace GFP_ATOMIC with GFP_KERNEL in ntb_transport_setup_qp_mw ntb_transport_setup_qp_mw() is never called in atomic context. ntb_transport_setup_qp_mw() is only called by ntb_transport_link_work(), which is set as a parameter of INIT_DELAYED_WORK() in ntb_transport_probe(). Despite never getting called from atomic context, ntb_transport_setup_qp_mw() calls kzalloc_node() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2018-06-11 15:20:59 -04:00
Logan Gunthorpe	cbd27448fa	ntb_transport: Fix bug with max_mw_size parameter When using the max_mw_size parameter of ntb_transport to limit the size of the Memory windows, communication cannot be established and the queues freeze. This is because the mw_size that's reported to the peer is correctly limited but the size used locally is not. So the MW is initialized with a buffer smaller than the window but the TX side is using the full window. This means the TX side will be writing to a region of the window that points nowhere. This is easily fixed by applying the same limit to tx_size in ntb_transport_init_queue(). Fixes: `e26a5843f7` ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2018-01-28 22:17:23 -05:00
Logan Gunthorpe	980c41c86b	NTB: Ensure ntb_mw_get_align() is only called when the link is up With Switchtec hardware it's impossible to get the alignment parameters for a peer's memory window until the peer's driver has configured its windows. Strictly speaking, the link doesn't have to be up for this, but the link being up is the only way the client can tell that the other side has been configured. This patch converts ntb_transport and ntb_perf to use this function after the link goes up. This simplifies these clients slightly because they no longer have to store the alignment parameters. It also tweaks ntb_tool so that peer_mw_trans will print zero if it is run before the link goes up. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-11-18 20:37:11 -05:00
Dave Jiang	f3fd2afed8	ntb: transport shouldn't disable link due to bogus values in SPADs It seems that under certain scenarios the SPAD can have bogus values caused by an agent (i.e. BIOS or other software) that is not the kernel driver, and that causes memory window setup failure. This should not cause the link to be disabled because if we do that, the driver will never recover again. We have verified in testing that this issue happens and prevents proper link recovery. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Fixes: `84f766855f` ("ntb: stop link work when we do not have memory")	2017-08-01 13:31:44 -04:00
Logan Gunthorpe	bc240eec4b	ntb: use correct mw_count function in ntb_tool and ntb_transport After converting to the new API, both ntb_tool and ntb_transport are using ntb_mw_count to iterate through ntb_peer_get_addr when they should be using ntb_peer_mw_count. This probably isn't an issue with the Intel and AMD drivers but this will matter for any future driver with asymetric memory window counts. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Fixes: `443b9a14ec` ("NTB: Alter MW API to support multi-ports devices")	2017-07-17 12:56:15 -04:00
Serge Semin	d67288a395	NTB: Alter Scratchpads API to support multi-ports devices Even though there is no any real NTB hardware, which would have both more than two ports and Scratchpad registers, it is logically correct to have Scratchpad API accepting a peer port index as well. Intel/AMD drivers utilize Primary and Secondary topology to split Scratchpad between connected root devices. Since port-index API introduced, Intel/AMD NTB hardware drivers can use device port to determine which Scratchpad registers actually belong to local and peer devices. The same approach can be used if some potential hardware in future will be multi-port and have some set of Scratchpads. Here are the brief of changes in the API: ntb_spad_count() - return number of Scratchpads per each port ntb_peer_spad_addr(pidx, sidx) - address of Scratchpad register of the peer device with pidx-index ntb_peer_spad_read(pidx, sidx) - read specified Scratchpad register of the peer with pidx-index ntb_peer_spad_write(pidx, sidx) - write data to Scratchpad register of the peer with pidx-index Since there is hardware which doesn't support Scratchpad registers, the corresponding API methods are now made optional. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-07-06 11:30:07 -04:00
Serge Semin	443b9a14ec	NTB: Alter MW API to support multi-ports devices Multi-port NTB devices permit to share a memory between all accessible peers. Memory Windows API is altered to correspondingly initialize and map memory windows for such devices: ntb_mw_count(pidx); - number of inbound memory windows, which can be allocated for shared buffer with specified peer device. ntb_mw_get_align(pidx, widx); - get alignment and size restriction parameters to properly allocate inbound memory region. ntb_peer_mw_count(); - get number of outbound memory windows. ntb_peer_mw_get_addr(widx); - get mapping address of an outbound memory window If hardware supports inbound translation configured on the local ntb port: ntb_mw_set_trans(pidx, widx); - set translation address of allocated inbound memory window so a peer device could access it. ntb_mw_clear_trans(pidx, widx); - clear the translation address of an inbound memory window. If hardware supports outbound translation configured on the peer ntb port: ntb_peer_mw_set_trans(pidx, widx); - set translation address of a memory window retrieved from a peer device ntb_peer_mw_clear_trans(pidx, widx); - clear the translation address of an outbound memory window Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-07-06 11:30:07 -04:00
Serge Semin	1e5301196a	NTB: Add indexed ports NTB API There is some NTB hardware, which can combine more than just two domains over NTB. For instance, some IDT PCIe-switches can have NTB-functions activated on more than two-ports. The different domains are distinguished by ports they are connected to. So the new port-related methods are added to the NTB API: ntb_port_number() - return local port ntb_peer_port_count() - return number of peers local port can connect to ntb_peer_port_number(pdix) - return port number by it index ntb_peer_port_idx(port) - return port index by it number Current test-drivers aren't changed much. They still support two-ports devices for the time being while multi-ports hardware drivers aren't added. By default port-related API is declared for two-ports hardware. So corresponding hardware drivers won't need to implement it. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-07-06 11:30:07 -04:00
Allen Hubbe	88931ec3dc	ntb: no sleep in ntb_async_tx_submit Do not sleep in ntb_async_tx_submit, which could deadlock. This reverts commit "8c874cc140d667f84ae4642bb5b5e0d6396d2ca4" Fixes: `8c874cc140` ("NTB: Address out of DMA descriptor issue with NTB") Reported-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-06-19 14:24:41 -04:00
Logan Gunthorpe	8e8496e0e9	ntb_transport: fix bug calculating num_qps_mw A divide by zero error occurs if qp_count is less than mw_count because num_qps_mw is calculated to be zero. The calculation appears to be incorrect. The requirement is for num_qps_mw to be set to qp_count / mw_count with any remainder divided among the earlier mws. For example, if mw_count is 5 and qp_count is 12 then mws 0 and 1 will have 3 qps per window and mws 2 through 4 will have 2 qps per window. Thus, when mw_num < qp_count % mw_count, num_qps_mw is 1 higher than when mw_num >= qp_count. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Fixes: `e26a5843f7` ("NTB: Split ntb_hw_intel and ntb_transport drivers") Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-06-19 14:24:41 -04:00
Logan Gunthorpe	cb827ee6cc	ntb_transport: fix qp count bug In cases where there are more mw's than spads/2-2, the mw count gets reduced to match the limitation. ntb_transport also tries to ensure that there are fewer qps than mws but uses the full mw count instead of the reduced one. When this happens, the math in 'ntb_transport_setup_qp_mw' will get confused and result in a kernel paging request bug. This patch fixes the bug by reducing qp_count to the reduced mw count instead of the full mw count. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Fixes: `e26a5843f7` ("NTB: Split ntb_hw_intel and ntb_transport drivers") Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-06-19 14:24:41 -04:00
Thomas VanSelus	8fcd0950c0	ntb_transport: Pick an unused queue Fix typo causing ntb_transport_create_queue to select the first queue every time, instead of using the next free queue. Signed-off-by: Thomas VanSelus <tvanselus@xes-inc.com> Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Fixes: `fce8a7bb5` ("PCI-Express Non-Transparent Bridge Support") Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-02-16 23:11:26 -05:00
Allen Hubbe	dd62245e73	NTB: ntb_transport: fix debugfs_remove_recursive The call to debugfs_remove_recursive(qp->debugfs_dir) of the sub-level directory must not be later than debugfs_remove_recursive(nt_debugfs_dir) of the top-level directory. Otherwise, the sub-level directory will not exist, and it would be invalid (panic) to attempt to remove it. This removes the top-level directory last, after sub-level directories have been cleaned up. Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com> Fixes: `e26a5843f` ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jon Mason <jdmason@kudzu.us>	2017-02-16 23:10:52 -05:00
Steve Wahl	dfb7d24c5a	ntb_transport: Remove unnecessary call to ntb_peer_spad_read The results were previously ignored, anyway. Signed-off-by: Steve Wahl <Steve.Wahl@dell.com> Fixes: `e26a5843f7` Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-12-23 16:11:07 -05:00
Shyam Sundar S K	b17faba03f	ntb_transport: Limit memory windows based on available, scratchpads When the underlying NTB H/W driver advertises more memory windows than the number of scratchpads available to setup MW's, it is likely that we may end up filling the remaining memory windows with garbage. So to avoid that, lets limit the memory windows that transport driver can setup based on the available scratchpads. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-12-23 16:10:35 -05:00
Nicholas Mc Guire	c0a88032ef	ntb_transport: make DMA_OUT_RESOURCE_TO HZ independent schedule_timeout_* takes a timeout in jiffies but the code currently is passing in a constant which makes this timeout HZ dependent, so pass it through msecs_to_jiffies() to fix this up. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-11-13 16:48:29 -05:00
Dave Jiang	72203572af	ntb: add DMA error handling for RX DMA Adding support on the rx DMA path to allow recovery of errors when DMA responds with error status and abort all the subsequent ops. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: linux-ntb@googlegroups.com Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2016-08-08 08:11:42 +05:30
Dave Jiang	9cabc2691e	ntb: add DMA error handling for TX DMA Adding support on the tx DMA path to allow recovery of errors when DMA responds with error status and abort all the subsequent ops. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: linux-ntb@googlegroups.com Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2016-08-08 08:11:42 +05:30
Logan Gunthorpe	19645a0771	ntb_transport: Check the number of spads the hardware supports I'm working on hardware that currently has a limited number of scratchpad registers and ntb_ndev fails with no clue as to why. I feel it is better to fail early and provide a reasonable error message then to fail later on. The same is done to ntb_perf, but it doesn't currently require enough spads to actually fail. I've also removed the unused SPAD_MSG and SPAD_ACK enums so that MAX_SPAD accurately reflects the number of spads used. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-08-05 10:21:06 -04:00
Dave Jiang	a754a8fcaf	NTB: allocate number transport entries depending on size of ring size Currently we only allocate a fixed default number of descriptors for the tx and rx side. We should dynamically resize it to the number of descriptors resides in the transport rings. We should know the number of transmit descriptors at initializaiton. We will allocate the default number of descriptors for receive side and allocate additional ones when we know the actual max entries for receive. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <allen.hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-08-05 10:21:05 -04:00
Dave Jiang	84f766855f	ntb: stop link work when we do not have memory Instead of keep trying to go through the init routine when we aren't able to allocate memory, we should just stop and go down. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-03-17 20:38:40 -04:00
Dave Jiang	e902133162	ntb: stop tasklet from spinning forever during shutdown. We can leave tasklet spinning forever if we disable the tasklet during qp shutdown and the tasklets are still being kicked off. This hopefully should avoid that race condition. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Alex Depoutovitch <alex@pernixdata.com> Tested-by: Alex Depoutovitch <alex@pernixdata.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-03-17 20:38:40 -04:00
Dave Jiang	8c874cc140	NTB: Address out of DMA descriptor issue with NTB The transport right now does not handle the case where we run out of DMA descriptors. We just fail when we do not succeed. Adding code to retry for a bit attempting to use the DMA engine instead of instantly fail to CPU copy. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-01-11 10:01:27 -05:00
Jon Mason	179f912a39	NTB: ntb_process_tx error path bug The transmit overrun avoidance error path in ntb_process_tx accidentally swapped the first two values being passed to the tx_handler client. This could result in crashes in the ntb_netdev (or other out-of-tree NTB clients). Reported-by: Alex Depoutovitch <alex@pernixdata.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2016-01-11 09:51:17 -05:00
Arnd Bergmann	fdcb4b2e78	NTB: fix 32-bit compiler warning resource_size_t may be 32-bit wide on some architectures, which causes this warning when building the NTB code: drivers/ntb/ntb_transport.c: In function 'ntb_transport_link_work': drivers/ntb/ntb_transport.c:828:46: warning: right shift count >= width of type [-Wshift-count-overflow] The warning is harmless but can be avoided by using the upper_32_bits() macro. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `e26a5843f7` ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-11-08 16:24:43 -05:00
Jon Mason	c92ba3c5d9	NTB: invalid buf pointer in multi-MW setups Order of operations issue with the QP Num and MW count, which would result in the receive buffer pointer being invalid if there are more than 1 MW. Corrected with parenthesis to enforce the proper order of operations. Reported-by: John I. Kading <John.Kading@gd-ms.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-11-08 16:11:21 -05:00
Sudip Mukherjee	70d4687d60	NTB: remove unused variable These variables were not used anywhere. So remove them. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-11-08 16:11:21 -05:00
Sudip Mukherjee	d4adee09fd	NTB: fix access of free-ed pointer We were accessing nt->mw_vec after freeing it. Fix the error path so that we free nt->mw_vec after we have finished using it. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-11-08 16:11:21 -05:00
Dave Jiang	04afde45e0	NTB: Fix issue where we may be accessing NULL ptr smatch detected an issue in the function ntb_transport_max_size() where we could be dereferencing a dma channel pointer when it is NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-11-08 16:11:21 -05:00
Dave Jiang	569410ca75	NTB: Use unique DMA channels for TX and RX Allocate two DMA channels, one for TX operation and one for RX operation, instead of having one DMA channel for everything. This provides slightly better performance, and also will make error handling cleaner later on. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-09-07 15:17:09 -04:00
Allen Hubbe	905921e748	NTB: Remove dma_sync_wait from ntb_async_rx The dma_sync_wait can hurt the performance of workloads mixed with both large and small frames. Large frames will be copied using the dma engine. Small frames will be copied by the cpu. The dma_sync_wait prevents the cpu and dma engine copying in parallel. In the period where the cpu is copying, the dma engine is stopped. The dma engine is not doing any useful work to copy large frames during that time, and the additional time to restart the dma engine for the next large frame. This will decrease the throughput for the portion of a workload with large frames. In the period where the dma engine is copying, the cpu is held up waiting for dma to complete. The small frames processing will be delayed until the dma is complete. The RX frames are completed in-order, and the processing of small frames takes very little time, so dma_sync_wait may have an insignificant impact on the respose time of frames. The more significant impact is to the system, because the delay in dma_sync_wait is implemented as busy non-blocking wait. This can prevent the delayed core from doing any useful work, even if it could be processing work for other drivers, unrelated to transport RX processing. After applying the earlier patch to fix out-of-order RX acknoledgement, the dma_sync_wait is no longer necessary. Remove it, so that cpu memcpy will proceed immediately for small frames, in parallel with ongoing dma for large frames. Do not hold up the cpu from doing work while dma is in progress. The prior fix will continue to ensure in-order completion of the RX frames to the upper layer, and in-order delivery of the RX acknoledgement. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-09-07 15:17:08 -04:00
Dave Jiang	d98ef99e37	NTB: Clean up QP stats info Make QP stats info more readable for debugging purposes. Also add an entry to indicate whether DMA is being used. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-09-07 15:17:08 -04:00
Dave Jiang	315100004f	NTB: Make the transport list in order of discovery The list should be added from the bottom and not the top in order to ensure the transport is provided in the same order to clients as ntb devices are discovered. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-09-07 15:17:08 -04:00
Dave Jiang	e74bfeedad	NTB: Add flow control to the ntb_netdev Right now if we push the NTB really hard, we start dropping packets due to not able to process the packets fast enough. We need to st:qop the upper layer from flooding us when that happens. A timer is necessary in order to restart the queue once the resource has been processed on the receive side. Due to the way NTB is setup, the resources on the tx side are tied to the processing of the rx side and there's no async way to know when the rx side has released those resources. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-09-07 15:17:08 -04:00
Allen Hubbe	30a4bb1e5a	NTB: Fix dereference before check Remove early dereference of a pointer that is checked later in the code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:22 -04:00
Allen Hubbe	8c9edf63e7	NTB: Fix zero size or integer overflow in ntb_set_mw A plain 32 bit integer will overflow for values over 4GiB. Change the plain integer size to the appropriate size type in ntb_set_mw. Change the type of the size parameter and two local variables used for size. Even if there is no overflow, a size of zero is invalid here. Reported-by: Juyoung Jung <jjung@micron.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:22 -04:00
Allen Hubbe	8b5a22d8f1	NTB: Schedule to receive on QP link up Schedule to receive on QP link up, to make sure that the doorbell is properly cleared for interrupts. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:22 -04:00
Dave Jiang	260bee9451	NTB: Fix oops in debugfs when transport is half-up When the remote side is not up, we do not have all the context for the transport, and that causes NULL ptr access. Have the debugfs reads check to see if transport is up before we make access. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:22 -04:00
Dave Jiang	c8650fd03d	NTB: Fix transport stats for multiple devices Currently the debugfs does not have files for all NTB transport queue pairs. When there are multiple NTBs present in a system, the QP names of the last transport clobber the names of previously added transport QPs. Only the last added QPs can be observed via debugfs. Create a directory per NTB transport to associate the QPs with that transport. Name the directory the same as the PCI device. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:21 -04:00
Allen Hubbe	da2e5ae561	NTB: Fix ntb_transport out-of-order RX update It was possible for a synchronous update of the RX index in the error case to get ahead of the asynchronous RX index update in the normal case. Change the RX processing to preserve an RX completion order. There were two error cases. First, if a buffer is not present to receive data, there would be no queue entry to preserve the RX completion order. Instead of dropping the RX frame, leave the RX frame in the ring. Schedule RX processing when RX entries are enqueued, in case there are RX frames waiting in the ring to be received. Second, if a buffer is too small to receive data, drop the frame in the ring, mark the RX entry as done, and indicate the error in the RX entry length. Check for a negative length in the receive callback in ntb_netdev, and count occurrences as rx_length_errors. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-08-09 16:32:21 -04:00
Dave Jiang	7eb387813d	NTB: Print driver name and version in module init Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>	2015-07-04 14:09:28 -04:00

1 2

98 Commits