Fix an uninitialised variable introduced in cifs.
Fixes: 3d78fe73fa ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
In the depths of the cifs RDMA code, extract part of an iov iterator
directly into an SGE list without going through an intermediate
scatterlist.
Note that this doesn't support extraction from an IOBUF- or UBUF-type
iterator (i.e. a user-supplied buffer). The assumption is that the higher
layers will extract those to a BVEC-type iterator first and do whatever is
required to stop the pages from going away.
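A rough userspace model of the extraction step: walk a virtually contiguous byte range and emit one SGE per page-bounded fragment, failing if the SGE array would overflow. This is a sketch under stated assumptions, not the kernel code. The real code walks BVEC/KVEC/XARRAY iterator segments and DMA-maps each page. The names `struct model_sge`, `MODEL_PAGE_SIZE` and `extract_to_sge()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* Hypothetical stand-in for an RDMA scatter/gather element. */
struct model_sge {
	uintptr_t addr;
	uint32_t length;
};

/*
 * Split [addr, addr + len) into page-bounded fragments, one SGE per
 * fragment.  Returns the number of SGEs used, or -1 if more than
 * max_sge entries would be needed.
 */
static int extract_to_sge(uintptr_t addr, size_t len,
			  struct model_sge *sge, int max_sge)
{
	int n = 0;

	while (len > 0) {
		size_t off = addr % MODEL_PAGE_SIZE;
		size_t seg = MODEL_PAGE_SIZE - off;	/* bytes left in this page */

		if (seg > len)
			seg = len;
		if (n == max_sge)
			return -1;			/* SGE list too small */
		sge[n].addr = addr;
		sge[n].length = (uint32_t)seg;
		n++;
		addr += seg;
		len -= seg;
	}
	return n;
}
```

For example, 8192 bytes starting 100 bytes into a page needs three SGEs (3996 + 4096 + 100 bytes), so a limit of two entries makes the extraction fail.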
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/166697260361.61150.5064013393408112197.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732032518.3186319.1859601819981624629.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, the cifs I/O paths hand lists of pages from the VM interface
routines at the top all the way through the intervening layers to the
socket interface at the bottom.
This is a problem, however, for interfacing with netfslib which passes an
iterator through to the ->issue_read() method (and will pass an iterator
through to the ->issue_write() method in future). Netfslib takes over
bounce buffering for direct I/O, async I/O and encrypted content, so cifs
doesn't need to do that. Netfslib also converts IOVEC-type iterators into
BVEC-type iterators if necessary.
Further, cifs needs foliating - and folios may come in a variety of sizes,
so a page list pointing to an array of heterogeneous pages may cause
problems in places such as where crypto is done.
Change the cifs I/O paths to hand iov_iter iterators all the way through
instead.
Notes:
(1) Some old routines are #if'd out to be removed in a follow up patch so
as to avoid confusing diff, thereby making the diff output easier to
follow. I've removed functions that don't overlap with anything
added.
(2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and
rq_tailsz which describe the pages forming the buffer; instead there's
an rq_iter describing the source buffer and an rq_buffer which is used
to hold the buffer for encryption.
(3) struct cifs_readdata and cifs_writedata are similarly modified to
smb_rqst. The ->read_into_pages() and ->copy_into_pages() are then
replaced with passing the iterator directly to the socket.
The iterators are stored in these structs so that they are persistent
and don't get deallocated when the function returns (unlike if they
were stack variables).
(4) Buffered writeback is overhauled, borrowing the code from the afs
filesystem to gather up contiguous runs of folios. The XARRAY-type
iterator is then used to refer directly to the pagecache and can be
passed to the socket to transmit data directly from there.
This includes:
cifs_extend_writeback()
cifs_write_back_from_locked_folio()
cifs_writepages_region()
cifs_writepages()
(5) Pages are converted to folios.
(6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type
iterator from an IOBUF/UBUF-type source iterator.
(7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page
fragments from the iterator into the scatterlists that the crypto
layer prefers.
(8) smb2_init_transform_rq() attaches pages to smb_rqst::rq_buffer, an
xarray, to use as a bounce buffer for encryption. An XARRAY-type
iterator can then be used to pass the bounce buffer to lower layers.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
If MR allocation fails, the MR recovery work is not initialized and the
MR list is not cleared. This leads to a warning and a use-after-free
(UAF) when the MRs are released:
WARNING: CPU: 4 PID: 824 at kernel/workqueue.c:3066 __flush_work.isra.0+0xf7/0x110
CPU: 4 PID: 824 Comm: mount.cifs Not tainted 6.1.0-rc5+ #82
RIP: 0010:__flush_work.isra.0+0xf7/0x110
Call Trace:
<TASK>
__cancel_work_timer+0x2ba/0x2e0
smbd_destroy+0x4e1/0x990
_smbd_get_connection+0x1cbd/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
BUG: KASAN: use-after-free in smbd_destroy+0x4fc/0x990
Read of size 8 at addr ffff88810b156a08 by task mount.cifs/824
CPU: 4 PID: 824 Comm: mount.cifs Tainted: G W 6.1.0-rc5+ #82
Call Trace:
dump_stack_lvl+0x34/0x44
print_report+0x171/0x472
kasan_report+0xad/0x130
smbd_destroy+0x4fc/0x990
_smbd_get_connection+0x1cbd/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Allocated by task 824:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7a/0x90
_smbd_get_connection+0x1b6f/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 824:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x40
____kasan_slab_free+0x143/0x1b0
__kmem_cache_free+0xc8/0x330
_smbd_get_connection+0x1c6a/0x2110
smbd_get_connection+0x21/0x40
cifs_get_tcp_session+0x8ef/0xda0
mount_get_conns+0x60/0x750
cifs_mount+0x103/0xd00
cifs_smb3_do_mount+0x1dd/0xcb0
smb3_get_tree+0x1d5/0x300
vfs_get_tree+0x41/0xf0
path_mount+0x9b3/0xdd0
__x64_sys_mount+0x190/0x1d0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Initialize the MR recovery work before allocating the MRs to prevent
the warning, and remove the MRs from the list to prevent the UAF.
Fixes: c739858334 ("CIFS: SMBD: Implement RDMA memory registration")
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If MR allocation fails, the server's SMB Direct connection info pointer
is still NULL, so smbd_destroy() returns immediately and the connection
info is leaked.
Set the SMB Direct connection info on the server before calling
smbd_destroy().
Fixes: c739858334 ("CIFS: SMBD: Implement RDMA memory registration")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In smbd_destroy(), clear the server->smbd_conn pointer after freeing the
smbd_connection struct that it points to so that reconnection doesn't get
confused.
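A minimal userspace sketch of the pattern: free the connection, then clear the owning pointer so that a later reconnect (or a second destroy) sees NULL rather than a dangling pointer. The struct and function names are hypothetical stand-ins for struct TCP_Server_Info, struct smbd_connection and smbd_destroy().

```c
#include <stdlib.h>

struct model_smbd_conn { int dummy; };
struct model_tcp_server { struct model_smbd_conn *smbd_conn; };

static void model_smbd_destroy(struct model_tcp_server *server)
{
	struct model_smbd_conn *info = server->smbd_conn;

	if (!info)
		return;			/* nothing to tear down */
	/* ... drain queues, release MRs, etc. ... */
	free(info);
	/* Without this, reconnect would find a stale, non-NULL pointer. */
	server->smbd_conn = NULL;
}
```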
Fixes: 8ef130f9ec ("CIFS: SMBD: Implement function to destroy a SMB Direct connection")
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Long Li <longli@microsoft.com>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make the debug logging more consistent in formatting of addresses,
lengths, and bitfields.
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If/when an outgoing request contains more scatter/gather segments
than can be mapped in a single RDMA send work request, use smbdirect
fragments to send it in multiple packets.
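The packet count for such a fragmented send is a ceiling division. This sketch assumes that one SGE in each send work request is taken by the SMBDirect packet header and the rest carry payload. That split, and the name `packets_needed()`, are assumptions for illustration.

```c
/*
 * Number of SMBDirect packets needed to carry nr_data_sges payload
 * SGEs when each send work request may carry at most max_send_sge
 * SGEs, one of which is assumed to hold the packet header.
 */
static int packets_needed(int nr_data_sges, int max_send_sge)
{
	int per_packet = max_send_sge - 1;

	if (per_packet <= 0)
		return -1;		/* cannot carry any payload */
	if (nr_data_sges <= 0)
		return 1;		/* header-only packet */
	return (nr_data_sges + per_packet - 1) / per_packet;
}
```

With a limit of 6 send SGEs, 5 payload SGEs fit in one packet, while 6 or more spill into a second packet.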
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reduce client smbdirect max segment receive size to 1364 to match
protocol norms. Larger buffers are unnecessary and add significant
memory overhead.
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The client-side SMBDirect layer requires no more than 6 send SGEs
and 1 receive SGE. The previous default of 8 send and 8 receive
causes smbdirect to fail on the SoftiWARP (siw) provider, and
possibly others. Additionally, large numbers of SGEs reduce
performance significantly on adapter implementations.
Also correct the frmr page count comment (not an SGE count).
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The srv_mutex is used during writeback so cifs should ensure that
allocations done when that mutex is held are done with GFP_NOFS, to
avoid having direct reclaim ending up waiting for the same mutex and
causing a deadlock. This is detected by lockdep with the splat below:
======================================================
WARNING: possible circular locking dependency detected
5.18.0 #70 Not tainted
------------------------------------------------------
kswapd0/49 is trying to acquire lock:
ffff8880195782e0 (&tcp_ses->srv_mutex){+.+.}-{3:3}, at: compound_send_recv
but task is already holding lock:
ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire
kmem_cache_alloc_trace
__request_module
crypto_alg_mod_lookup
crypto_alloc_tfm_node
crypto_alloc_shash
cifs_alloc_hash
smb311_crypto_shash_allocate
smb311_update_preauth_hash
compound_send_recv
cifs_send_recv
SMB2_negotiate
smb2_negotiate
cifs_negotiate_protocol
cifs_get_smb_ses
cifs_mount
cifs_smb3_do_mount
smb3_get_tree
vfs_get_tree
path_mount
__x64_sys_mount
do_syscall_64
entry_SYSCALL_64_after_hwframe
-> #0 (&tcp_ses->srv_mutex){+.+.}-{3:3}:
__lock_acquire
lock_acquire
__mutex_lock
mutex_lock_nested
compound_send_recv
cifs_send_recv
SMB2_write
smb2_sync_write
cifs_write
cifs_writepage_locked
cifs_writepage
shrink_page_list
shrink_lruvec
shrink_node
balance_pgdat
kswapd
kthread
ret_from_fork
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&tcp_ses->srv_mutex);
                               lock(fs_reclaim);
  lock(&tcp_ses->srv_mutex);
*** DEADLOCK ***
1 lock held by kswapd0/49:
#0: ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat
stack backtrace:
CPU: 2 PID: 49 Comm: kswapd0 Not tainted 5.18.0 #70
Call Trace:
<TASK>
dump_stack_lvl
dump_stack
print_circular_bug.cold
check_noncircular
__lock_acquire
lock_acquire
__mutex_lock
mutex_lock_nested
compound_send_recv
cifs_send_recv
SMB2_write
smb2_sync_write
cifs_write
cifs_writepage_locked
cifs_writepage
shrink_page_list
shrink_lruvec
shrink_node
balance_pgdat
kswapd
kthread
ret_from_fork
</TASK>
Fix this by using the memalloc_nofs_save/restore APIs around the places
where the srv_mutex is held. Do this in a wrapper function for the
lock/unlock of the srv_mutex, and rename the srv_mutex to avoid missing
call sites in the conversion.
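A userspace simulation of the lock wrapper described above. In the kernel, memalloc_nofs_save() sets PF_MEMALLOC_NOFS in current->flags and returns the previous state, and memalloc_nofs_restore() puts that state back. Here a thread-local variable plays the part of current->flags and a pthread mutex stands in for the srv_mutex. Every `model_` name is hypothetical and does not reflect the actual cifs helpers.

```c
#include <pthread.h>

#define MODEL_PF_MEMALLOC_NOFS 0x1u
static _Thread_local unsigned int model_task_flags; /* stands in for current->flags */

/* Imitates memalloc_nofs_save(): set the flag, return its old state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

	model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Imitates memalloc_nofs_restore(). */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

struct model_server {
	pthread_mutex_t _srv_mutex;	/* renamed so direct users fail to build */
	unsigned int nofs_flag;		/* saved state, valid while locked */
};

static void model_server_lock(struct model_server *server)
{
	unsigned int nofs_flag = model_nofs_save();

	pthread_mutex_lock(&server->_srv_mutex);
	server->nofs_flag = nofs_flag;
}

static void model_server_unlock(struct model_server *server)
{
	unsigned int nofs_flag = server->nofs_flag;

	pthread_mutex_unlock(&server->_srv_mutex);
	model_nofs_restore(nofs_flag);
}

/* An allocator would consult this to decide whether FS reclaim is allowed. */
static int model_may_enter_fs(void)
{
	return !(model_task_flags & MODEL_PF_MEMALLOC_NOFS);
}
```

The saved state is stored in the server struct only after the mutex is taken, so it belongs to the lock holder. It is read back before the unlock for the same reason.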
Note that there is another lockdep warning involving internal crypto
locks, which was masked by this problem and is visible after this fix,
see the discussion in this thread:
https://lore.kernel.org/all/20220523123755.GA13668@axis.com/
Link: https://lore.kernel.org/r/CANT5p=rqcYfYMVHirqvdnnca4Mo+JQSw5Qu12v=kPfpk5yhhmg@mail.gmail.com/
Reported-by: Shyam Prasad N <nspmangalore@gmail.com>
Suggested-by: Lars Persson <larper@axis.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag '5.19-rc-smb3-client-fixes-updated' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client updates from Steve French:
- multichannel fixes to improve reconnect after network failure
- improved caching of root directory contents (extending benefit of
directory leases)
- two DFS fixes
- three fixes for improved debugging
- an NTLMSSP fix for mounts to older servers
- new mount parm to allow disabling creating sparse files
- various cleanup fixes and minor fixes pointed out by coverity
* tag '5.19-rc-smb3-client-fixes-updated' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
smb3: remove unneeded null check in cifs_readdir
cifs: fix ntlmssp on old servers
cifs: cache the dirents for entries in a cached directory
cifs: avoid parallel session setups on same channel
cifs: use new enum for ses_status
cifs: do not use tcpStatus after negotiate completes
smb3: add mount parm nosparse
smb3: don't set rc when used and unneeded in query_info_compound
smb3: check for null tcon
cifs: fix minor compile warning
Add various fsctl structs
Add defines for various newer FSCTLs
smb3: add trace point for oplock not found
cifs: return the more nuanced writeback error on close()
smb3: add trace point for lease not found issue
cifs: smbd: fix typo in comment
cifs: set the CREATE_NOT_FILE when opening the directory in use_cached_dir()
cifs: check for smb1 in open_cached_dir()
cifs: move definition of cifs_fattr earlier in cifsglob.h
cifs: print TIDs as hex
...
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing what each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There were two places where we weren't checking for error
(e.g. ERESTARTSYS) while waiting for rdma resolution.
Addresses-Coverity: 1462165 ("Unchecked return value")
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break/goto statements instead of
just letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Use pr_fmt to standardize all logging for fs/cifs.
Some logging output had no CIFS: specific prefix.
Now all output has one of three prefixes:
o CIFS:
o CIFS: VFS:
o Root-CIFS:
Miscellanea:
o Convert printks to pr_<level>
o Neaten macro definitions
o Remove embedded CIFS: prefixes from formats
o Convert "illegal" to "invalid"
o Coalesce formats
o Add missing '\n' format terminations
o Consolidate multiple cifs_dbg continuations into single calls
o More consistent use of upper case first word output logging
o Multiline statement argument alignment and wrapping
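In the kernel, a file that defines pr_fmt() before including the printk headers gets that prefix pasted onto every pr_<level>() format string at compile time. The sketch below shows the same string-pasting trick in userspace. It uses the "CIFS: " prefix from the list above. `model_pr_info()` is a hypothetical stand-in that formats into a buffer instead of the kernel log, and it relies on the GNU `##__VA_ARGS__` extension.

```c
#include <stdio.h>

/* Prefix prepended to every format string in this file. */
#define pr_fmt(fmt) "CIFS: " fmt

/* Hypothetical stand-in for pr_info(): formats into buf. */
#define model_pr_info(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is joined by adjacent string-literal concatenation, call sites no longer need to embed "CIFS:" themselves.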
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Immediate packets should only be sent to the peer when there are new
receive credits made available. New credits show up on freeing a
receive buffer, not on receiving data.
Fix this by avoiding unnecessary work scheduling.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When processing errors from ib_post_send(), the transport state needs to be
rolled back to the condition before the error.
Refactor the old code to make it easy to roll back on IB errors, and fix this.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Receive credits should be updated before sending the packet, not
before the work is scheduled. Also, the value needs to be rolled back
if something fails and the packet cannot be sent.
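A minimal sketch of that ordering, assuming C11 atomics in place of the kernel's atomic_t. The pending credits are claimed only when the packet is built, and they are returned if the post fails, so a later packet can still grant them. `simulated_post_rc` stands in for the result of the real post. All names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical model of one connection's credit bookkeeping. */
struct model_conn {
	atomic_int new_credits_offered;	/* receive credits not yet granted to the peer */
};

/*
 * Send one packet.  Returns simulated_post_rc (0 or a negative errno);
 * *granted is the number of credits the packet actually advertised.
 */
static int model_send_packet(struct model_conn *c, int simulated_post_rc,
			     int *granted)
{
	/* Claim the pending credits at packet-build time, not earlier. */
	int n = atomic_exchange(&c->new_credits_offered, 0);

	*granted = n;
	if (simulated_post_rc != 0) {
		/* Roll back so a later packet can still grant them. */
		atomic_fetch_add(&c->new_credits_offered, n);
		*granted = 0;
	}
	return simulated_post_rc;
}
```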
Signed-off-by: Long Li <longli@microsoft.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Sometimes the remote peer may return more send credits than the send queue
depth. If all the send credits are used to post sends, we may overflow the
send queue.
Fix this by checking the send queue size before posting a send.
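A sketch of the check, assuming C11 atomics and hypothetical names. A sender needs both a peer-granted credit and a free local send queue slot. If either is missing, the reservation is undone and the caller waits. The kernel code waits on a waitqueue instead of returning false.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two limits a sender must respect. */
struct model_sender {
	atomic_int send_credits;	/* credits granted by the peer */
	atomic_int send_pending;	/* posted but not yet completed */
	int sq_depth;			/* local send queue size */
};

/* Reserve one credit and one send-queue slot; undo on failure. */
static bool model_reserve_send(struct model_sender *s)
{
	if (atomic_fetch_sub(&s->send_credits, 1) <= 0) {
		atomic_fetch_add(&s->send_credits, 1);
		return false;		/* no credit: wait for the peer */
	}
	if (atomic_fetch_add(&s->send_pending, 1) >= s->sq_depth) {
		/* Peer granted more credits than the queue can hold. */
		atomic_fetch_sub(&s->send_pending, 1);
		atomic_fetch_add(&s->send_credits, 1);
		return false;		/* queue full: wait for a completion */
	}
	return true;
}

/* Called from the send completion handler. */
static void model_send_done(struct model_sender *s)
{
	atomic_fetch_sub(&s->send_pending, 1);
}
```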
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
As an optimization, SMBD tries to track two types of packets: packets with
payload and without payload. There is no obvious benefit or performance gain
to separately track two types of packets.
Just treat them as pending packets and merge the tracking code.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When an RDMA packet is received and the server is extending send credits, we
should check for and unblock senders immediately in IRQ context. Doing it in a
workqueue causes unnecessary delay and doesn't save much CPU on the receive path.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The packet size needs to take account of SMB2 header size and possible
encryption header size. This is only done when signing is used and it is for
RDMA send/receive, not read/write.
Also remove the dead SMBD code in smb2_negotiate_r(w)size.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
It's not necessary to queue an invalidated memory registration to a workqueue,
as all we need to do is unmap the SG and make it usable again. This can save
CPU cycles in normal data paths, as memory registration errors are rare and
normally only happen during reconnection.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The transport should return this error so the upper layer will reconnect.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Log these activities to help production support.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
While it's not friendly to fail user processes that issue more iovs
than we support, at least we should return the correct error code so the
user process gets a chance to retry with a smaller number of iovs.
Signed-off-by: Long Li <longli@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Send and Receive completion is handled on a single CPU selected at
the time each Completion Queue is allocated. Typically this is when
an initiator instantiates an RDMA transport, or when a target
accepts an RDMA connection.
Some ULPs cannot open a connection per CPU to spread completion
workload across available CPUs and MSI vectors. For such ULPs,
provide an API that allows the RDMA core to select a completion
vector based on the device's complement of available comp_vecs.
ULPs that invoke ib_alloc_cq() with only comp_vector 0 are converted
to use the new API so that their completion workloads interfere less
with each other.
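A sketch of round-robin completion-vector selection in the spirit described above. Successive CQs are spread across the device's vectors, capped at the number of online CPUs. This is a userspace model with hypothetical names and an explicit counter, not the RDMA core's implementation.

```c
#include <stdatomic.h>

/* Pick a completion vector for a new CQ, round-robin. */
static int model_pick_comp_vector(atomic_uint *counter,
				  int num_comp_vectors, int num_online_cpus)
{
	int n = num_comp_vectors < num_online_cpus ?
		num_comp_vectors : num_online_cpus;

	if (n <= 1)
		return 0;	/* only one choice available */
	return (int)(atomic_fetch_add(counter, 1) % (unsigned int)n);
}
```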
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <linux-cifs@vger.kernel.org>
Cc: <v9fs-developer@lists.sourceforge.net>
Link: https://lore.kernel.org/r/20190729171923.13428.52555.stgit@manet.1015granger.net
Signed-off-by: Doug Ledford <dledford@redhat.com>
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [kishon] [vijay] [abraham]
[i] [kishon]@[ti] [com] this program is distributed in the hope that
it will be useful but without any warranty without even the implied
warranty of merchantability or fitness for a particular purpose see
the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [graeme] [gregory]
[gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
[kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
[hk] [hemahk]@[ti] [com] this program is distributed in the hope
that it will be useful but without any warranty without even the
implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When sending data, use DMA_TO_DEVICE to map buffers. Also log the number
of requests in a compounding request from upper layer.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
To support compounding, __smb_send_rqst() now sends an array of requests to
the transport layer.
Change smbd_send() to take an array of requests, and send them in as few
packets as possible.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Failure to send a packet doesn't mean it's a permanent failure, so it shouldn't
be returned to the user process. This I/O should be retried or failed based on
the server's packet response and transport health. This logic is handled by the
upper layer.
Leave this decision to the upper layer.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When packets are waiting for outbound I/O and are interrupted, return the
proper error code to the user process.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Now that the upper layer handles transport shutdown and reconnect, remove
the code that handled transport shutdown on RDMA disconnect.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
On transport reconnect, the upper-layer CIFS code destroys the current
transport and then reconnects. This code path is not used by SMBD, in that
SMBD destroys its transport on RDMA disconnect notification independently of
CIFS upper-layer behavior.
This approach adds some costs to the SMBD layer to handle transport shutdown
and restart, and to deal with several race conditions when reconnecting the
transport.
Re-work this code path by introducing a new smbd_destroy. This function is
called from the upper layer to ask SMBD to destroy the transport. SMBD will no
longer need to destroy the transport by itself while worrying about whether a
data transfer is in progress. The upper layer guarantees the transport is
locked.
change log:
v2: fix build errors when CONFIG_CIFS_SMB_DIRECT is not configured
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
A trivial patch that replaces all uses of snprintf with scnprintf.
scnprintf() is generally seen as a safer function to use than
snprintf for many use cases.
In our case, there is no actual difference between the two since we never
look at the return value. Thus we did not have any of the bugs that
scnprintf protects against and the patch does nothing.
However, for people reading our code it will be a receipt that we
have done our due diligence and checked our code for this type of bug.
See the presentation "Making C Less Dangerous In The Linux Kernel"
at this year's LCA.
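The difference only matters when the return value is used. snprintf() returns the length the output would have had, while the kernel's scnprintf() returns what was actually stored, excluding the NUL. The sketch below is a userspace re-implementation of scnprintf() semantics on top of vsnprintf(). `model_scnprintf()` is a hypothetical name, not a libc function.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Return the number of characters actually written to buf (no NUL). */
static int model_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (i < 0)
		return 0;			/* encoding error */
	if ((size_t)i < size)
		return i;			/* everything fitted */
	return size ? (int)(size - 1) : 0;	/* output was truncated */
}
```

This is why scnprintf() is preferred in patterns such as `len += scnprintf(buf + len, size - len, ...)`. With snprintf() there, a truncated write can push len past size.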
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make all the required changes to start using the ib_device_ops structure.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull AFS updates from Al Viro:
"AFS series, with some iov_iter bits included"
* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
missing bits of "iov_iter: Separate type from direction and use accessor functions"
afs: Probe multiple fileservers simultaneously
afs: Fix callback handling
afs: Eliminate the address pointer from the address list cursor
afs: Allow dumping of server cursor on operation failure
afs: Implement YFS support in the fs client
afs: Expand data structure fields to support YFS
afs: Get the target vnode in afs_rmdir() and get a callback on it
afs: Calc callback expiry in op reply delivery
afs: Fix FS.FetchStatus delivery from updating wrong vnode
afs: Implement the YFS cache manager service
afs: Remove callback details from afs_callback_break struct
afs: Commit the status on a new file/dir/symlink
afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
afs: Don't invoke the server to read data beyond EOF
afs: Add a couple of tracepoints to log I/O errors
afs: Handle EIO from delivery function
afs: Fix TTL on VL server and address lists
afs: Implement VL server rotation
afs: Improve FS server rotation error handling
...
It is not necessary to deregister a memory registration after it has been
successfully invalidated.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Use accessor functions to access an iterator's type and direction. This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.
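A sketch of why accessors help. Here the iterator type and the direction are packed into a single field, and only the accessors know the encoding, so the encoding can change later without touching callers. The enum values, the bit layout and all `model_` names are hypothetical and do not match the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical encoding: bit 0 is the direction, the rest is the type. */
enum model_iter_type {
	MODEL_ITER_IOVEC = 0 << 1,
	MODEL_ITER_KVEC  = 1 << 1,
	MODEL_ITER_BVEC  = 2 << 1,
};
#define MODEL_READ  0
#define MODEL_WRITE 1

struct model_iov_iter {
	unsigned int type;	/* type | direction, packed together */
};

/* Callers use these instead of masking the field themselves. */
static inline enum model_iter_type model_iov_iter_type(const struct model_iov_iter *i)
{
	return (enum model_iter_type)(i->type & ~1u);
}

static inline int model_iov_iter_rw(const struct model_iov_iter *i)
{
	return i->type & 1u;
}

static inline bool model_iov_iter_is_bvec(const struct model_iov_iter *i)
{
	return model_iov_iter_type(i) == MODEL_ITER_BVEC;
}
```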
Signed-off-by: David Howells <dhowells@redhat.com>
Merge tag 'v4.18' into rdma.git for-next
Resolve merge conflicts from the -rc cycle against the rdma.git tree:
Conflicts:
drivers/infiniband/core/uverbs_cmd.c
- New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
- Merge removal of file->ucontext in for-next with new code in -rc
drivers/infiniband/core/uverbs_main.c
- for-next removed code from ib_uverbs_write() that was modified
in for-rc
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
SMB1 mounting broke in commit 35e2cc1ba7
("cifs: Use correct packet length in SMB2_TRANSFORM header")
Fix it and also rename smb2_rqst_len to smb_rqst_len
to make it more obvious that the function is also called from
CIFS/SMB1
Good job by Paulo reviewing and cleaning up Ronnie's original patch.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
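The arithmetic in the example above is a ceiling division of the SGE count by the per-WR SGE limit. `wrs_needed()` is a hypothetical helper used only for illustration.

```c
/* Work requests needed to post an RDMA WRITE with nr_sge entries. */
static int wrs_needed(int nr_sge, int max_sge)
{
	if (max_sge <= 0)
		return -1;				/* invalid limit */
	return (nr_sge + max_sge - 1) / max_sge;	/* ceiling division */
}
```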
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In smb3_init_transform_rq(), 'orig_len' was only counting the request
length, but forgot to count any data pages in the request.
Writing or creating files with the 'seal' mount option was broken.
In addition, do some code refactoring by exporting smb2_rqst_len() to
calculate the appropriate packet size and avoid duplicating the same
calculation all over the code.
The start of the io vector is either the rfc1002 length (4 bytes) or a
SMB2 header which is always > 4. Use this fact to check and skip the
rfc1002 length if requested.
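A simplified model of that length calculation, using the page-description fields the old struct smb_rqst carried (rq_npages, rq_offset, rq_pagesz, rq_tailsz). It sums the iovecs, optionally skipping a 4-byte RFC1002 length vector. It then adds the data pages: every page but the last is a full page, less the offset into the first page, plus the tail. This is a sketch of the idea, not the kernel function. `struct model_rqst` and `model_rqst_len()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Simplified model of the old struct smb_rqst buffer description. */
struct model_rqst {
	struct iovec *rq_iov;
	unsigned int rq_nvec;
	unsigned int rq_npages;	/* number of data pages */
	unsigned int rq_offset;	/* data offset within the first page */
	unsigned int rq_pagesz;	/* size of each page */
	unsigned int rq_tailsz;	/* bytes used in the last page */
};

static size_t model_rqst_len(const struct model_rqst *rqst, bool skip_rfc1002)
{
	const struct iovec *iov = rqst->rq_iov;
	unsigned int nvec = rqst->rq_nvec;
	size_t len = 0;
	unsigned int i;

	/* An SMB2 header is always longer than 4 bytes, so a 4-byte
	 * first vector can only be the RFC1002 length field. */
	if (skip_rfc1002 && nvec > 0 && iov[0].iov_len == 4) {
		iov++;
		nvec--;
	}
	for (i = 0; i < nvec; i++)
		len += iov[i].iov_len;

	/* Count the data pages as well as the header vectors. */
	if (rqst->rq_npages == 1)
		len += rqst->rq_tailsz;
	else if (rqst->rq_npages > 1)
		len += (size_t)rqst->rq_pagesz * (rqst->rq_npages - 1) -
		       rqst->rq_offset + rqst->rq_tailsz;
	return len;
}
```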
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change code to pass the correct page offset during memory registration for
RDMA read/write.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The RDMA recv function needs to place data at the correct location, starting
at the page offset.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The RDMA send function needs to look at offset in the request pages, and
send data starting from there.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>