-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmP2kaAACgkQiiy9cAdy
T1Eergv9FHVs7hS0anJF0xgRghR4+g0m5UUo08iJazgJdDgcS5JY+ZasIpYpEsG3
QmsIT33XVYZypXoOzjMSsPlwo6esTCJQScVLz85e4ebedCbCBDks+wVQcbfTzD5/
KrwmUoTBLU0L/ppFhqRk9k53nrSf1SXCWPthjdfWa3mTHdIVM4kQJruTWwUDiJXp
mdYwTx6FnTNer3QWetNzYOwdUgLu3rk0zLcBwQNCo6g5LOpA44iFfEAO4zeiOuZT
LMDPbDj0nWQyWPLLdcbtsn2laYyEBDBLZevLirSaqPQ/KCtGcw0mBt6dCAzg8/CM
ONqHHxdEpvPON8Sxujcn4CxpXhl0nCLwwtKtWU4rt7IevI9U+PynNl57TtJJ16/s
b3XD2QVbFjlcdAMTmArvqnogdzoC3mZu1R1IRs+jukhLAOqZiLN6o/E2HAllt47i
krzXeXIzQr10w9fnJ7LtIc/7IUFgtUfrOkg4TKyNcnRVHQaSSxv+JLRgqMPOr/M0
I7zt0G0j
=4hIT
-----END PGP SIGNATURE-----
Merge tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client updates from Steve French:
"The largest subset of this is from David Howells et al: making the
cifs/smb3 driver pass iov_iters down to the lowest layers, directly to
the network transport rather than passing lists of pages around,
helping multiple areas:
- Pin user pages, thereby fixing the race between concurrent DIO read
and fork, where the pages containing the DIO read buffer may end up
belonging to the child process and not the parent - with the result
that the parent might not see the retrieved data.
- cifs shouldn't take refs on pages extracted from non-user-backed
iterators (eg. KVEC). With these changes, cifs will apply the
appropriate cleanup.
- Making it easier to transition to using folios in cifs rather than
pages by dealing with them through BVEC and XARRAY iterators.
- Allowing cifs to use the new splice function
The remainder are:
- fixes for stable, including various fixes for uninitialized memory,
wrong length field causing mount issue to very old servers,
important directory lease fixes and reconnect fixes
- cleanups (unused code removal, change one element array usage, and
a change form strtobool to kstrtobool, and Kconfig cleanups)
- SMBDIRECT (RDMA) fixes including iov_iter integration and UAF fixes
- reconnect fixes
- multichannel fixes, including improving channel allocation (to
least used channel)
- remove the last use of lock_page_killable by moving to
folio_lock_killable"
* tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (46 commits)
update internal module version number for cifs.ko
cifs: update ip_addr for ses only for primary chan setup
cifs: use tcon allocation functions even for dummy tcon
cifs: use the least loaded channel for sending requests
cifs: DIO to/from KVEC-type iterators should now work
cifs: Remove unused code
cifs: Build the RDMA SGE list directly from an iterator
cifs: Change the I/O paths to use an iterator rather than a page list
cifs: Add a function to read into an iter from a socket
cifs: Add some helper functions
cifs: Add a function to Hash the contents of an iterator
cifs: Add a function to build an RDMA SGE list from an iterator
netfs: Add a function to extract an iterator into a scatterlist
netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
splice: Export filemap/direct_splice_read()
iov_iter: Add a function to extract a page list from an iterator
iov_iter: Define flags to qualify page extraction.
splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE
splice: Add a func to do a splice from a buffered file without ITER_PIPE
...
Currently, the cifs I/O paths hand lists of pages from the VM interface
routines at the top all the way through the intervening layers to the
socket interface at the bottom.
This is a problem, however, for interfacing with netfslib which passes an
iterator through to the ->issue_read() method (and will pass an iterator
through to the ->issue_write() method in future). Netfslib takes over
bounce buffering for direct I/O, async I/O and encrypted content, so cifs
doesn't need to do that. Netfslib also converts IOVEC-type iterators into
BVEC-type iterators if necessary.
Further, cifs needs foliating - and folios may come in a variety of sizes,
so a page list pointing to an array of heterogeneous pages may cause
problems in places such as where crypto is done.
Change the cifs I/O paths to hand iov_iter iterators all the way through
instead.
Notes:
(1) Some old routines are #if'd out to be removed in a follow up patch so
as to avoid confusing diff, thereby making the diff output easier to
follow. I've removed functions that don't overlap with anything
added.
(2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and
rq_tailsz which describe the pages forming the buffer; instead there's
an rq_iter describing the source buffer and an rq_buffer which is used
to hold the buffer for encryption.
(3) struct cifs_readdata and cifs_writedata are similarly modified to
smb_rqst. The ->read_into_pages() and ->copy_into_pages() are then
replaced with passing the iterator directly to the socket.
The iterators are stored in these structs so that they are persistent
and don't get deallocated when the function returns (unlike if they
were stack variables).
(4) Buffered writeback is overhauled, borrowing the code from the afs
filesystem to gather up contiguous runs of folios. The XARRAY-type
iterator is then used to refer directly to the pagecache and can be
passed to the socket to transmit data directly from there.
This includes:
cifs_extend_writeback()
cifs_write_back_from_locked_folio()
cifs_writepages_region()
cifs_writepages()
(5) Pages are converted to folios.
(6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type
iterator from an IOBUF/UBUF-type source iterator.
(7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page
fragments from the iterator into the scatterlists that the crypto
layer prefers.
(8) smb2_init_transform_rq() attached pages to smb_rqst::rq_buffer, an
xarray, to use as a bounce buffer for encryption. An XARRAY-type
iterator can then be used to pass the bounce buffer to lower layers.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
We already upcall to resolve hostnames during reconnect by calling
reconn_set_ipaddr_from_hostname(), so there is no point in having a
worker to periodically call it.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Just have @skip set to 0 after first iterations of the two nested
loops.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.
Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Steve French <stfrench@microsoft.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
We can't rely on cifs_tcon::ses to refresh cached referral as the
server target might not respond to referrals, e.g. share is not hosted
in a DFS root server. Consider the following
mount //dom/dfs/link -> /root1/dfs/link -> /fs0/share
where fs0 can't get a referral for "/root1/dfs/link".
To simplify and fix the access of dfs root sessions, store the dfs
root session pointer directly to new sessions so making it easier to
select the appropriate ipc connection and use it for failover or cache
refresh.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
When matching DFS superblocks we can't rely on either the server's
address or tcon's UNC name from mount(2) as the existing servers and
tcons might be connected to somewhere else. Instead, check if
superblock is dfs, and if so, match its original source pathname with
the new mount's source pathname.
For DFS connections, instead of checking server's address, match its
referral path as it could be connected to different targets.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Introduce and export two helpers for getting session and tcon during
mount(2). Those will be used by dfs when retrieving sessions and
tcons separately while chasing referrals. Besides, export
cifs_mount_ctx structure as it will be used by dfs code as well.
No functional changes.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
./fs/cifs/cifsglob.h: linux/scatterlist.h is included more than once.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3459
Fixes: f7f291e14d ("cifs: fix oops during encryption")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When running xfstests against Azure the following oops occurred on an
arm64 system
Unable to handle kernel write to read-only memory at virtual address
ffff0001221cf000
Mem abort info:
ESR = 0x9600004f
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x0f: level 3 permission fault
Data abort info:
ISV = 0, ISS = 0x0000004f
CM = 0, WnR = 1
swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000294f3000
[ffff0001221cf000] pgd=18000001ffff8003, p4d=18000001ffff8003,
pud=18000001ff82e003, pmd=18000001ff71d003, pte=00600001221cf787
Internal error: Oops: 9600004f [#1] PREEMPT SMP
...
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
pc : __memcpy+0x40/0x230
lr : scatterwalk_copychunks+0xe0/0x200
sp : ffff800014e92de0
x29: ffff800014e92de0 x28: ffff000114f9de80 x27: 0000000000000008
x26: 0000000000000008 x25: ffff800014e92e78 x24: 0000000000000008
x23: 0000000000000001 x22: 0000040000000000 x21: ffff000000000000
x20: 0000000000000001 x19: ffff0001037c4488 x18: 0000000000000014
x17: 235e1c0d6efa9661 x16: a435f9576b6edd6c x15: 0000000000000058
x14: 0000000000000001 x13: 0000000000000008 x12: ffff000114f2e590
x11: ffffffffffffffff x10: 0000040000000000 x9 : ffff8000105c3580
x8 : 2e9413b10000001a x7 : 534b4410fb86b005 x6 : 534b4410fb86b005
x5 : ffff0001221cf008 x4 : ffff0001037c4490 x3 : 0000000000000001
x2 : 0000000000000008 x1 : ffff0001037c4488 x0 : ffff0001221cf000
Call trace:
__memcpy+0x40/0x230
scatterwalk_map_and_copy+0x98/0x100
crypto_ccm_encrypt+0x150/0x180
crypto_aead_encrypt+0x2c/0x40
crypt_message+0x750/0x880
smb3_init_transform_rq+0x298/0x340
smb_send_rqst.part.11+0xd8/0x180
smb_send_rqst+0x3c/0x100
compound_send_recv+0x534/0xbc0
smb2_query_info_compound+0x32c/0x440
smb2_set_ea+0x438/0x4c0
cifs_xattr_set+0x5d4/0x7c0
This is because in scatterwalk_copychunks(), we attempted to write to
a buffer (@sign) that was allocated in the stack (vmalloc area) by
crypt_message() and thus accessing its remaining 8 (x2) bytes ended up
crossing a page boundary.
To simply fix it, we could just pass @sign kmalloc'd from
crypt_message() and then we're done. Luckily, we don't seem to pass
any other vmalloc'd buffers in smb_rqst::rq_iov...
Instead, let's map the correct pages and offsets from vmalloc buffers
as well in cifs_sg_set_buf() and then avoiding such oopses.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
checkpatch showed formatting problems with extra spaces,
and extra semicolon and some missing blank lines in some
cifs headers.
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Germano Percossi <germano.percossi@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change notification is a commonly supported feature by most servers,
but the current ioctl to request notification when a directory is
changed does not return the information about what changed
(even though it is returned by the server in the SMB3 change
notify response), it simply returns when there is a change.
This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
information structure which includes the name of the file(s) that
changed and why. See MS-SMB2 2.2.35 for details on the individual
filter flags and the file_notify_information structure returned.
To use this simply pass in the following (with enough space
to fit at least one file_notify_information structure)
struct __attribute__((__packed__)) smb3_notify {
uint32_t completion_filter;
bool watch_tree;
uint32_t data_len;
uint8_t data[];
} __packed;
using CIFS_IOC_NOTIFY_INFO 0xc009cf0b
or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info)
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When creating inode for symlink, the client used to send below
requests to fill it in:
* create+query_info+close (STATUS_STOPPED_ON_SYMLINK)
* create(+reparse_flag)+query_info+close (set file attrs)
* create+ioctl(get_reparse)+close (query reparse tag)
and then for every access to the symlink dentry, the ->link() method
would send another:
* create+ioctl(get_reparse)+close (parse symlink)
So, in order to improve:
(i) Get rid of unnecessary roundtrips and then resolve symlinks as
follows:
* create+query_info+close (STATUS_STOPPED_ON_SYMLINK +
parse symlink + get reparse tag)
* create(+reparse_flag)+query_info+close (set file attrs)
(ii) Set the resolved symlink target directly in inode->i_link and
use simple_get_link() for ->link() to simply return it.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The struct sdesc is just a wrapper around shash_desc, with exact same
memory layout. Replace the hashing TFMs with shash_desc as it's what's
passed to the crypto API anyway.
Also remove the crypto_shash pointers as they can be accessed via
shash_desc->tfm (and are actually only used in the setkey calls).
Adapt cifs_{alloc,free}_hash functions to this change.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Detach the TFM name from a specific algorithm (AES-CCM) as
AES-GCM is also supported, making the name misleading.
s/ccmaesencrypt/enc/
s/ccmaesdecrypt/dec/
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
This wrapper structure will later be expanded to contain a list of
fids that are cached and not just the root fid.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Needed this for debugging a failing xfstest.
Also change camel case for "treeName" to "tree_name" in tcon struct.
Example trace output (from "trace-cmd record -e smb3_tdis*"):
umount-9718 [006] ..... 5909.780244: smb3_tdis_enter: xid=206 sid=0xcf38894e tid=0x3d0b8cf8 path=\\localhost\test
umount-9718 [007] ..... 5909.780878: smb3_tdis_done: xid=206 sid=0xcf38894e tid=0x3d0b8cf8
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB1 server's header_preamble_size is not 0, add use is_smb1 function
to simplify the code, no actual functional changes.
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
It's better to use MID_HEADER_SIZE because the unfolded expression
too long. No actual functional changes, minor readability improvement.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
It's better to use HEADER_PREAMBLE_SIZE because the unfolded expression
too long. No actual functional changes, minor readability improvement.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
This parameter is unused by the called function
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
and move the structure definitions into cached_dir.h
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Also rename crfid to cfid to have consistent naming for this variable.
This commit does not change any logic.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
During analysis of multichannel perf, it was seen that
the global locks cifs_tcp_ses_lock and GlobalMid_Lock, which
were shared between various data structures were causing a
lot of contention points.
With this change, we're breaking down the use of these locks
by introducing new locks at more granular levels. i.e.
server->srv_lock, ses->ses_lock and tcon->tc_lock to protect
the unprotected fields of server, session and tcon structs;
and server->mid_lock to protect mid related lists and entries
at server level.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Removed remaining warnings related to externs. These warnings
although harmless could be distracting e.g.
fs/cifs/cifsfs.c: note: in included file:
fs/cifs/cifsglob.h:1968:24: warning: symbol 'sesInfoAllocCount' was not declared. Should it be static?
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
The build warning:
warning: symbol 'cifs_tcp_ses_lock' was not declared. Should it be static?
can be distracting. Fix two of these.
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Remove warnings for five global variables. For example:
fs/cifs/cifsglob.h:1984:24: warning: symbol 'midCount' was not declared. Should it be static?
Also change them from camelCase (e.g. "midCount" to "mid_count")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, we only query the server for network interfaces
information at the time of mount, and never afterwards.
This can be a problem, especially for services like Azure,
where the IP address of the channel endpoints can change
over time.
With this change, we schedule a 600s polling of this info
from the server for each tree connect.
An alternative for periodic polling was to do this only at
the time of reconnect. But this could delay the reconnect
time slightly. Also, there are some challenges w.r.t how
we have cifs_reconnect implemented today.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
A server's published interface list can change over time, and needs
to be updated. We've storing iface_list as a simple array, which
makes it difficult to manipulate an existing list.
With this change, iface_list is modified into a linked list of
interfaces, which is kept sorted by speed.
Also added a reference counter for an iface entry, so that each
channel can maintain a backpointer to the iface and drop it
easily when needed.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled. This was causing the
following complaint[1] from gcc v12:
In file included from include/linux/string.h:253,
from include/linux/ceph/ceph_debug.h:7,
from fs/ceph/inode.c:2:
In function 'fortify_memset_chk',
inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
242 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode). The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.
Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &ctx->inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).
Most of the changes were done with:
perl -p -i -e 's/vfs_inode/netfs.inode/'g \
`git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`
Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.
Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].
Version #2:
- Fix a couple of missed name changes due to a disabled cifs option.
- Rename nfs_i_context to nfs_inode
- Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
structs.
[ This also undoes commit 507160f46c ("netfs: gcc-12: temporarily
disable '-Wattribute-warning' for now") that is no longer needed ]
Fixes: bc899ee1c8 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <smfrench@gmail.com>
cc: William Kucharski <william.kucharski@oracle.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should not be including unused smb20 specific code when legacy
support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned
off). For example smb2_operations and smb2_values aren't used
in that case. Over time we can move more and more SMB1/CIFS and SMB2.0
code into the insecure legacy ifdefs
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The srv_mutex is used during writeback so cifs should ensure that
allocations done when that mutex is held are done with GFP_NOFS, to
avoid having direct reclaim ending up waiting for the same mutex and
causing a deadlock. This is detected by lockdep with the splat below:
======================================================
WARNING: possible circular locking dependency detected
5.18.0 #70 Not tainted
------------------------------------------------------
kswapd0/49 is trying to acquire lock:
ffff8880195782e0 (&tcp_ses->srv_mutex){+.+.}-{3:3}, at: compound_send_recv
but task is already holding lock:
ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire
kmem_cache_alloc_trace
__request_module
crypto_alg_mod_lookup
crypto_alloc_tfm_node
crypto_alloc_shash
cifs_alloc_hash
smb311_crypto_shash_allocate
smb311_update_preauth_hash
compound_send_recv
cifs_send_recv
SMB2_negotiate
smb2_negotiate
cifs_negotiate_protocol
cifs_get_smb_ses
cifs_mount
cifs_smb3_do_mount
smb3_get_tree
vfs_get_tree
path_mount
__x64_sys_mount
do_syscall_64
entry_SYSCALL_64_after_hwframe
-> #0 (&tcp_ses->srv_mutex){+.+.}-{3:3}:
__lock_acquire
lock_acquire
__mutex_lock
mutex_lock_nested
compound_send_recv
cifs_send_recv
SMB2_write
smb2_sync_write
cifs_write
cifs_writepage_locked
cifs_writepage
shrink_page_list
shrink_lruvec
shrink_node
balance_pgdat
kswapd
kthread
ret_from_fork
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(&tcp_ses->srv_mutex);
lock(fs_reclaim);
lock(&tcp_ses->srv_mutex);
*** DEADLOCK ***
1 lock held by kswapd0/49:
#0: ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat
stack backtrace:
CPU: 2 PID: 49 Comm: kswapd0 Not tainted 5.18.0 #70
Call Trace:
<TASK>
dump_stack_lvl
dump_stack
print_circular_bug.cold
check_noncircular
__lock_acquire
lock_acquire
__mutex_lock
mutex_lock_nested
compound_send_recv
cifs_send_recv
SMB2_write
smb2_sync_write
cifs_write
cifs_writepage_locked
cifs_writepage
shrink_page_list
shrink_lruvec
shrink_node
balance_pgdat
kswapd
kthread
ret_from_fork
</TASK>
Fix this by using the memalloc_nofs_save/restore APIs around the places
where the srv_mutex is held. Do this in a wrapper function for the
lock/unlock of the srv_mutex, and rename the srv_mutex to avoid missing
call sites in the conversion.
Note that there is another lockdep warning involving internal crypto
locks, which was masked by this problem and is visible after this fix,
see the discussion in this thread:
https://lore.kernel.org/all/20220523123755.GA13668@axis.com/
Link: https://lore.kernel.org/r/CANT5p=rqcYfYMVHirqvdnnca4Mo+JQSw5Qu12v=kPfpk5yhhmg@mail.gmail.com/
Reported-by: Shyam Prasad N <nspmangalore@gmail.com>
Suggested-by: Lars Persson <larper@axis.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Some older servers seem to require the workstation name during ntlmssp
to be at most 15 chars (RFC1001 name length), so truncate it before
sending when using insecure dialects.
Link: https://lore.kernel.org/r/e6837098-15d9-acb6-7e34-1923cf8c6fe1@winds.org
Reported-by: Byron Stanoszek <gandalf@winds.org>
Tested-by: Byron Stanoszek <gandalf@winds.org>
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
This adds caching of the directory entries for a cached directory while we keep
a lease on the directory.
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After allowing channels to reconnect in parallel, it now
becomes important to take care that multiple processes do not
call negotiate/session setup in parallel on the same channel.
This change avoids that by marking a channel as "in_reconnect".
During session setup if the channel in question has this flag
set, we return immediately.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
ses->status today shares statusEnum with server->tcpStatus.
This has been confusing, and tcon->status has deviated to use
a new enum. Follow suit and use new enum for ses_status as well.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This only moves these definitions to come earlier in the file
but not change the definition itself.
This is done to reduce the amount of changes in future patches.
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJGhDkACgkQiiy9cAdy
T1EquQv/V05eD1EWZzW+Y5Q+cbYBPn8T3r6YqSw6hIvbgdF6W6U45UPyJ4ASHKvl
+MvTPSJzEzSWKYfcDryUBsa7aAXaekPpxW6uZk7jMRuVIfkannTV9E+rZItwC/dS
g8kDDjvcWrwN9iQUyVNX1JCybpq5YnwEIA5z0C8rpuCjDelNfK5DCaf02PweuRlY
3pDlj8Jy4sY8mBvqzFiWheY6Xc3pbvheDIvHEieaZpAyPwF7r1hmwvMDkzbJfPjV
Qrwcrwq2FahK4E98gJQZ5U0CeXvNPEHPcc8c4bAkRpnaa/v2oVSCW4FGjhA1Stp2
0APC+AsjkY95DJ0GHerGfH5G0z6FAbRJjyXtt1NTkyKavEQZOqoQvi5yM/iXUEoA
z+1bgN7s02IMLU15gLDilK6QObWtUwvNxuS19MQ80yFnqmjNNpSmRTfpwzDJQ6Lj
B6Yml8tIvVPLtmuwehhljffMUv9lrdElDDjT50yTn/CTkQYUMBejitMGu8G4YwZI
luAN1msJ
=bNGL
-----END PGP SIGNATURE-----
Merge tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- three fixes for big endian issues in how Persistent and Volatile file
ids were stored
- Various misc. fixes: including some for oops, 2 for ioctls, 1 for
writeback
- cleanup of how tcon (tree connection) status is tracked
- Four changesets to move various duplicated protocol definitions
(defined both in cifs.ko and ksmbd) into smbfs_common/smb2pdu.h
- important performance improvement to use cached handles in some key
compounding code paths (reduces numbers of opens/closes sent in some
workloads)
- fix to allow alternate DFS target to be used to retry on a failed i/o
* tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
cifs: prevent bad output lengths in smb2_ioctl_query_info()
smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
smb3: cleanup and clarify status of tree connections
smb3: move defines for query info and query fsinfo to smbfs_common
smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
[smb3] move more common protocol header definitions to smbfs_common
cifs: fix incorrect use of list iterator after the loop
ksmbd: store fids as opaque u64 integers
cifs: fix bad fids sent over wire
cifs: change smb2_query_info_compound to use a cached fid, if available
cifs: convert the path to utf16 in smb2_query_info_compound
cifs: writeback fix
cifs: do not skip link targets when an I/O fails
Currently the way the tid (tree connection) status is tracked
is confusing. The same enum is used for structs cifs_tcon
and cifs_ses and TCP_Server_info, but each of these three has
different states that they transition among. The current
code also unnecessarily uses camelCase.
Convert from use of statusEnum to a new tid_status_enum for
tree connections. The valid states for a tid are:
TID_NEW = 0,
TID_GOOD,
TID_EXITING,
TID_NEED_RECON,
TID_NEED_TCON,
TID_IN_TCON,
TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */
TID_IN_FILES_INVALIDATE
It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and
CifsInFilesInvalidate from the statusEnum used for session and
TCP_Server_Info since they are not relevant for those.
A follow on patch will fix the places where we use the
tcon->need_reconnect flag to be more consistent with the tid->status.
Also fixes a bug that was:
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The definitions for the ioctl SMB3 request and response as well
as length of various fields defined in the protocol documentation
were duplicated in fs/ksmbd and fs/cifs. Move these to the common
code in fs/smbfs_common/smb2pdu.h
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Change the cifs filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in cifs.
The following changes have been made:
(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.
(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.
For cifs, I've made it render the volume name string as:
"cifs,<ipaddress>,<sharename>"
where the sharename has '/' characters replaced with ';'.
This probably needs rethinking a bit as the total name could exceed
the maximum filename component length.
Further, the coherency data is currently just set to 0. It needs
something else doing with it - I wonder if it would suffice simply to
sum the resource_id, vol_create_time and vol_serial_number or maybe
hash them.
(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.
fscache_acquire_cookie() is passed the same keying and coherency
information as before.
(4) The functions to set/reset cookies are removed and
fscache_use_cookie() and fscache_unuse_cookie() are used instead.
fscache_use_cookie() is passed a flag to indicate if the cookie is
opened for writing. fscache_unuse_cookie() is passed updates for the
metadata if we changed it (ie. if the file was opened for writing).
These are called when the file is opened or closed.
(5) cifs_setattr_*() are made to call fscache_resize() to change the size
of the cache object.
(6) The functions to read and write data are stubbed out pending a
conversion to use netfslib.
Changes
=======
ver #8:
- Abstract cache invalidation into a helper function.
- Fix some checkpatch warnings[3].
ver #7:
- Removed the accidentally added-back call to get the super cookie in
cifs_root_iget().
- Fixed the right call to cifs_fscache_get_super_cookie() to take account
of the "-o fsc" mount flag.
ver #6:
- Moved the change of gfpflags_allow_blocking() to current_is_kswapd() for
cifs here.
- Fixed one of the error paths in cifs_atomic_open() to jump around the
call to use the cookie.
- Fixed an additional successful return in the middle of cifs_open() to
use the cookie on the way out.
- Only get a volume cookie (and thus inode cookies) when "-o fsc" is
supplied to mount.
ver #5:
- Fixed a couple of bits of cookie handling[2]:
- The cookie should be released in cifs_evict_inode(), not
cifsFileInfo_put_final(). The cookie needs to persist beyond file
closure so that writepages will be able to write to it.
- fscache_use_cookie() needs to be called in cifs_atomic_open() as it is
for cifs_open().
ver #4:
- Fixed the use of sizeof with memset.
- tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
ver #3:
- Canonicalise the cifs coherency data to make the cache portable.
- Set volume coherency data.
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- Upgraded to -rc4 to allow for upstream changes[1].
- fscache_acquire_volume() now returns errors.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1]
Link: https://lore.kernel.org/r/3419813.1641592362@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/CAH2r5muTanw9pJqzAHd01d9A8keeChkzGsCEH6=0rHutVLAF-A@mail.gmail.com/ [3]
Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/3462849.1641593783@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/1318953.1642024578@warthog.procyon.org.uk/ # v6
Signed-off-by: Steve French <stfrench@microsoft.com>
With the new multichannel logic, when a channel needs reconnection,
the tree connect and other channels can still be active.
This fix will handle cases of checking for channel reconnect,
when the tcon does not need reconnect.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Help userland apps to identify cifs and smb2 mounts.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If functions like cifs_negotiate_protocol, cifs_setup_session,
cifs_tree_connect are called in parallel on different channels,
each of these will be execute the requests. This maybe unnecessary
in some cases, and only the first caller may need to do the work.
This is achieved by having more states for the tcp/smb/tcon session
status fields. And tracking the state of reconnection based on the
state machine.
For example:
for tcp connections:
CifsNew/CifsNeedReconnect ->
CifsNeedNegotiate ->
CifsInNegotiate ->
CifsNeedSessSetup ->
CifsInSessSetup ->
CifsGood
for smb sessions:
CifsNew/CifsNeedReconnect ->
CifsGood
for tcon:
CifsNew/CifsNeedReconnect ->
CifsInFilesInvalidate ->
CifsNeedTcon ->
CifsInTcon ->
CifsGood
If any channel reconnect sees that it's in the middle of
transition to CifsGood, then they can skip the function.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
While checking/updating status for tcp ses, smb ses or tcon,
we take GlobalMid_Lock. This doesn't make any sense.
Replaced it with cifs_tcp_ses_lock.
Ideally, we should take a spin lock per struct.
But since tcp ses, smb ses and tcon objects won't add up to a lot,
I think there should not be too much contention.
Also, in few other places, these are checked without locking.
Added locking for these.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
With the new per-channel bitmask for reconnect, we have an option to
reconnect the tcp session associated with the channel without reconnecting
the smb session. i.e. if there are still channels to operate on, we can
continue to use the smb session and tcon.
However, there are cases where it makes sense to reconnect the smb session
even when there are active channels underneath. For example for
SMB session expiry.
With this patch, we'll have an option to do either, and use the correct
option for specific cases.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We use the concept of "binding" when one of the secondary channel
is in the process of connecting/reconnecting to the server. Till this
binding process completes, and the channel is bound to an existing session,
we redirect traffic from other established channels on the binding channel,
effectively blocking all traffic till individual channels get reconnected.
With my last set of commits, we can get rid of this binding serialization.
We now have a bitmap of connection states for each channel. We will use
this bitmap instead for tracking channel status.
Having a bitmap also now enables us to keep the session alive, as long
as even a single channel underneath is alive.
Unfortunately, this also meant that we need to supply the tcp connection
info for the channel during all negotiate and session setup functions.
These changes have resulted in a slightly bigger code churn.
However, I expect perf and robustness improvements in the mchan scenario
after this change.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>