Commit Graph

32 Commits

Author SHA1 Message Date
Linus Torvalds 9f4b9beeb9 24 ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmM/K50ACgkQiiy9cAdy
 T1H+JAv+KOk1oTGTDKiY/+1aEl/G1SxpgTnaemzK8U95nN2yCmddjxhwsoTI9e68
 lS7C+f0M3VN6X3S3OZoQXI4+B/VEODTR9yF9J30LULBw3rF3qJsGGllzXhnOIGOB
 bFFBgqTxq5mK0k9QmAyrf6dvEKfkxaGAXza3sGVFB6JxQTeEhhgrNjG70NqFw+0M
 rFEByqi2DMAvHBIDoDVqydaah+VTBJLfa6vfXOC+MBVu/qekyB0gKnZ3pGDKMWgq
 uPS6DaSYXlezuPuAKB8upLcSQewH3W+fqiGz9pimimWPJQXeHbfkqiuD/Gah/Flp
 lMQG0jziPclt08Ygts2wUvx503OUCHy9JAPfdOxpxNy3UOqovY+mu2K+Kbj/ZfTB
 Ta0vifoSCxA3YgIzVViqL3aq4zfKIHwHiDmJWa1zpTcSZ+QQh7zIDHFFg9UbzfZH
 zDH7l40NTIlc1Zh8ddX3+kJCzKI5OLJ+0ToyoayGXzfYTUR/gta/R6Ox4hzYF8KB
 ovGHvOnX
 =zHi1
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd updates from Steve French:

 - RDMA (smbdirect) fixes

 - fixes for SMB3.1.1 POSIX Extensions (especially for id mapping)

 - various casemapping fixes for mount and lookup

 - UID mapping fixes

 - fix confusing error message

 - protocol negotiation fixes, including NTLMSSP fix

 - two encryption fixes

 - directory listing fix

 - some cleanup fixes

* tag '6.1-rc-ksmbd-fixes' of git://git.samba.org/ksmbd: (24 commits)
  ksmbd: validate share name from share config response
  ksmbd: call ib_drain_qp when disconnected
  ksmbd: make utf-8 file name comparison work in __caseless_lookup()
  ksmbd: Fix user namespace mapping
  ksmbd: hide socket error message when ipv6 config is disable
  ksmbd: reduce server smbdirect max send/receive segment sizes
  ksmbd: decrease the number of SMB3 smbdirect server SGEs
  ksmbd: Fix wrong return value and message length check in smb2_ioctl()
  ksmbd: set NTLMSSP_NEGOTIATE_SEAL flag to challenge blob
  ksmbd: fix encryption failure issue for session logoff response
  ksmbd: fix endless loop when encryption for response fails
  ksmbd: fill sids in SMB_FIND_FILE_POSIX_INFO response
  ksmbd: set file permission mode to match Samba server posix extension behavior
  ksmbd: change security id to the one samba used for posix extension
  ksmbd: update documentation
  ksmbd: casefold utf-8 share names and fix ascii lowercase conversion
  ksmbd: port to vfs{g,u}id_t and associated helpers
  ksmbd: fix incorrect handling of iterate_dir
  MAINTAINERS: remove Hyunchul Lee from ksmbd maintainers
  MAINTAINERS: Add Tom Talpey as ksmbd reviewer
  ...
2022-10-07 08:19:26 -07:00
Atte Heikkilä dbab80e207 ksmbd: make utf-8 file name comparison work in __caseless_lookup()
Case-insensitive file name lookups with __caseless_lookup() use
strncasecmp() for file name comparison. strncasecmp() assumes an
ISO8859-1-compatible encoding, which is not the case here as UTF-8
is always used. As such, use of strncasecmp() here produces correct
results only if both strings use characters in the ASCII range only.
Fix this by using utf8_strncasecmp() if CONFIG_UNICODE is set. On
failure or if CONFIG_UNICODE is not set, fallback to strncasecmp().
Also, as we are adding an include for `linux/unicode.h', include it
in `fs/ksmbd/connection.h' as well since it should be explicit there.

Signed-off-by: Atte Heikkilä <atteh.mailbox@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-05 01:15:44 -05:00
Al Viro c22180a5e2 ksmbd: constify struct path
... in particular, there should never be a non-const pointers to
any file->f_path.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-05 01:15:37 -05:00
Al Viro 369c1634cc ksmbd: don't open-code %pD
a bunch of places used %pd with file->f_path.dentry; shorter (and saner)
way to spell that is %pD with file...

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-05 01:15:37 -05:00
Al Viro 25885a35a7 Change calling conventions for filldir_t
filldir_t instances (directory iterators callbacks) used to return 0 for
"OK, keep going" or -E... for "stop".  Note that it's *NOT* how the
error values are reported - the rules for those are callback-dependent
and ->iterate{,_shared}() instances only care about zero vs. non-zero
(look at emit_dir() and friends).

So let's just return bool ("should we keep going?") - it's less confusing
that way.  The choice between "true means keep going" and "true means
stop" is bikesheddable; we have two groups of callbacks -
	do something for everything in directory, until we run into problem
and
	find an entry in directory and do something to it.

The former tended to use 0/-E... conventions - -E<something> on failure.
The latter tended to use 0/1, 1 being "stop, we are done".
The callers treated anything non-zero as "stop", ignoring which
non-zero value did they get.

"true means stop" would be more natural for the second group; "true
means keep going" - for the first one.  I tried both variants and
the things like
	if allocation failed
		something = -ENOMEM;
		return true;
just looked unnatural and asking for trouble.

[folded suggestion from Matthew Wilcox <willy@infradead.org>]
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-17 17:25:04 -04:00
Linus Torvalds eb555cb5b7 12 server fixes, including for oopses, memory leaks and adding a channel lock
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmLwixsACgkQiiy9cAdy
 T1FT6Av+NY7R2f7/vDRzX/8u1fFXmyQDU2/dl3S0ysqqFEYE5hnHgbFtVP1IYwId
 /IAV8R6E/YMm8Nkbkf173EqHI8E9QNUSngTCk1bcvsiNSW4z3QOsujs9B6oTUeXM
 ARthU4eFJQW0wcTyjElVcm59I/jdtpQKaoVDQ/uXlCjPMT+0BUHS4mFRJkAx6DrX
 7wOO0vJ/WCqO2u0rSvK4unOZsvhrKHibtPMvQSiV6k3a+LVCJ5/5lwBpENKK4Mdw
 n4LeNhSDtm6HE6WDFIxSj008WPX7CF3JvX+zvZJEsB7uFNry/GVkWySPLqVYUmtp
 aSftdnWnw0Mv9AG3L4OMzyCQQP89ccoV81d6lyyJEBaGiOFhH8L1v6b7qAStCZue
 zyyFWIVUXnstQKwDY8QWLKjmi7H+I1drFExxn0vABInPcaijQE8Mct0fSt015Tc1
 EuCsO1JRyo+zn0mx7a1L80kHUH+yvHjKkfbINvPSa+qTFy21bgQZcG5CWSPpjadr
 4zibpKko
 =cDaA
 -----END PGP SIGNATURE-----

Merge tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd updates from Steve French:

 - fixes for memory access bugs (out of bounds access, oops, leak)

 - multichannel fixes

 - session disconnect performance improvement, and session register
   improvement

 - cleanup

* tag '5.20-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix heap-based overflow in set_ntacl_dacl()
  ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT
  ksmbd: prevent out of bound read for SMB2_WRITE
  ksmbd: fix use-after-free bug in smb2_tree_disconect
  ksmbd: fix memory leak in smb2_handle_negotiate
  ksmbd: fix racy issue while destroying session on multichannel
  ksmbd: use wait_event instead of schedule_timeout()
  ksmbd: fix kernel oops from idr_remove()
  ksmbd: add channel rwlock
  ksmbd: replace sessions list in connection with xarray
  MAINTAINERS: ksmbd: add entry for documentation
  ksmbd: remove unused ksmbd_share_configs_cleanup function
2022-08-08 20:15:13 -07:00
Namjae Jeon 8f0541186e ksmbd: fix heap-based overflow in set_ntacl_dacl()
The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute
under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase
trigger the following overflow.

[ 4712.003781] ==================================================================
[ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190

[ 4712.003813] CPU: 0 PID: 4190 Comm: kworker/0:0 Not tainted 5.19.0-rc5 #1
[ 4712.003850] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[ 4712.003867] Call Trace:
[ 4712.003870]  <TASK>
[ 4712.003873]  dump_stack_lvl+0x49/0x5f
[ 4712.003935]  print_report.cold+0x5e/0x5cf
[ 4712.003972]  ? ksmbd_vfs_get_sd_xattr+0x16d/0x500 [ksmbd]
[ 4712.003984]  ? cmp_map_id+0x200/0x200
[ 4712.003988]  ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004000]  kasan_report+0xaa/0x120
[ 4712.004045]  ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004056]  kasan_check_range+0x100/0x1e0
[ 4712.004060]  memcpy+0x3c/0x60
[ 4712.004064]  build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004076]  ? parse_sec_desc+0x580/0x580 [ksmbd]
[ 4712.004088]  ? ksmbd_acls_fattr+0x281/0x410 [ksmbd]
[ 4712.004099]  smb2_query_info+0xa8f/0x6110 [ksmbd]
[ 4712.004111]  ? psi_group_change+0x856/0xd70
[ 4712.004148]  ? update_load_avg+0x1c3/0x1af0
[ 4712.004152]  ? asym_cpu_capacity_scan+0x5d0/0x5d0
[ 4712.004157]  ? xas_load+0x23/0x300
[ 4712.004162]  ? smb2_query_dir+0x1530/0x1530 [ksmbd]
[ 4712.004173]  ? _raw_spin_lock_bh+0xe0/0xe0
[ 4712.004179]  handle_ksmbd_work+0x30e/0x1020 [ksmbd]
[ 4712.004192]  process_one_work+0x778/0x11c0
[ 4712.004227]  ? _raw_spin_lock_irq+0x8e/0xe0
[ 4712.004231]  worker_thread+0x544/0x1180
[ 4712.004234]  ? __cpuidle_text_end+0x4/0x4
[ 4712.004239]  kthread+0x282/0x320
[ 4712.004243]  ? process_one_work+0x11c0/0x11c0
[ 4712.004246]  ? kthread_complete_and_exit+0x30/0x30
[ 4712.004282]  ret_from_fork+0x1f/0x30

This patch add the buffer validation for security descriptor that is
stored by malformed SMB2_SET_INFO_HE command. and allocate large
response buffer about SMB2_O_INFO_SECURITY file info class.

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-04 09:51:38 -05:00
Namjae Jeon af7c39d971 ksmbd: fix racy issue while destroying session on multichannel
After multi-channel connection with windows, Several channels of
session are connected. Among them, if there is a problem in one channel,
Windows connects again after disconnecting the channel. In this process,
the session is released and a kernel oop can occurs while processing
requests to other channels. When the channel is disconnected, if other
channels still exist in the session after deleting the channel from
the channel list in the session, the session should not be released.
Finally, the session will be released after all channels are disconnected.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-07-31 23:14:32 -05:00
Christian Brauner 0c5fd887d2
acl: move idmapped mount fixup into vfs_{g,s}etxattr()
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.

I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.

Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  -> __vfs_getxattr()
                  |     -> handler->get == ovl_posix_acl_xattr_get()
                  |        -> ovl_xattr_get()
                  |           -> vfs_getxattr()
                  |              -> __vfs_getxattr()
                  |                 -> handler->get() /* lower filesystem callback */
                  |> posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&init_user_ns, 4);
                              4 = mapped_kuid_fs(&init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                       ________________________________
                            -> cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -> vfs_getxattr_alloc()                                                                        |
                                  -> handler->get == ovl_other_xattr_get()                                                    |
                                     -> vfs_getxattr()                                                                        |
                                        -> xattr_getsecurity()                                                                |
                                           -> security_inode_getsecurity()                                                    |
                                              -> cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                   -> vfs_getxattr()
                      -> xattr_getsecurity()
                         -> security_inode_getsecurity()                                                _______________________________
                            -> cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -> vfs_getxattr_alloc()                                                                                |
                                  -> handler->get == ovl_other_xattr_get()                                                            |
                                    |-> vfs_getxattr()                                                                                |
                                        -> xattr_getsecurity()                                                                        |
                                           -> security_inode_getsecurity()                                                            |
                                              -> cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -> vfs_getxattr_alloc()
                                                    -> handler->get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.

The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().

To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().

Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs->creator_cred): 0:10000000:65536

        sys_getxattr()
        -> path_getxattr()
           -> getxattr()
              -> do_getxattr()
                  |> vfs_getxattr()
                  |  |> __vfs_getxattr()
                  |  |  -> handler->get == ovl_posix_acl_xattr_get()
                  |  |     -> ovl_xattr_get()
                  |  |        -> vfs_getxattr()
                  |  |           |> __vfs_getxattr()
                  |  |           |  -> handler->get() /* lower filesystem callback */
                  |  |           |> posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |> posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |> posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -> generic_permission()
           -> acl_permission_check()
              -> check_acl()
                 -> get_acl()
                    -> inode->i_op->get_acl() == ovl_get_acl()
                        > get_acl() /* on the underlying filesystem)
                          ->inode->i_op->get_acl() == /*lower filesystem callback */
                 -> posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.

The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.

Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-07-15 22:08:59 +02:00
Amir Goldstein 868f9f2f8e vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.

Before commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems.  After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.

Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.

Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().

This patch restores some cross-filesystem copy restrictions that existed
prior to commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().

Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user.  Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e.  nfsv3).

Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb.  This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().

nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.

fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.

Fixes: 5dae222a5f ("vfs: allow copy_file_range to copy across devices")
Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: 64bf5ff58d ("vfs: no fallback for ->copy_file_range")
Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
Reported-by: He Zhe <zhe.he@windriver.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-30 15:16:38 -07:00
Jason A. Donenfeld 067baa9a37 ksmbd: use vfs_llseek instead of dereferencing NULL
By not checking whether llseek is NULL, this might jump to NULL. Also,
it doesn't check FMODE_LSEEK. Fix this by using vfs_llseek(), which
always does the right thing.

Fixes: f441584858 ("cifsd: add file operations")
Cc: stable@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-25 19:52:49 -05:00
Namjae Jeon 18e39fb960 ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA
generic/091, 263 test failed since commit f66f8b94e7 ("cifs: when
extending a file with falloc we should make files not-sparse").
FSCTL_ZERO_DATA sets the range of bytes to zero without extending file
size. The VFS_FALLOCATE_FL_KEEP_SIZE flag should be used even on
non-sparse files.

Cc: stable@vger.kernel.org
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-23 23:30:46 -05:00
Namjae Jeon 50f500b7f6 ksmbd: remove filename in ksmbd_file
If the filename is change by underlying rename the server, fp->filename
and real filename can be different. This patch remove the uses of
fp->filename in ksmbd and replace it with d_path().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-14 20:56:13 -05:00
Christoph Hellwig 322cbb50de block: remove genhd.h
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it.  So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Ronnie Sahlberg 26a2787d45 ksmbd: Use the SMB3_Create definitions from the shared
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-11 19:22:58 -06:00
Namjae Jeon f7db8fd03a ksmbd: add validation in smb2_ioctl
Add validation for request/response buffer size check in smb2_ioctl and
fsctl_copychunk() take copychunk_ioctl_req pointer and the other arguments
instead of smb2_ioctl_req structure and remove an unused smb2_ioctl_req
argument of fsctl_validate_negotiate_info.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-13 23:37:18 -05:00
Hyunchul Lee 265fd1991c ksmbd: use LOOKUP_BENEATH to prevent the out of share access
instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.

ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Boehme <slow@samba.org>
Tested-by: Steve French <smfrench@gmail.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-24 21:25:23 -05:00
Namjae Jeon 4ea477988c ksmbd: remove follow symlinks support
Use  LOOKUP_NO_SYMLINKS flags for default lookup to prohibit the middle of
symlink component lookup and remove follow symlinks parameter support.
We re-implement it as reparse point later.

Test result:
smbclient -Ulinkinjeon%1234 //172.30.1.42/share -c
"get hacked/passwd passwd"
NT_STATUS_OBJECT_NAME_NOT_FOUND opening remote file \hacked\passwd

Cc: Ralph Böhme <slow@samba.org>
Cc: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-22 23:37:38 -05:00
Christian Brauner 0e844efebd ksmbd: fix translation in acl entries
The ksmbd server performs translation of posix acls to smb acls.
Currently the translation is wrong since the idmapping of the mount is
used to map the ids into raw userspace ids but what is relevant is the
user namespace of ksmbd itself. The user namespace of ksmbd itself which
is the initial user namespace. The operation is similar to asking "What
*ids would a userspace process see given that k*id in the relevant user
namespace?". Before the final translation we need to apply the idmapping
of the mount in case any is used. Add two simple helpers for ksmbd.

Cc: Steve French <stfrench@microsoft.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:44 -05:00
Christian Brauner da1e7ada5b ksmbd: fix lookup on idmapped mounts
It's great that the new in-kernel ksmbd server will support idmapped
mounts out of the box! However, lookup is currently broken. Lookup
helpers such as lookup_one_len() call inode_permission() internally to
ensure that the caller is privileged over the inode of the base dentry
they are trying to lookup under. So the permission checking here is
currently wrong.

Linux v5.15 will gain a new lookup helper lookup_one() that does take
idmappings into account. I've added it as part of my patch series to
make btrfs support idmapped mounts. The new helper is in linux-next as
part of David's (Sterba) btrfs for-next branch as commit
c972214c133b ("namei: add mapping aware lookup helper").

I've said it before during one of my first reviews: I would very much
recommend adding fstests to [1]. It already seems to have very
rudimentary cifs support. There is a completely generic idmapped mount
testsuite that supports idmapped mounts.

[1]: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Steve French <stfrench@microsoft.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: David Sterba <dsterba@suse.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-03 23:29:44 -05:00
Namjae Jeon 777cad1604 ksmbd: remove select FS_POSIX_ACL in Kconfig
ksmbd is forcing to turn on FS_POSIX_ACL in Kconfig to use vfs acl
functions(posix_acl_alloc, get_acl, set_posix_acl). OpenWRT and other
platform doesn't use acl and this config is disable by default in
kernel. This patch use IS_ENABLED() to know acl config is enable and use
acl function if it is enable.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-08-13 08:18:10 +09:00
Namjae Jeon 78ad2c277a ksmbd: fix memory leak in ksmbd_vfs_get_sd_xattr()
Add free acl.sd_buf and n.data on error handling in
ksmbd_vfs_get_sd_xattr().

Reported-by: Coverity Scan <scan-admin@coverity.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-13 17:22:47 +09:00
Hyunchul Lee 45a64e8b08 ksmbd: uninterruptible wait for a file being unlocked
the wait can be canceled by SMB2_CANCEL, SMB2_CLOSE,
SMB2_LOGOFF, disconnection or shutdown, we don't have
to use wait_event_interruptible.

And this remove the warning from Coverity:

CID 1502834 (#1 of 1): Unused value (UNUSED_VALUE)
returned_value: Assigning value from ksmbd_vfs_posix_lock_wait(flock)
to err here, but that stored value is overwritten before it can be used.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-10 16:23:56 +09:00
Dan Carpenter 07781de905 ksmbd: use kasprintf() in ksmbd_vfs_xattr_stream_name()
Simplify the code by using kasprintf().  This also silences a Smatch
warning:

    fs/ksmbd/vfs.c:1725 ksmbd_vfs_xattr_stream_name()
    warn: inconsistent indenting

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-10 09:29:13 +09:00
Hyunchul Lee 465d720485 ksmbd: call mnt_user_ns once in a function
Avoid calling mnt_user_ns() many time in
a function.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christian Brauner <christian@brauner.io>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-05 09:22:49 +09:00
Hyunchul Lee af34983e83 ksmbd: add user namespace support
For user namespace support, call vfs functions
with struct user_namespace got from struct path.

This patch have been tested mannually as below.

Create an id-mapped mount using the mount-idmapped utility
(https://github.com/brauner/mount-idmapped).
$ mount-idmapped --map-mount b:1003:1002:1 /home/foo <EXPORT DIR>/foo
(the user, "foo" is 1003, and the user "bar" is 1002).

And  mount the export directory using cifs with the user, "bar".
succeed to create/delete/stat/read/write files and directory in
the <EXPORT DIR>/foo. But fail with a bind mount for /home/foo.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-02 16:27:10 +09:00
Namjae Jeon 12202c0594 ksmbd: use ksmbd_vfs_lock_parent to get stable parent dentry
Use ksmbd_vfs_lock_parent to get stable parent dentry and remove
PARENT_INODE macro.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-29 15:07:51 +09:00
Namjae Jeon ab0b263b74 ksmbd: opencode to remove FP_INODE macro
Opencode to remove FP_INODE macro.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-29 15:07:48 +09:00
Namjae Jeon 493fa2fbe4 ksmbd: fix dentry racy with rename()
Using ->d_name can be broken due to races with rename().
So use %pd with ->d_name to print filename and In other cases,
use it under ->d_lock.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-29 15:07:44 +09:00
Hyunchul Lee 6c5e36d13e ksmbd: set MAY_* flags together with open flags
set MAY_* flags together with open flags and
remove ksmbd_vfs_inode_permission().

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28 16:28:41 +09:00
Hyunchul Lee 333111a6dc ksmbd: factor out a ksmbd_vfs_lock_parent helper
Factor out a self-contained helper to
get stable parent dentry.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28 16:28:37 +09:00
Namjae Jeon 1a93084b9a ksmbd: move fs/cifsd to fs/ksmbd
Move fs/cifsd to fs/ksmbd and rename the remaining cifsd name to ksmbd.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-28 16:28:31 +09:00