Commit Graph

64143 Commits

Author SHA1 Message Date
Xiubo Li e64f44a884 ceph: skip checking caps when session reconnecting and releasing reqs
It makes no sense to check the caps when reconnecting to the mds. And
the async dirop caps will be put by their _cb() functions, so there is
no need to check them when releasing the requests either.

URL: https://tracker.ceph.com/issues/45635
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Xiubo Li ea8412b284 ceph: make sure mdsc->mutex is nested in s->s_mutex to fix dead lock
send_mds_reconnect takes the s_mutex while the mdsc->mutex is already
held. That inverts the locking order documented in mds_client.h. Drop
the mdsc->mutex, acquire the s_mutex and then reacquire the mdsc->mutex
to prevent a deadlock.
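
A userspace pthread analogy of that reordering (not the ceph code; the
lock and function names are only illustrative):

#include <pthread.h>

static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER;    /* documented outer lock */
static pthread_mutex_t mdsc_mutex = PTHREAD_MUTEX_INITIALIZER; /* documented inner lock */

/* Entered with mdsc_mutex held.  Taking s_mutex directly here would invert
 * the documented order, so drop mdsc_mutex, take s_mutex, then retake it. */
static void reconnect(void)
{
        pthread_mutex_unlock(&mdsc_mutex);
        pthread_mutex_lock(&s_mutex);
        pthread_mutex_lock(&mdsc_mutex);
        /* ... reconnect work under both locks ... */
        pthread_mutex_unlock(&s_mutex);
}

int main(void)
{
        pthread_mutex_lock(&mdsc_mutex);
        reconnect();
        pthread_mutex_unlock(&mdsc_mutex);
        return 0;
}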

URL: https://tracker.ceph.com/issues/45609
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Luis Henriques 878dabb641 ceph: don't return -ESTALE if there's still an open file
Similarly to commit 03f219041f ("ceph: check i_nlink while converting
a file handle to dentry"), this fixes another corner case with
name_to_handle_at/open_by_handle_at.  The issue has been detected by
xfstest generic/467, when doing:

 - name_to_handle_at("/cephfs/myfile")
 - open("/cephfs/myfile")
 - unlink("/cephfs/myfile")
 - sync; sync;
 - drop caches
 - open_by_handle_at()

The call to open_by_handle_at should not fail because the file hasn't been
deleted yet (only unlinked) and we do have a valid handle to it.  -ESTALE
shall be returned only if i_nlink is 0 *and* i_count is 1.

This patch also makes sure we have LINK caps before checking i_nlink.
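
A minimal userspace sketch of that reproduction (same example path;
open_by_handle_at(2) needs CAP_DAC_READ_SEARCH, and error handling is
trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
        int mount_id, fd, hfd;

        fh->handle_bytes = MAX_HANDLE_SZ;
        if (name_to_handle_at(AT_FDCWD, "/cephfs/myfile", fh, &mount_id, 0) < 0)
                return 1;

        fd = open("/cephfs/myfile", O_RDONLY);      /* keeps i_count elevated */
        unlink("/cephfs/myfile");
        sync(); sync();
        /* drop caches: echo 3 > /proc/sys/vm/drop_caches */

        hfd = open_by_handle_at(fd, fh, O_RDONLY);  /* should not fail with -ESTALE */
        printf("open_by_handle_at: %d\n", hfd);
        return 0;
}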

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Luis Henriques dffdcd7145 ceph: allow rename operation under different quota realms
Returning -EXDEV when trying to 'mv' files/directories from different
quota realms results in copy+unlink operations instead of the faster
CEPH_MDS_OP_RENAME.  This will occur even when there aren't any quotas
set in the destination directory, or if there's enough space left for
the new file(s).

This patch adds a new helper function, called on rename operations,
which allows the rename to proceed when the quota rules permit it.  This
patch mimics userland fuse client commit b8954e5734b3 ("client:
optimize rename operation under different quota root").

Since ceph_quota_is_same_realm() is now called only from this new
helper, make it static.
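
For context, this is roughly the fallback that an -EXDEV from rename(2)
forces on userspace (what 'mv' ends up doing instead of a single
CEPH_MDS_OP_RENAME); the paths are illustrative and error handling is
trimmed:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *src = "/cephfs/realm-a/file";
        const char *dst = "/cephfs/realm-b/file";
        struct stat st;
        off_t off = 0;
        int in, out;

        if (rename(src, dst) == 0 || errno != EXDEV)
                return 0;

        /* slow path: copy all the data, then unlink the source */
        in = open(src, O_RDONLY);
        fstat(in, &st);
        out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode);
        while (off < st.st_size &&
               sendfile(out, in, &off, st.st_size - off) > 0)
                ;
        close(in);
        close(out);
        return unlink(src);
}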

URL: https://tracker.ceph.com/issues/44791
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Luis Henriques daa668fbac ceph: normalize 'delta' parameter usage in check_quota_exceeded
Function check_quota_exceeded() uses the delta parameter only for the
QUOTA_CHECK_MAX_BYTES_OP operation.  Using this parameter for MAX_FILES
as well makes the code cleaner and will be required to support
cross-quota-tree renames.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Jeff Layton 829ad4db95 ceph: ceph_kick_flushing_caps needs the s_mutex
The mdsc->cap_dirty_lock is not held while walking the list in
ceph_kick_flushing_caps, which is not safe.

ceph_early_kick_flushing_caps does something similar, but the
s_mutex is held while it's called and I think that guards against
changes to the list.

Ensure we hold the s_mutex when calling ceph_kick_flushing_caps,
and add some clarifying comments.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:53 +02:00
Jeff Layton d67c72e6cc ceph: request expedited service on session's last cap flush
When flushing a lot of caps to the MDSs at once (e.g. for syncfs),
we can end up waiting a substantial amount of time for MDS replies,
because the MDS may delay some of them so that it can batch them up
together in a single journal transaction. This can lead to stalls when
calling sync or syncfs.

What we'd really like to do is request expedited service on the _last_
cap we're flushing back to the server. If the CHECK_CAPS_FLUSH flag is
set on the request and the current inode was the last one on the
session->s_cap_dirty list, then mark the request with
CEPH_CLIENT_CAPS_SYNC.

Note that this heuristic is not perfect. New inodes can race onto the
list after we've started flushing, but it does seem to fix some common
use cases.

URL: https://tracker.ceph.com/issues/44744
Reported-by: Jan Fajerski <jfajerski@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 1cf03a68e7 ceph: convert mdsc->cap_dirty to a per-session list
This is a per-sb list now, but that makes it difficult to tell when
the cap is the last dirty one associated with the session. Switch
this to be a per-session list, but continue using the
mdsc->cap_dirty_lock to protect the lists.

This list is only ever walked in ceph_flush_dirty_caps, so change that
to walk the sessions array and then flush the caps for inodes on each
session's list.

If the auth cap ever changes while the inode has dirty caps, then
move the inode to the appropriate session for the new auth_cap. Also,
ensure that we never remove an auth cap while the inode is still on the
s_cap_dirty list.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Yan, Zheng 6f05b30ea0 ceph: reset i_requested_max_size if file write is not wanted
A write can get stuck waiting for a larger max_size in the following sequence of
events:

- client opens a file and writes to position 'A' (larger than unit of
  max size increment)
- client closes the file handle and updates wanted caps (not wanting
  file write caps)
- client opens and truncates the file, writes to position 'A' again.

At the 1st event, the client sets the inode's requested_max_size to 'A'.
At the 2nd event, the mds removes the client's writable range, but the
client does not reset requested_max_size. At the 3rd event, the client
does not request a larger max size because requested_max_size is already
larger than 'A'.
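
A userspace sketch of the triggering sequence (the path and offset are
illustrative):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        const char buf[1] = { 'x' };
        off_t A = 64 * 1024 * 1024;   /* larger than one max_size increment */
        int fd;

        fd = open("/cephfs/file", O_RDWR | O_CREAT, 0644);
        pwrite(fd, buf, 1, A);        /* 1st event: requested_max_size becomes 'A' */
        close(fd);                    /* 2nd event: wanted caps no longer include write */

        fd = open("/cephfs/file", O_RDWR | O_TRUNC);
        pwrite(fd, buf, 1, A);        /* 3rd event: stuck here before the fix */
        close(fd);
        return 0;
}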

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 88828190f0 ceph: throw a warning if we destroy session with mutex still locked
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton dc3da0461c ceph: fix potential race in ceph_check_caps
Nothing ensures that the session will still be valid by the time we
dereference the pointer. Take and put a reference.

In principle, we should always be able to get a reference here, but
throw a warning if that's ever not the case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 4fb5dda39c ceph: document what protects i_dirty_item and i_flushing_item
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 7833323363 ceph: don't take i_ceph_lock in handle_cap_import
Just take the lock before calling the function. This means we have to do a couple of
minor in-memory operations under the spinlock now, but those shouldn't
be an issue.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 7391fba267 ceph: don't release i_ceph_lock in handle_cap_trunc
There's no reason to do this here. Just have the caller handle it.
Also, add a lockdep assertion.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton d7dbfb4f2b ceph: add comments for handle_cap_flush_ack logic
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 681ac63488 ceph: split up __finish_cap_flush
This function takes an mdsc argument or a ci argument, but if both are
passed in, it ignores the ci arg. Fortunately, nothing does that, but
there's no good reason to have the same function handle both cases.

Also, get rid of some branches and just use |= to set the wake_* vals.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:52 +02:00
Jeff Layton 0a454bdd50 ceph: reorganize __send_cap for less spinlock abuse
Get rid of the __releases annotation by breaking it up into two
functions: __prep_cap which is done under the spinlock and __send_cap
that is done outside it. Add new fields to cap_msg_args for the wake
boolean and old_xattr_buf pointer.

Nothing checks the return value from __send_cap, so make it void
return.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:51 +02:00
Xiubo Li 70c948206f ceph: add metadata perf metric support
Add a new "r_ended" field to struct ceph_mds_request and use that to
maintain the average latency of MDS requests.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:51 +02:00
Xiubo Li 97e27aaa9a ceph: add read/write latency metric support
Calculate the latency for OSD read requests. Add a new r_end_stamp
field to struct ceph_osd_request that will hold the time at which the
reply was received. Use that to calculate the RTT for each call, and
divide the sum of those by the number of calls to get the average RTT.

Keep a tally of RTT for OSD writes and the number of calls to track the
average latency of OSD writes.
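
A generic sketch of the running-average bookkeeping described here (the
struct and function names are illustrative, not the kernel's):

#include <stdio.h>
#include <time.h>

struct latency_metric {
        unsigned long long sum_ns;   /* running sum of per-call RTTs */
        unsigned long long calls;    /* number of completed calls */
};

static void metric_update(struct latency_metric *m,
                          const struct timespec *start,
                          const struct timespec *end)
{
        m->sum_ns += (end->tv_sec - start->tv_sec) * 1000000000ULL +
                     (end->tv_nsec - start->tv_nsec);
        m->calls++;
}

static unsigned long long metric_avg_ns(const struct latency_metric *m)
{
        return m->calls ? m->sum_ns / m->calls : 0;
}

int main(void)
{
        struct latency_metric read_lat = { 0, 0 };
        struct timespec t0 = { 0, 0 }, t1 = { 0, 500000 };   /* fake 0.5 ms RTT */

        metric_update(&read_lat, &t0, &t1);
        printf("average read RTT: %llu ns\n", metric_avg_ns(&read_lat));
        return 0;
}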

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:51 +02:00
Xiubo Li 1af16d547f ceph: add caps perf metric for each superblock
Count hits and misses in the caps cache. If the client has all of
the necessary caps when a task needs references, then it's counted
as a hit. Any other situation is a miss.
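
A generic sketch of that hit/miss rule (the cap masks are illustrative,
not the kernel's):

#include <stdio.h>

struct cap_metric {
        unsigned long hit;
        unsigned long mis;
};

/* A hit only when every needed cap is already held; anything else is a miss. */
static void account_caps(struct cap_metric *m, unsigned int have,
                         unsigned int need)
{
        if ((have & need) == need)
                m->hit++;
        else
                m->mis++;
}

int main(void)
{
        struct cap_metric m = { 0, 0 };

        account_caps(&m, 0x3, 0x1);   /* all needed caps present -> hit  */
        account_caps(&m, 0x1, 0x3);   /* something missing       -> miss */
        printf("hit=%lu mis=%lu\n", m.hit, m.mis);
        return 0;
}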

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:51 +02:00
Xiubo Li f9009efac4 ceph: add dentry lease metric support
For dentry leases, only count the hit/miss info triggered by vfs calls;
ignore cases like request reply handling and ceph_trim_dentries.

For now, these are only viewable using debugfs. Future patches will
allow the client to send the stats to the MDS.

The output looks like:

item          total           miss            hit
-------------------------------------------------
d_lease       11              7               141

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-06-01 13:22:51 +02:00
Linus Torvalds e2fce151d2 Cache tiering and cap handling fixups, both marked for stable.
Merge tag 'ceph-for-5.7-rc8' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Cache tiering and cap handling fixups, both marked for stable"

* tag 'ceph-for-5.7-rc8' of git://github.com/ceph/ceph-client:
  ceph: flush release queue when handling caps for unknown inode
  libceph: ignore pool overlay and cache logic on redirects
2020-05-29 13:59:54 -07:00
Linus Torvalds 835e36b119 Fix the previous, flawed gfs2_find_jhead commit
Merge tag 'gfs2-v5.7-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:
 "Fix the previous, flawed gfs2_find_jhead commit"

* tag 'gfs2-v5.7-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Even more gfs2_find_jhead fixes
2020-05-29 13:58:13 -07:00
Andreas Gruenbacher 20be493b78 gfs2: Even more gfs2_find_jhead fixes
Fix several issues in the previous gfs2_find_jhead fix:
* When updating @blocks_submitted, @block refers to the first block not
  submitted yet, not the last block submitted, so fix an off-by-one error.
* We want to ensure that @blocks_submitted is far enough ahead of @blocks_read
  to guarantee that there is in-flight I/O.  Otherwise, we'll eventually end up
  waiting for pages that haven't been submitted yet.
* It's much easier to compare the number of blocks added with the number of
  blocks submitted to limit the maximum bio size.
* Even with bio chaining, we can keep adding blocks until we reach the maximum
  bio size, as long as we stop at a page boundary.  This simplifies the logic.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2020-05-29 17:00:24 +02:00
Linus Torvalds 75caf310d1 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  include/asm-generic/topology.h: guard cpumask_of_node() macro argument
  fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()
  mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()
  mm,thp: stop leaking unreleased file pages
  mm/z3fold: silence kmemleak false positives of slots
2020-05-28 13:04:25 -07:00
Alexander Potapenko 1d605416fb fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()
KMSAN reported uninitialized data being written to disk when dumping
core.  As a result, several kilobytes of kmalloc memory may be written
to the core file and then read by a non-privileged user.

Reported-by: sam <sunhaoyl@outlook.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com
Link: https://github.com/google/kmsan/issues/76
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28 11:35:40 -07:00
Linus Torvalds b0c3ba31be
Merge tag 'fsnotify_for_v5.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fanotify FAN_DIR_MODIFY disabling from Jan Kara:
 "A single patch that disables FAN_DIR_MODIFY support that was merged in
  this merge window.

  When discussing further functionality we realized it may be more
  logical to guard it with a feature flag or to call things slightly
  differently (or maybe not) so let's not set the API in stone for now."

* tag 'fsnotify_for_v5.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: turn off support for FAN_DIR_MODIFY
2020-05-27 11:03:24 -07:00
Linus Torvalds 3301f6ae2d Merge branch 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Reverted stricter synchronization for cgroup recursive stats, which
   was prepping it for event counter usage that never got merged. The
   change was causing performance regressions in some cases.

 - Restore bpf-based device-cgroup operation even when cgroup1 device
   cgroup is disabled.

 - An out-param init fix.

* 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  device_cgroup: Cleanup cgroup eBPF device filter code
  xattr: fix uninitialized out-param
  Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
2020-05-27 10:58:19 -07:00
Amir Goldstein f17936993a fanotify: turn off support for FAN_DIR_MODIFY
FAN_DIR_MODIFY has been enabled by commit 44d705b037 ("fanotify:
report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are
planning further extensions to the fanotify API, and while doing so we
realized that FAN_DIR_MODIFY may need to behave slightly differently to
be more consistent with the extensions we plan. So until we finalize
these extensions, let's not bind our hands by exposing FAN_DIR_MODIFY to
userland.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-27 18:55:54 +02:00
Jeff Layton fb33c114d3 ceph: flush release queue when handling caps for unknown inode
It's possible for the VFS to completely forget about an inode, but for
it to still be sitting on the cap release queue. If the MDS sends the
client a cap message for such an inode, it just ignores it today, which
can lead to a stall of up to 5s until the cap release queue is flushed.

If we get a cap message for an inode that can't be located, then go
ahead and flush the cap release queue.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/45532
Fixes: 1e9c2eb681 ("ceph: delete stale dentry when last reference is dropped")
Reported-and-Tested-by: Andrej Filipčič <andrej.filipcic@ijs.si>
Suggested-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-27 13:03:57 +02:00
Linus Torvalds caffb99b69 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
    Bhowmik.

 2) Nexthop attributes aren't being checked properly because of
    mis-initialized iterator, from David Ahern.

 3) Revert iop_idents_reserve() change as it caused performance
    regressions and was just working around what is really a UBSAN bug
    in the compiler. From Yuqi Jin.

 4) Read MAC address properly from ROM in bmac driver (double iteration
    proceeds past end of address array), from Jeremy Kerr.

 5) Add Microsoft Surface device IDs to r8152, from Marc Payne.

 6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
    Boris Sukholitko.

 7) Fix ACK discard behavior in rxrpc, from David Howells.

 8) Preserve flow hash across packet scrubbing in wireguard, from Jason
    A. Donenfeld.

 9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
    Dumazet.

10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.

11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.

12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.

13) Fix use after free in mlxsw driver, from Jiri Pirko.

14) Order kTLS key destruction properly in mlx5 driver, from Tariq
    Toukan.

15) Check devm_platform_ioremap_resource() return value properly in
    several drivers, from Tiezhu Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  net: smsc911x: Fix runtime PM imbalance on error
  net/mlx4_core: fix a memory leak bug.
  net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
  net: phy: mscc: fix initialization of the MACsec protocol mode
  net: stmmac: don't attach interface until resume finishes
  net: Fix return value about devm_platform_ioremap_resource()
  net/mlx5: Fix error flow in case of function_setup failure
  net/mlx5e: CT: Correctly get flow rule
  net/mlx5e: Update netdev txq on completions during closure
  net/mlx5: Annotate mutex destroy for root ns
  net/mlx5: Don't maintain a case of del_sw_func being null
  net/mlx5: Fix cleaning unmanaged flow tables
  net/mlx5: Fix memory leak in mlx5_events_init
  net/mlx5e: Fix inner tirs handling
  net/mlx5e: kTLS, Destroy key object after destroying the TIS
  net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
  net/mlx5: Avoid processing commands before cmdif is ready
  net/mlx5: Fix a race when moving command interface to events mode
  net/mlx5: Add command entry handling completion
  rxrpc: Fix a memory leak in rxkad_verify_response()
  ...
2020-05-23 17:16:18 -07:00
David Howells 8a1d24e1cc rxrpc: Fix a warning
Fix a warning due to an uninitialised variable.

In file included from ../fs/afs/fs_probe.c:11:
../fs/afs/fs_probe.c: In function 'afs_fileserver_probe_result':
../fs/afs/internal.h:1453:2: warning: 'rtt_us' may be used uninitialized in this function [-Wmaybe-uninitialized]
 1453 |  printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__)
      |  ^~~~~~
../fs/afs/fs_probe.c:35:15: note: 'rtt_us' was declared here

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-23 00:31:39 +01:00
David S. Miller 4629ed2e48 RxRPC fixes
Merge tag 'rxrpc-fixes-20200520' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fix retransmission timeout and ACK discard

Here are a couple of fixes and an extra tracepoint for AF_RXRPC:

 (1) Calculate the RTO pretty much as TCP does, rather than making
     something up, including an initial 4s timeout (which causes return
     probes from the fileserver to fail if a packet goes missing), and add
     backoff.

 (2) Fix the discarding of out-of-order received ACKs.  We mustn't let the
     hard-ACK point regress, nor do we want to do unnecessary
     retransmission because the soft-ACK list regresses.  This is not
     trivial, however: due to some loose wording in various old protocol
     specs, the ACK field that should be used for this sometimes has the
     wrong information in it.

 (3) Add a tracepoint to log a discarded ACK.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 15:53:30 -07:00
Linus Torvalds 444565650a io_uring-5.7-2020-05-22
Merge tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A small collection of small fixes that should go into this release:

   - Two fixes for async request preparation (Pavel)

   - Busy clear fix for SQPOLL (Xiaoguang)

   - Don't use kiocb->private for O_DIRECT buf index, some file systems
     use it (Bijan)

   - Kill dead check in io_splice()

   - Ensure sqo_wait is initialized early

   - Cancel task_work if we fail adding to original process

   - Only add (IO)pollable requests to iopoll list, fixing a regression
     in this merge window"

* tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block:
  io_uring: reset -EBUSY error when io sq thread is waken up
  io_uring: don't add non-IO requests to iopoll pending list
  io_uring: don't use kiocb.private to store buf_index
  io_uring: cancel work if task_work_add() fails
  io_uring: remove dead check in io_splice()
  io_uring: fix FORCE_ASYNC req preparation
  io_uring: don't prepare DRAIN reqs twice
  io_uring: initialize ctx->sqo_wait earlier
2020-05-22 11:12:30 -07:00
Namjae Jeon 907fa89325 exfat: add the dummy mount options to be backward compatible with staging/exfat
As Ubuntu and Fedora have released new versions using a kernel version
equal to or higher than v5.4, they have started to support the kernel
exfat filesystem.

Linus reported a mount error with the new version of exfat on Fedora:

        exfat: Unknown parameter 'namecase'

This is because there is a difference in mount options between the old
staging/exfat and the new exfat: the utf8, debug, and codepage options,
as well as namecase, have been removed from the new exfat.

This patch adds the dummy mount options as deprecated options to be
backward compatible with the old driver.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-21 16:40:11 -07:00
Linus Torvalds 57f1b0cf2d Fix regression in ext4's FIEMAP handling introduced in v5.7-rc1
Merge tag 'fiemap-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix regression in ext4's FIEMAP handling introduced in v5.7-rc1"

* tag 'fiemap-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix fiemap size checks for bitmap files
  ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
2020-05-21 11:37:20 -07:00
Linus Torvalds fea371e259 This pull request contains the following bug fixes for UBI and UBIFS:
- Correctly set next cursor for detailed_erase_block_info debugfs file
 - Don't use crypto_shash_descsize() for digest size in UBIFS
 - Remove broken lazytime support from UBIFS
Merge tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI and UBIFS fixes from Richard Weinberger:

 - Correctly set next cursor for detailed_erase_block_info debugfs file

 - Don't use crypto_shash_descsize() for digest size in UBIFS

 - Remove broken lazytime support from UBIFS

* tag 'for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: Fix seq_file usage in detailed_erase_block_info debugfs file
  ubifs: fix wrong use of crypto_shash_descsize()
  ubifs: remove broken lazytime support
2020-05-20 13:07:01 -07:00
Linus Torvalds 8e2b7f634a overlayfs fixes for 5.7-rc7
Merge tag 'ovl-fixes-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix two bugs introduced in this cycle and one introduced in v5.5"

* tag 'ovl-fixes-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: potential crash in ovl_fid_to_fh()
  ovl: clear ATTR_OPEN from attr->ia_valid
  ovl: clear ATTR_FILE from attr->ia_valid
2020-05-20 11:28:35 -07:00
Tetsuo Handa 566d136289 pipe: Fix pipe_full() test in opipe_prep().
syzbot is reporting that splice()ing from a non-empty read side to an
already-full write side causes an unkillable task, because opipe_prep()
erroneously fails to invert the pipe_full() test.

  CPU: 0 PID: 9460 Comm: syz-executor.5 Not tainted 5.6.0-rc3-next-20200228-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:rol32 include/linux/bitops.h:105 [inline]
  RIP: 0010:iterate_chain_key kernel/locking/lockdep.c:369 [inline]
  RIP: 0010:__lock_acquire+0x6a3/0x5270 kernel/locking/lockdep.c:4178
  Call Trace:
     lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4720
     __mutex_lock_common kernel/locking/mutex.c:956 [inline]
     __mutex_lock+0x156/0x13c0 kernel/locking/mutex.c:1103
     pipe_lock_nested fs/pipe.c:66 [inline]
     pipe_double_lock+0x1a0/0x1e0 fs/pipe.c:104
     splice_pipe_to_pipe fs/splice.c:1562 [inline]
     do_splice+0x35f/0x1520 fs/splice.c:1141
     __do_sys_splice fs/splice.c:1447 [inline]
     __se_sys_splice fs/splice.c:1427 [inline]
     __x64_sys_splice+0x2b5/0x320 fs/splice.c:1427
     do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:295
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
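
A userspace sketch of that trigger (on an unpatched kernel the task wedges
inside the splice(); a fixed kernel merely blocks, killably, until the
destination pipe is drained):

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int in[2], out[2];
        char buf[4096];

        memset(buf, 0, sizeof(buf));
        pipe(in);
        pipe(out);

        write(in[1], buf, sizeof(buf));              /* read side: non-empty */

        fcntl(out[1], F_SETFL, O_NONBLOCK);
        while (write(out[1], buf, sizeof(buf)) > 0)  /* write side: fill it up */
                ;
        fcntl(out[1], F_SETFL, 0);                   /* back to blocking */

        /* pipe-to-pipe splice with a full destination */
        splice(in[0], NULL, out[1], NULL, sizeof(buf), 0);
        return 0;
}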

Reported-by: syzbot+b48daca8639150bc5e73@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=9386d051e11e09973d5a4cf79af5e8cedf79386d
Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-20 10:54:29 -07:00
Xiaoguang Wang d4ae271dfa io_uring: reset -EBUSY error when io sq thread is waken up
In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
we won't clear it again, which means io_sq_thread() will never have a
chance to submit sqes again. The test program test.c below can reveal
this bug:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

int main(int argc, char *argv[])
{
        struct io_uring ring;
        int i, fd, ret;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec *iovecs;
        void *buf;
        struct io_uring_params p;

        if (argc < 2) {
                printf("%s: file\n", argv[0]);
                return 1;
        }

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_SQPOLL;
        ret = io_uring_queue_init_params(4, &ring, &p);
        if (ret < 0) {
                fprintf(stderr, "queue_init: %s\n", strerror(-ret));
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        iovecs = calloc(10, sizeof(struct iovec));
        for (i = 0; i < 10; i++) {
                if (posix_memalign(&buf, 4096, 4096))
                        return 1;
                iovecs[i].iov_base = buf;
                iovecs[i].iov_len = 4096;
        }

        ret = io_uring_register_files(&ring, &fd, 1);
        if (ret < 0) {
                fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
                return ret;
        }

        for (i = 0; i < 10; i++) {
                sqe = io_uring_get_sqe(&ring);
                if (!sqe)
                        break;

                io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
                sqe->flags |= IOSQE_FIXED_FILE;

                ret = io_uring_submit(&ring);
                sleep(1);
                printf("submit %d\n", i);
        }

        for (i = 0; i < 10; i++) {
                io_uring_wait_cqe(&ring, &cqe);
                printf("receive: %d\n", i);
                if (cqe->res != 4096) {
                        fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
                        ret = 1;
                }
                io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
}
sudo ./test testfile
The above command will hang on the tenth request. To fix this bug, reset
the variable 'ret' to zero when the io sq_thread is woken up.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-20 07:26:47 -06:00
Jens Axboe b532576ed3 io_uring: don't add non-IO requests to iopoll pending list
We normally disable any commands that aren't specifically poll commands
for a ring that is setup for polling, but we do allow buffer provide and
remove commands to support buffer selection for polled IO. Once a
request is issued, we add it to the poll list to poll for completion. But
we should not do that for non-IO commands, as those request complete
inline immediately and aren't pollable. If we do, we can leave requests
on the iopoll list after they are freed.

Fixes: ddf0322db7 ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19 21:20:27 -06:00
Linus Torvalds 115a54162a Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Stable fodder fix: copy_fdtable() would get screwed on 64bit boxen
  with sysctl_nr_open raised to 512M or higher, which became possible
  since 2.6.25.

  Nobody sane would set the things up that way, but..."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix multiplication overflow in copy_fdtable()
2020-05-19 16:33:26 -07:00
Al Viro 4e89b72104 fix multiplication overflow in copy_fdtable()
cpy and set really should be size_t; we won't get an overflow on that,
since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
so nr that would've managed to overflow size_t on that multiplication
won't get anywhere near copy_fdtable() - we'll fail with EMFILE
before that.
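
A userspace illustration of the arithmetic: with sysctl_nr_open at 512M on
a 64-bit box, the product is exactly 2^32, and a 32-bit cpy silently
truncates to 0, which the switch to size_t avoids:

#include <stdio.h>

int main(void)
{
        unsigned int nr = 512U * 1024 * 1024;        /* sysctl_nr_open = 512M   */
        unsigned int cpy32 = nr * sizeof(void *);    /* truncated to 32 bits: 0 */
        size_t cpy64 = (size_t)nr * sizeof(void *);  /* 4 GiB, as intended      */

        printf("unsigned int: %u bytes\n", cpy32);
        printf("size_t:       %zu bytes\n", cpy64);
        return 0;
}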

Cc: stable@kernel.org # v2.6.25+
Fixes: 9cfe015aa4 (get rid of NR_OPEN and introduce a sysctl_nr_open)
Reported-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-19 18:29:36 -04:00
Bijan Mottahedeh 4f4eeba87c io_uring: don't use kiocb.private to store buf_index
kiocb.private is used in iomap_dio_rw() so store buf_index separately.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>

Move 'buf_index' to a hole in io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-19 16:19:49 -06:00
Christoph Hellwig 959f758451 ext4: fix fiemap size checks for bitmap files
Add an extra validation of the len parameter, as for ext4 some files
might have smaller file size limits than others.  This also means the
redundant size check in ext4_ioctl_get_es_cache can go away, as all
size checking is done in the shared fiemap handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-19 15:03:37 -04:00
Ritesh Harjani 9f44eda195 ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
ext4 supports a maximum of 0xffffffff logical blocks in a file (this is
since ext4_extent's ee_block is __le32).  This means that
EXT4_MAX_LOGICAL_BLOCK should be 0xfffffffe (starting from logical
offset 0).  This patch fixes this.

The issue was seen when ext4 moved to iomap_fiemap API and when
overlayfs was mounted on top of ext4.  Since overlayfs was missing
filemap_check_ranges(), it could pass an arbitrarily huge length, which
led to an overflow of the map.m_len logic.

This patch fixes that.

Fixes: d3b6f23f71 ("ext4: move ext4_fiemap to use iomap framework")
Reported-by: syzbot+77fa5bdb65cc39711820@syzkaller.appspotmail.com
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200505154324.3226743-2-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-19 15:03:37 -04:00
Linus Torvalds 45088963ca Description for this pull request:
- Fix potential memory leak in exfat_find.
 - Set exfat's splice_write to iter_file_splice_write to fix the splice
   failure on direct-opened file
Merge tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - Fix potential memory leak in exfat_find

 - Set exfat's splice_write to iter_file_splice_write to fix a splice
   failure on direct-opened files

* tag 'for-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix possible memory leak in exfat_find()
  exfat: use iter_file_splice_write
2020-05-18 10:33:13 -07:00
David Howells 9d1be4f4dc afs: Don't unlock fetched data pages until the op completes successfully
Don't call req->page_done() on each page as we finish filling it with
the data coming from the network.  Whilst this might speed up the
application a bit, it's a problem if there's a network failure and the
operation has to be reissued.

If this happens, an oops occurs because afs_readpages_page_done() clears
the pointer to each page it unlocks and when a retry happens, the
pointers to the pages it wants to fill are now NULL (and the pages have
been unlocked anyway).

Instead, wait till the operation completes successfully and only then
release all the pages after clearing any terminal gap (the server can
give us less data than we requested as we're allowed to ask for more
than is available).

KASAN produces a bug like the following, and even without KASAN, it can
oops and panic.

    BUG: KASAN: wild-memory-access in _copy_to_iter+0x323/0x5f4
    Write of size 1404 at addr 0005088000000000 by task md5sum/5235

    CPU: 0 PID: 5235 Comm: md5sum Not tainted 5.7.0-rc3-fscache+ #250
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Call Trace:
     memcpy+0x39/0x58
     _copy_to_iter+0x323/0x5f4
     __skb_datagram_iter+0x89/0x2a6
     skb_copy_datagram_iter+0x129/0x135
     rxrpc_recvmsg_data.isra.0+0x615/0xd42
     rxrpc_kernel_recv_data+0x1e9/0x3ae
     afs_extract_data+0x139/0x33a
     yfs_deliver_fs_fetch_data64+0x47a/0x91b
     afs_deliver_to_call+0x304/0x709
     afs_wait_for_call_to_complete+0x1cc/0x4ad
     yfs_fs_fetch_data+0x279/0x288
     afs_fetch_data+0x1e1/0x38d
     afs_readpages+0x593/0x72e
     read_pages+0xf5/0x21e
     __do_page_cache_readahead+0x128/0x23f
     ondemand_readahead+0x36e/0x37f
     generic_file_buffered_read+0x234/0x680
     new_sync_read+0x109/0x17e
     vfs_read+0xe6/0x138
     ksys_read+0xd8/0x14d
     do_syscall_64+0x6e/0x8a
     entry_SYSCALL_64_after_hwframe+0x49/0xb3

Fixes: 196ee9cd2d ("afs: Make afs_fs_fetch_data() take a list of pages")
Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-18 10:29:17 -07:00
Jens Axboe e3aabf9554 io_uring: cancel work if task_work_add() fails
We currently move it to the io_wqe_manager for execution, but we cannot
safely do so as we may lack some of the state to execute it out of
context. As we cancel work anyway when the ring/task exits, just mark
this request as canceled and io_async_task_func() will do the right
thing.

Fixes: aa96bf8a9e ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-18 11:14:22 -06:00
Wei Yongjun 94182167ec exfat: fix possible memory leak in exfat_find()
'es' is allocated by exfat_get_dentry_set() in exfat_find() and should
be freed before leaving via the error handling paths; otherwise it will
cause a memory leak.
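
A toy illustration of the rule the fix enforces (the names are made up;
only the shape of the error path matters):

#include <stdlib.h>

static int toy_find(int fail)
{
        char *es = malloc(64);          /* stands in for exfat_get_dentry_set() */

        if (!es)
                return -1;
        if (fail) {
                free(es);               /* the previously missing cleanup */
                return -1;
        }
        /* ... use es ... */
        free(es);
        return 0;
}

int main(void)
{
        toy_find(1);
        return 0;
}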

Fixes: 5f2aa07507 ("exfat: add inode operations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-05-18 11:51:44 +09:00