OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Miklos Szeredi	615047eff1	fuse: convert init to simple api Bypass the fc->initialized check by setting the force flag. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	33826ebbbe	fuse: convert writepages to simple api Derive fuse_writepage_args from fuse_io_args. Sending the request is tricky since it was done with fi->lock held, hence we must either use atomic allocation or release the lock. Both are possible so try atomic first and if it fails, release the lock and do the regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are necessary for correct operation. Move the page realloc function from dev.c to file.c and convert to using fuse_writepage_args. The last caller of fuse_write_fill() is gone, so get rid of it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	43f5098eb8	fuse: convert readdir to simple api The old fuse_read_fill() helper can be deleted, now that the last user is gone. The fuse_io_args struct is moved to fuse_i.h so it can be shared between readdir/read code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	134831e36b	fuse: convert readpages to simple api Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the functionality of the same named members in fuse_req. fuse_short_read() can now take struct fuse_args_pages. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	45ac96ed7c	fuse: convert direct_io to simple api Change of semantics in fuse_async_req_send/fuse_send_(read\|write): these can now return error, in which case the 'end' callback isn't called, so the fuse_io_args object needs to be freed. Added verification that the return value is sane (less than or equal to the requested read/write size). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	1259728731	fuse: add simple background helper Create a helper named fuse_simple_background() that is similar to fuse_simple_request(). Unlike the latter, it returns immediately and calls the supplied 'end' callback when the reply is received. The supplied 'args' pointer is stored in 'fuse_req' which allows the callback to interpret the output arguments decoded from the reply. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	338f2e3f33	fuse: convert sync write to simple api Extract a fuse_write_flags() helper that converts ki_flags relevant write to open flags. The other parts of fuse_send_write() aren't used in the fuse_perform_write() case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	00793ca5d4	fuse: covert readpage to simple api Derive fuse_io_args from struct fuse_args_pages. This will be used for both synchronous and asynchronous read/write requests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	a0d45d84f4	fuse: fuse_short_read(): don't take fuse_req as argument This will allow the use of this function when converting to the simple api (which doesn't use fuse_req). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	093f38a2c1	fuse: convert ioctl to simple api fuse_simple_request() is converted to return length of last (instead of single) out arg, since FUSE_IOCTL_OUT has two out args, the second of which is variable length. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	4c4f03f78c	fuse: move page alloc fuse_req_pages_alloc() is moved to file.c, since its internal use by the device code will eventually be removed. Rename to fuse_pages_alloc() to signify that it's not only usable for fuse_req page array. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	4c29afece8	fuse: convert readlink to simple api Also turn BUG_ON into gracefully recovered WARN_ON. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	68583165f9	fuse: add pages to fuse_args Derive fuse_args_pages from fuse_args. This is used to handle requests which use pages for input or output. The related flags are added to fuse_args. New FR_ALLOC_PAGES flag is added to indicate whether the page arrays in fuse_req need to be freed by fuse_put_request() or not. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	1ccd1ea249	fuse: convert destroy to simple api We can use the "force" flag to make sure the DESTROY request is always sent to userspace. So no need to keep it allocated during the lifetime of the filesystem. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	e413754b26	fuse: add nocreds to fuse_args In some cases it makes no sense to set pid/uid/gid fields in the request header. Allow fuse_simple_background() to omit these. This is only required in the "force" case, so for now just WARN if set otherwise. Fold fuse_get_req_nofail_nopages() into its only caller. Comment is obsolete anyway. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	3545fe2112	fuse: convert fuse_force_forget() to simple api Move this function to the readdir.c where its only caller resides. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	454a7613f5	fuse: add noreply to fuse_args This will be used by fuse_force_forget(). We can expand fuse_request_send() into fuse_simple_request(). The FR_WAITING bit has already been set, no need to check. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	c500ebaa90	fuse: convert flush to simple api Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate the request in that case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	40ac7ab2d0	fuse: simplify 'nofail' request Instead of complex games with a reserved request, just use __GFP_NOFAIL. Both callers (flush, readdir) guarantee that connection was already initialized, so no need to wait for fc->initialized. Also remove unneeded clearing of FR_BACKGROUND flag. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	1f4e9d03d1	fuse: rearrange and resize fuse_args fields This makes the structure better packed. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	d5b4854357	fuse: flatten 'struct fuse_args' ...to make future expansion simpler. The hierarchical structure is a historical thing that does not serve any practical purpose. The generated code is exactly the same before and after the patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Eric Biggers	76e43c8cca	fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock. This may have to wait for fuse_iqueue::waitq.lock to be released by one of many places that take it with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by protecting the state of struct fuse_iqueue with a separate spinlock, and only accessing fuse_iqueue::waitq using the versions of the waitqueue functions which do IRQ-safe locking internally. Reproducer: #include <fcntl.h> #include <stdio.h> #include <sys/mount.h> #include <sys/stat.h> #include <sys/syscall.h> #include <unistd.h> #include <linux/aio_abi.h> int main() { char opts[128]; int fd = open("/dev/fuse", O_RDWR); aio_context_t ctx = 0; struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd }; struct iocb *cbp = &cb; sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd); mkdir("mnt", 0700); mount("foo", "mnt", "fuse", 0, opts); syscall(__NR_io_setup, 1, &ctx); syscall(__NR_io_submit, ctx, 1, &cbp); } Beginning of lockdep output: ===================================================== WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected 5.3.0-rc5 #9 Not tainted ----------------------------------------------------- syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: 000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline] 000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline] 000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825 and this task is already holding: 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline] 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline] 0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825 which would create a new lock dependency: (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.} but this new dependency connects a SOFTIRQ-irq-safe lock: (&(&ctx->ctx_lock)->rlock){..-.} [...] Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com Fixes: `bfe4037e72` ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.19+ Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:29 +02:00
David Howells	c7eb686963	vfs: subtype handling moved to fuse The unused vfs code can be removed. Don't pass empty subtype (same as if ->parse callback isn't called). The bits that are left involve determining whether it's permitted to split the filesystem type string passed in to mount(2). Consequently, this means that we cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string with a dot in it always indicates a subtype specification. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-06 21:28:49 +02:00
David Howells	c30da2e981	fuse: convert to use the new mount API Convert the fuse filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-06 21:27:09 +02:00
Miklos Szeredi	bf9261b818	Merge branch 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into HEAD Mount API conversion of fuse needs get_tree_bdev().	2019-09-06 21:22:58 +02:00
David Howells	0f07100410	mtd: Provide fs_context-aware mount_mtd() replacement Provide a function, get_tree_mtd(), to replace mount_mtd(), using an fs_context struct to hold the parameters. Signed-off-by: David Howells <dhowells@redhat.com> cc: David Woodhouse <dwmw2@infradead.org> cc: Brian Norris <computersforpeace@gmail.com> cc: Boris Brezillon <bbrezillon@kernel.org> cc: Marek Vasut <marek.vasut@gmail.com> cc: Richard Weinberger <richard@nod.at> cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:23 -04:00
David Howells	fe62c3a4e1	vfs: Create fs_context-aware mount_bdev() replacement Create a function, get_tree_bdev(), that is fs_context-aware and a ->get_tree() counterpart of mount_bdev(). It caches the block device pointer in the fs_context struct so that this information can be passed into sget_fc()'s test and set functions. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: linux-block@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:22 -04:00
Al Viro	533770cc0a	new helper: get_tree_keyed() For vfs_get_keyed_super users. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:34:22 -04:00
Eric Biggers	1dd9bc08cf	vfs: set fs_context::user_ns for reconfigure fs_context::user_ns is used by fuse_parse_param(), even during remount, so it needs to be set to the existing value for reconfigure. Reproducer: #include <fcntl.h> #include <sys/mount.h> int main() { char opts[128]; int fd = open("/dev/fuse", O_RDWR); sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd); mkdir("mnt", 0777); mount("foo", "mnt", "fuse.foo", 0, opts); mount("foo", "mnt", "fuse.foo", MS_REMOUNT, opts); } Crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 129 Comm: syz_make_kuid Not tainted 5.3.0-rc5-next-20190821 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:map_id_range_down+0xb/0xc0 kernel/user_namespace.c:291 [...] Call Trace: map_id_down kernel/user_namespace.c:312 [inline] make_kuid+0xe/0x10 kernel/user_namespace.c:389 fuse_parse_param+0x116/0x210 fs/fuse/inode.c:523 vfs_parse_fs_param+0xdb/0x1b0 fs/fs_context.c:145 vfs_parse_fs_string+0x6a/0xa0 fs/fs_context.c:188 generic_parse_monolithic+0x85/0xc0 fs/fs_context.c:228 parse_monolithic_mount_data+0x1b/0x20 fs/fs_context.c:708 do_remount fs/namespace.c:2525 [inline] do_mount+0x39a/0xa60 fs/namespace.c:3107 ksys_mount+0x7d/0xd0 fs/namespace.c:3325 __do_sys_mount fs/namespace.c:3339 [inline] __se_sys_mount fs/namespace.c:3336 [inline] __x64_sys_mount+0x20/0x30 fs/namespace.c:3336 do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: syzbot+7d6a57304857423318a5@syzkaller.appspotmail.com Fixes: 408cbe695350 ("vfs: Convert fuse to use the new mount API") Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-05 14:33:45 -04:00
Miklos Szeredi	56d250ef96	cuse: fix broken release The inode parameter in cuse_release() is likely not a fuse inode. It's a small wonder it didn't blow up until now. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-02 11:07:30 +02:00
Maxim Patlasov	17b2cbe294	fuse: cleanup fuse_wait_on_page_writeback fuse_wait_on_page_writeback() always returns zero and nobody cares. Let's make it void. Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-02 11:07:30 +02:00
Kirill Smelkov	1fb027d759	fuse: require /dev/fuse reads to have enough buffer capacity (take 2) [ This retries commit `d4b13963f2` ("fuse: require /dev/fuse reads to have enough buffer capacity"), which was reverted. In this version we require only `sizeof(fuse_in_header) + sizeof(fuse_write_in)` instead of 4K for FUSE request header room, because, contrary to libfuse and kernel client behaviour, GlusterFS actually provides only so much room for request header. ] A FUSE filesystem server queues /dev/fuse sys_read calls to get filesystem requests to handle. It does not know in advance what would be that request as it can be anything that client issues - LOOKUP, READ, WRITE, ... Many requests are short and retrieve data from the filesystem. However WRITE and NOTIFY_REPLY write data into filesystem. Before getting into operation phase, FUSE filesystem server and kernel client negotiate what should be the maximum write size the client will ever issue. After negotiation the contract in between server/client is that the filesystem server then should queue /dev/fuse sys_read calls with enough buffer capacity to receive any client request - WRITE in particular, while FUSE client should not, in particular, send WRITE requests with > negotiated max_write payload. FUSE client in kernel and libfuse historically reserve 4K for request header. However an existing filesystem server - GlusterFS - was found which reserves only 80 bytes for header room (= `sizeof(fuse_in_header) + sizeof(fuse_write_in)`). Since `sizeof(fuse_in_header) + sizeof(fuse_write_in)` == `sizeof(fuse_in_header) + sizeof(fuse_read_in)` == `sizeof(fuse_in_header) + sizeof(fuse_notify_retrieve_in)` is the absolute minimum any sane filesystem should be using for header room, the contract is that filesystem server should queue sys_reads with `sizeof(fuse_in_header) + sizeof(fuse_write_in)` + max_write buffer. 
If the filesystem server does not follow this contract, what can happen is that fuse_dev_do_read will see that request size is > buffer size, and then it will return EIO to client who issued the request but won't indicate in any way that there is a problem to filesystem server. This can be hard to diagnose because for some requests, e.g. for NOTIFY_REPLY which mimics WRITE, there is no client thread that is waiting for request completion and that EIO goes nowhere, while on filesystem server side things look like the kernel is not replying back after successful NOTIFY_RETRIEVE request made by the server. We can make the problem easy to diagnose if we indicate via error return to filesystem server when it is violating the contract. This should not practically cause problems because if a filesystem server is using shorter buffer, writes to it were already very likely to cause EIO, and if the filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER minimum buffer size. Please see [1] for context where the problem of stuck filesystem was hit for real (because kernel client was incorrectly sending more than max_write data with NOTIFY_REPLY; see also previous patch), how the situation was traced and for more involving patch that did not make it into the tree. [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2 Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Han-Wen Nienhuys <hanwen@google.com> Cc: Jakob Unterwurzacher <jakobunt@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-02 11:07:30 +02:00
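The buffer-size contract described in the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the two 40-byte struct sizes are assumed (they sum to the 80 bytes of header room quoted in the message), the value 8192 for FUSE_MIN_READ_BUFFER is assumed, taking the larger of the two bounds is an assumption about how they combine, and every `sketch_`/`SKETCH_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed constants for this sketch; they are not read from kernel headers. */
#define SKETCH_IN_HEADER_SIZE  40u    /* assumed sizeof(struct fuse_in_header) */
#define SKETCH_WRITE_IN_SIZE   40u    /* assumed sizeof(struct fuse_write_in) */
#define SKETCH_MIN_READ_BUFFER 8192u  /* assumed value of FUSE_MIN_READ_BUFFER */

/*
 * Smallest buffer a server should pass to read() on /dev/fuse for a given
 * negotiated max_write: header room plus max_write payload, but never less
 * than the general minimum read buffer.
 */
size_t sketch_min_read_buffer(size_t max_write)
{
    size_t need = SKETCH_IN_HEADER_SIZE + SKETCH_WRITE_IN_SIZE + max_write;

    return need > SKETCH_MIN_READ_BUFFER ? need : SKETCH_MIN_READ_BUFFER;
}

/* True when a read buffer of nbytes satisfies the contract. */
bool sketch_read_buffer_ok(size_t nbytes, size_t max_write)
{
    return nbytes >= sketch_min_read_buffer(max_write);
}
```

With a negotiated max_write of 128 KiB, a server that allocates exactly 128 KiB would fail this check, while one that adds the 80 bytes of header room would pass.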
Linus Torvalds	a55aa89aab	Linux 5.3-rc6	2019-08-25 12:01:23 -07:00
Linus Torvalds	c749088f25	A minor auxdisplay improvement: - ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta) Merge tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux Pull auxdisplay cleanup from Miguel Ojeda: "Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)" * tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux: auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant	2019-08-25 11:43:17 -07:00
Linus Torvalds	32ae83ffec	This pull request contains a single bug fix for UML: - Fix time travel mode Merge tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML fix from Richard Weinberger: "Fix time travel mode" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: fix time travel mode	2019-08-25 11:40:24 -07:00
Linus Torvalds	94a76d9b52	This pull request contains the following fixes for UBIFS and JFFS2: UBIFS: - Don't block too long in writeback_inodes_sb() - Fix for a possible overrun of the log head - Fix double unlock in orphan_delete() JFFS2: - Remove C++ style from UAPI header and unbreak picky toolchains Merge tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBIFS and JFFS2 fixes from Richard Weinberger: "UBIFS: - Don't block too long in writeback_inodes_sb() - Fix for a possible overrun of the log head - Fix double unlock in orphan_delete() JFFS2: - Remove C++ style from UAPI header and unbreak picky toolchains" * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Limit the number of pages in shrink_liability ubifs: Correctly initialize c->min_log_bytes ubifs: Fix double unlock around orphan_delete() jffs2: Remove C++ style comments from uapi header	2019-08-25 11:29:27 -07:00
Linus Torvalds	146c3d3220	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Fix a boot regression caused by the recent bootparam sanitizing change, which escaped the attention of all people who reviewed that code. - Address a boot problem on machines with broken E820 tables caused by an underflow which ended up placing the trampoline start at physical address 0. - Handle machines which do not advertise a legacy timer of any form, but need calibration of the local APIC timer gracefully by making the calibration routine independent from the tick interrupt. Marked for stable as well as there seems to be quite some new laptops rolled out which expose this. - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are affected by broken firmware which does not initialize RDRAND correctly after resume. Add a command line parameter to override this for machines which either do not use suspend/resume or have a fixed BIOS. Unfortunately there is no way to detect this on boot, so the only safe decision is to turn it off by default. - Prevent RFLAGS from being clobbered in CALL_NOSPEC on 32bit which caused fast KVM instruction emulation to break. - Explain the Intel CPU model naming convention so that the repeating discussions come to an end" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 x86/boot: Fix boot regression caused by bootparam sanitizing x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h x86/boot/compressed/64: Fix boot on machines with broken E820 table x86/apic: Handle missing global clockevent gracefully x86/cpu: Explain Intel model naming convention	2019-08-25 10:10:15 -07:00
Linus Torvalds	5a13fc3d8b	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping fix from Thomas Gleixner: "A single fix for a regression caused by the generic VDSO implementation where a math overflow causes CLOCK_BOOTTIME to become a random number generator" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Prevent math overflow in BOOTTIME update	2019-08-25 10:08:01 -07:00
Linus Torvalds	8a04c2ee62	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Handle the worker management in situations where a task is scheduled out on a PI lock contention correctly and schedule a new worker if possible" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Schedule new worker even if PI-blocked	2019-08-25 10:06:12 -07:00
Linus Torvalds	05bbb9360a	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two small fixes for kprobes and perf: - Prevent a deadlock in kprobe_optimizer() caused by reverse lock ordering - Fix a comment typo" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Fix potential deadlock in kprobe_optimizer() perf/x86: Fix typo in comment	2019-08-25 10:03:32 -07:00
Linus Torvalds	44c471e436	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for an imbalanced kobject operation in the irq descriptor code which was unearthed by the new warnings in the kobject code" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Properly pair kobject_del() with kobject_add()	2019-08-25 10:00:21 -07:00
Linus Torvalds	f47edb59bb	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "11 fixes" Mostly VM fixes, one psi polling fix, and one parisc build fix. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y mm/zsmalloc.c: fix race condition in zs_destroy_pool mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely mm, page_owner: handle THP splits correctly userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx psi: get poll_work to run when calling poll syscall next time mm: memcontrol: flush percpu vmevents before releasing memcg mm: memcontrol: flush percpu vmstats before releasing memcg parisc: fix compilation errrors mm, page_alloc: move_freepages should not examine struct page of reserved memory mm/z3fold.c: fix race between migration and destruction	2019-08-25 09:56:27 -07:00
Linus Torvalds	e67095fd2f	dma-mapping fixes for 5.3-rc Two fixes for regressions in this merge window: - select the Kconfig symbols for the noncoherent dma arch helpers on arm if swiotlb is selected, not just for LPAE to not break the Xen build, that uses swiotlb indirectly through swiotlb-xen - fix the page allocator fallback in dma_alloc_contiguous if the CMA allocation fails Merge tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: "Two fixes for regressions in this merge window: - select the Kconfig symbols for the noncoherent dma arch helpers on arm if swiotlb is selected, not just for LPAE to not break the Xen build, that uses swiotlb indirectly through swiotlb-xen - fix the page allocator fallback in dma_alloc_contiguous if the CMA allocation fails" * tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: fix zone selection after an unaddressable CMA allocation arm: select the dma-noncoherent symbols for all swiotlb builds	2019-08-24 20:00:11 -07:00
Andrey Ryabinin	00fb24a42a	mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y The code like this: ptr = kmalloc(size, GFP_KERNEL); page = virt_to_page(ptr); offset = offset_in_page(ptr); kfree(page_address(page) + offset); may produce false-positive invalid-free reports on the kernel with CONFIG_KASAN_SW_TAGS=y. In the example above we lose the original tag assigned to 'ptr', so kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag is different from the tag in shadow, hence print a false report. Instead of just comparing tags, do the following: 1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's double-free and it doesn't matter what tag the pointer has. 2) If pointer tag is different from 0xFF, make sure that tag in the shadow is the same as in the pointer. Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com Fixes: `7f94ffbc4c` ("kasan: add hooks implementation for tag-based mode") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
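The two-step check listed in the KASAN commit message above can be sketched as a small standalone C function. This is a minimal illustration, not the kernel's implementation: the function and macro names are hypothetical, and the value 0xFE for KASAN_TAG_INVALID is an assumption (the message only names the constant); 0xFF as the tag a pointer carries after losing its original tag is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_KERNEL  0xFFu /* tag of a pointer that lost its original tag */
#define SKETCH_TAG_INVALID 0xFEu /* assumed value of KASAN_TAG_INVALID */

/*
 * Returns true when a free should be reported as invalid.
 *  1) Shadow already marked invalid: double free, pointer tag is irrelevant.
 *  2) Pointer tag is 0xFF: nothing to compare, so accept the free.
 *  3) Otherwise the pointer tag must equal the tag stored in shadow.
 */
bool sketch_is_invalid_free(uint8_t ptr_tag, uint8_t shadow_tag)
{
    if (shadow_tag == SKETCH_TAG_INVALID)
        return true;
    if (ptr_tag == SKETCH_TAG_KERNEL)
        return false;
    return ptr_tag != shadow_tag;
}
```

Under these assumptions, the `page_address(page) + offset` pointer from the commit message (tag 0xFF) is no longer flagged when the shadow holds a live tag, while a second free of the same object still is.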
Henry Burns	701d678599	mm/zsmalloc.c: fix race condition in zs_destroy_pool In zs_destroy_pool() we call flush_work(&pool->free_work). However, we have no guarantee that migration isn't happening in the background at that time. Since migration can't directly free pages, it relies on free_work being scheduled to free the pages. But there's nothing preventing an in-progress migrate from queuing the work after zs_unregister_migration() has called flush_work(). Which would mean pages still pointing at the inode when we free it. Since we know at destroy time all objects should be free, no new migrations can come in (since zs_page_isolate() fails for fully-free zspages). This means it is sufficient to track a "# isolated zspages" count by class, and have the destroy logic ensure all such pages have drained before proceeding. Keeping that state under the class spinlock keeps the logic straightforward. In this case a memory leak could lead to an eventual crash if compaction hits the leaked page. This crash would only occur if people are changing their zswap backend at runtime (which eventually starts destruction). Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com Fixes: `48b4800a1c` ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Henry Burns	1a87aa0359	mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely In zs_page_migrate() we call putback_zspage() after we have finished migrating all pages in this zspage. However, the return value is ignored. If a zs_free() races in between zs_page_isolate() and zs_page_migrate(), freeing the last object in the zspage, putback_zspage() will leave the page in ZS_EMPTY for potentially an unbounded amount of time. To fix this, we need to do the same thing as zs_page_putback() does: schedule free_work to occur. To avoid duplicated code, move the sequence to a new putback_zspage_deferred() function which both zs_page_migrate() and zs_page_putback() call. Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com Fixes: `48b4800a1c` ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Vlastimil Babka	f7da677bc6	mm, page_owner: handle THP splits correctly THP splitting path is missing the split_page_owner() call that split_page() has. As a result, split THP pages are wrongly reported in the page_owner file as order-9 pages. Furthermore when the former head page is freed, the remaining former tail pages are not listed in the page_owner file at all. This patch fixes that by adding the split_page_owner() call into __split_huge_page(). Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz Fixes: `a9627bc5e3` ("mm/page_owner: introduce split_page_owner and replace manual handling") Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Oleg Nesterov	46d0b24c5e	userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if mm->core_state != NULL. Otherwise a page fault can see userfaultfd_missing() == T and use an already freed userfaultfd_ctx. Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com Fixes: `04f5866e41` ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Peter Xu <peterx@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Jason Xing	7b2b55da1d	psi: get poll_work to run when calling poll syscall next time Only when calling the poll syscall the first time can user receive POLLPRI correctly. After that, user always fails to acquire the event signal. Reproduce case: 1. Get the monitor code in Documentation/accounting/psi.txt 2. Run it, and wait for the event triggered. 3. Kill and restart the process. The question is why we can end up with poll_scheduled = 1 but the work not running (which would reset it to 0). And the answer is because the scheduling side sees group->poll_kworker under RCU protection and then schedules it, but here we cancel the work and destroy the worker. The cancel needs to pair with resetting the poll_scheduled flag. Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Roman Gushchin	bb65f89b7d	mm: memcontrol: flush percpu vmevents before releasing memcg Similar to vmstats, percpu caching of local vmevents leads to an accumulation of errors on non-leaf levels. This happens because some leftovers may remain in percpu caches, so that they are never propagated up the cgroup tree and simply disappear when the memory cgroup is released. To fix this issue, accumulate and propagate percpu vmevents values before releasing the memory cgroup, similar to what is done with vmstats. Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate only over online cpus. Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com Fixes: `42a3003535` ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
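The two-step check described in commit 00fb24a42a (mm/kasan) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the function names `old_free_is_invalid()` and `new_free_is_invalid()` are hypothetical, the value 0xFE chosen for KASAN_TAG_INVALID is an assumption (the commit message only names the constant), and the macro name KASAN_TAG_KERNEL for the 0xFF tag is likewise assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the commit message only states that a
 * pointer which lost its tag carries 0xFF and that the shadow may hold
 * KASAN_TAG_INVALID. */
#define KASAN_TAG_KERNEL  0xFF /* tag of a pointer whose original tag was lost */
#define KASAN_TAG_INVALID 0xFE /* assumed shadow value for already-freed memory */

/* Hypothetical model of the behaviour before the fix: any mismatch between
 * pointer tag and shadow tag is reported as an invalid free. */
bool old_free_is_invalid(uint8_t ptr_tag, uint8_t shadow_tag)
{
    return ptr_tag != shadow_tag;
}

/* Hypothetical model of the check described in the commit message. */
bool new_free_is_invalid(uint8_t ptr_tag, uint8_t shadow_tag)
{
    /* 1) Shadow marks the memory as invalid: double-free, whatever the
     *    pointer tag is. */
    if (shadow_tag == KASAN_TAG_INVALID)
        return true;

    /* 2) Only a pointer that still carries a real tag must match the
     *    shadow; a 0xFF pointer is accepted. */
    if (ptr_tag != KASAN_TAG_KERNEL && ptr_tag != shadow_tag)
        return true;

    return false;
}

/* Small self-check exercising the cases from the commit message. */
void kasan_sketch_selfcheck(void)
{
    /* Pointer rebuilt via page_address() + offset has lost its tag. */
    assert(old_free_is_invalid(KASAN_TAG_KERNEL, 0xAB));  /* false positive */
    assert(!new_free_is_invalid(KASAN_TAG_KERNEL, 0xAB)); /* accepted */

    /* Double-free is still reported, regardless of the pointer tag. */
    assert(new_free_is_invalid(KASAN_TAG_KERNEL, KASAN_TAG_INVALID));
    assert(new_free_is_invalid(0xAB, KASAN_TAG_INVALID));

    /* A tagged pointer must still match the shadow. */
    assert(!new_free_is_invalid(0xAB, 0xAB));
    assert(new_free_is_invalid(0xAB, 0xCD));
}
```

Calling `kasan_sketch_selfcheck()` from a test program runs through the kmalloc/page_address case from the commit message: the old comparison flags the untagged pointer, while the two-step check accepts it and still reports the double-free.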


857116 Commits, All Branches