OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Pavel Begunkov	5af1d13e8f	io_uring: batch put_task_struct() As every iopoll request have a task ref, it becomes expensive to put them one by one, instead we can put several at once integrating that into io_req_free_batch(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:46 -06:00
Pavel Begunkov	cbcf72148d	io_uring: return locked and pinned page accounting Locked and pinned memory accounting in io_{,un}account_mem() depends on having ->sqo_mm, which is NULL after a recent change for non SQPOLL'ed io_ring. That disables the accounting. Return ->sqo_mm initialisation back, and do __io_sq_thread_acquire_mm() based on IORING_SETUP_SQPOLL flag. Fixes: `8eb06d7e8d` ("io_uring: fix missing ->mm on exit") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	5dbcad51f7	io_uring: don't miscount pinned memory io_sqe_buffer_unregister() uses cxt->sqo_mm for memory accounting, but io_ring_ctx_free() drops ->sqo_mm before leaving pinned_vm over-accounted. Postpone mm cleanup for when it's not needed anymore. Fixes: `309758254e` ("io_uring: report pinned memory usage") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	7fbb1b541f	io_uring: don't open-code recv kbuf managment Don't implement fast path of kbuf freeing and management inlined into io_recv{,msg}(), that's error prone and duplicates handling. Replace it with a helper io_put_recv_kbuf(), which mimics io_put_rw_kbuf() in the io_read/write(). This also keeps cflags calculation in one place, removing duplication between rw and recv/send. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	8ff069bf2e	io_uring: extract io_put_kbuf() helper Extract a common helper for cleaning up a selected buffer, this will be used shortly. By the way, correct cflags types to unsigned and, as kbufs are anyway tracked by a flag, remove useless zeroing req->rw.addr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	bc02ef3325	io_uring: move BUFFER_SELECT check into *recv[msg] Move REQ_F_BUFFER_SELECT flag check out of io_recv_buffer_select(), and do that in its call sites That saves us from double error checking and possibly an extra function call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	0e1b6fe3d1	io_uring: free selected-bufs if error'ed io_clean_op() may be skipped even if there is a selected io_buffer, that's because *select_buffer() funcions never set REQ_F_NEED_CLEANUP. Trigger io_clean_op() when REQ_F_BUFFER_SELECTED is set as well, and and clear the flag if was freed out of it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	14c32eee92	io_uring: don't forget cflags in io_recv() Instead of returning error from io_recv(), go through generic cleanup path, because it'll retain cflags for userspace. Do the same for io_send() for consistency. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	6b754c8b91	io_uring: remove extra checks in send/recv With the return on a bad socket, kmsg is always non-null by the end of the function, prune left extra checks and initialisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:45 -06:00
Pavel Begunkov	7a7cacba8b	io_uring: indent left {send,recv}[msg]() Flip over "if (sock)" condition with return on error, the upper layer will take care. That change will be handy later, but already removes an extra jump from hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:44 -06:00
Pavel Begunkov	06ef3608b0	io_uring: simplify file ref tracking in submission state Currently, file refs in struct io_submit_state are tracked with 2 vars: @has_refs -- how many refs were initially taken @used_refs -- number of refs used Replace it with a single variable counting how many refs left at the current moment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:44 -06:00
Pavel Begunkov	57f1a64958	io_uring/io-wq: move RLIMIT_FSIZE to io-wq RLIMIT_SIZE in needed only for execution from an io-wq context, hence move all preparations from hot path to io-wq work setup. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:44 -06:00
Pavel Begunkov	327d6d968b	io_uring: alloc ->io in io_req_defer_prep() Every call to io_req_defer_prep() is prepended with allocating ->io, just do that in the function. And while we're at it, mark error paths with unlikey and replace "if (ret < 0)" with "if (ret)". There is only one change in the observable behaviour, that's instead of killing the head request right away on error, it postpones it until the link is assembled, that looks more preferable. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:44 -06:00
Pavel Begunkov	1c2da9e883	io_uring: remove empty cleanup of OP_OPEN* reqs A switch in __io_clean_op() doesn't have default, it's pointless to list opcodes that doesn't do any cleanup. Remove IORING_OP_OPEN* from there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:44 -06:00
Pavel Begunkov	dca9cf8b87	io_uring: inline io_req_work_grab_env() The only caller of io_req_work_grab_env() is io_prep_async_work(), and they are both initialising req->work. Inline grab_env(), it's easier to keep this way, moreover there already were bugs with misplacing io_req_init_async(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 13:00:40 -06:00
Pavel Begunkov	0f7e466b39	io_uring: place cflags into completion data req->cflags is used only for defer-completion path, just use completion data to store it. With the 4 bytes from the ->sequence patch and compacting io_kiocb, this frees 8 bytes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:45 -06:00
Pavel Begunkov	9cf7c104de	io_uring: remove sequence from io_kiocb req->sequence is used only for deferred (i.e. DRAIN) requests, but initialised for every request. Remove req->sequence from io_kiocb together with its initialisation in io_init_req(). Replace it with a new field in struct io_defer_entry, that will be calculated only when needed in io_req_defer(), which is a slow path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:45 -06:00
Pavel Begunkov	27dc8338e5	io_uring: use non-intrusive list for defer The only left user of req->list is DRAIN, hence instead of keeping a separate per request list for it, do that with old fashion non-intrusive lists allocated on demand. That's a really slow path, so that's OK. This removes req->list and so sheds 16 bytes from io_kiocb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:45 -06:00
Pavel Begunkov	7d6ddea6be	io_uring: remove init for unused list poll*() doesn't use req->list, don't init it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:45 -06:00
Pavel Begunkov	135fcde849	io_uring: add req->timeout.list Instead of using shared req->list, hang timeouts up on their own list entry. struct io_timeout have enough extra space for it, but if that will be a problem ->inflight_entry can reused for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:45 -06:00
Pavel Begunkov	40d8ddd4fa	io_uring: use completion list for CQ overflow As with the completion path, also use compl.list for overflowed requests. If cleaned up properly, nobody needs per-op data there anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	d21ffe7eca	io_uring: use inflight_entry list for iopoll'ing req->inflight_entry is used to track requests that grabbed files_struct. Let's share it with iopoll list, because the only iopoll'ed ops are reads and writes, which don't need a file table. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	540e32a085	io_uring: rename ctx->poll into ctx->iopoll It supports both polling and I/O polling. Rename ctx->poll to clearly show that it's only in I/O poll case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	3ca405ebfc	io_uring: share completion list w/ per-op space Calling io_req_complete(req) means that the request is done, and there is nothing left but to clean it up. That also means that per-op data after that should not be used, so we're free to reuse it in completion path, e.g. to store overflow_list as done in this patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	252917c30f	io_uring: follow **iovec idiom in io_import_iovec As for import_iovec(), return !=NULL iovec from io_import_iovec() only when it should be freed. That includes returning NULL when iovec is already in req->io, because it should be deallocated by other means, e.g. inside op handler. After io_setup_async_rw() local iovec to ->io, just mark it NULL, to follow the idea in io_{read,write} as well. That's easier to follow, and especially useful if we want to reuse per-op space for completion data. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: only call kfree() on non-NULL pointer] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	c3e330a493	io_uring: add a helper for async rw iovec prep Preparing reads/writes for async is a bit tricky. Extract a helper to not repeat it twice. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	b64e3444d4	io_uring: simplify io_req_map_rw() Don't deref req->io->rw every time, but put it in a local variable. This looks prettier, generates less instructions, and doesn't break alias analysis. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	e73751225b	io_uring: replace rw->task_work with rq->task_work io_kiocb::task_work was de-unionised, and is not planned to be shared back, because it's too useful and commonly used. Hence, instead of keeping a separate task_work in struct io_async_rw just reuse req->task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	2ae523ed07	io_uring: extract io_sendmsg_copy_hdr() Don't repeat send msg initialisation code, it's error prone. Extract and use a helper function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	1400e69705	io_uring: use more specific type in rcv/snd msg cp send/recv msghdr initialisation works with struct io_async_msghdr, but pulls the whole struct io_async_ctx for no reason. That complicates it with composite accessing, e.g. io->msg. Use and pass the most specific type, which is struct io_async_msghdr. It is the larget field in union io_async_ctx and doesn't save stack space, but looks clearer. The most of the changes are replacing "io->msg." with "iomsg->" Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Pavel Begunkov	270a594070	io_uring: rename sr->msg into umsg Every second field in send/recv is called msg, make it a bit more understandable by renaming ->msg, which is a user provided ptr, to ->umsg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Dmitry Vyukov	b36200f543	io_uring: fix sq array offset calculation rings_size() sets sq_offset to the total size of the rings (the returned value which is used for memory allocation). This is wrong: sq array should be located within the rings, not after them. Set sq_offset to where it should be. Fixes: `75b28affdd` ("io_uring: allocate the two rings together") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hristo Venev <hristo@venev.name> Cc: io-uring@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:55:44 -06:00
Jens Axboe	760618f7a8	Merge branch 'io_uring-5.8' into for-5.9/io_uring Merge in io_uring-5.8 fixes, as changes/cleanups to how we do locked mem accounting require a fixup, and only one of the spots are noticed by git as the other merges cleanly. The flags fix from io_uring-5.8 also causes a merge conflict, the leak fix for recvmsg, the double poll fix, and the link failure locking fix. * io_uring-5.8: io_uring: fix lockup in io_fail_links() io_uring: fix ->work corruption with poll_add io_uring: missed req_init_async() for IOSQE_ASYNC io_uring: always allow drain/link/hardlink/async sqe flags io_uring: ensure double poll additions work with both request types io_uring: fix recvmsg memory leak with buffer selection io_uring: fix not initialised work->flags io_uring: fix missing msg_name assignment io_uring: account user memory freed when exit has been queued io_uring: fix memleak in io_sqe_files_register() io_uring: fix memleak in __io_sqe_files_update() io_uring: export cq overflow status to userspace Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:53:31 -06:00
Pavel Begunkov	4ae6dbd683	io_uring: fix lockup in io_fail_links() io_fail_links() doesn't consider REQ_F_COMP_LOCKED leading to nested spin_lock(completion_lock) and lockup. [ 197.680409] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 6-... } 18239 jiffies s: 1421 root: 0x40/. [ 197.680411] rcu: blocking rcu_node structures: [ 197.680412] Task dump for CPU 6: [ 197.680413] link-timeout R running task 0 1669 1 0x8000008a [ 197.680414] Call Trace: [ 197.680420] ? io_req_find_next+0xa0/0x200 [ 197.680422] ? io_put_req_find_next+0x2a/0x50 [ 197.680423] ? io_poll_task_func+0xcf/0x140 [ 197.680425] ? task_work_run+0x67/0xa0 [ 197.680426] ? do_exit+0x35d/0xb70 [ 197.680429] ? syscall_trace_enter+0x187/0x2c0 [ 197.680430] ? do_group_exit+0x43/0xa0 [ 197.680448] ? __x64_sys_exit_group+0x18/0x20 [ 197.680450] ? do_syscall_64+0x52/0xa0 [ 197.680452] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:51:33 -06:00
Pavel Begunkov	d5e16d8e23	io_uring: fix ->work corruption with poll_add req->work might be already initialised by the time it gets into __io_arm_poll_handler(), which will corrupt it by using fields that are in an union with req->work. Luckily, the only side effect is missing put_creds(). Clean req->work before going there. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-24 12:51:33 -06:00
Pavel Begunkov	3e863ea3bb	io_uring: missed req_init_async() for IOSQE_ASYNC IOSQE_ASYNC branch of io_queue_sqe() is another place where an unitialised req->work can be accessed (i.e. prior io_req_init_async()). Nothing really bad though, it just looses IO_WQ_WORK_CONCURRENT flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-23 11:20:55 -06:00
Daniele Albano	61710e437f	io_uring: always allow drain/link/hardlink/async sqe flags We currently filter these for timeout_remove/async_cancel/files_update, but we only should be filtering for fixed file and buffer select. This also causes a second read of sqe->flags, which isn't needed. Just check req->flags for the relevant bits. This then allows these commands to be used in links, for example, like everything else. Signed-off-by: Daniele Albano <d.albano@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-18 14:15:16 -06:00
Jens Axboe	807abcb088	io_uring: ensure double poll additions work with both request types The double poll additions were centered around doing POLL_ADD on file descriptors that use more than one waitqueue (typically one for read, one for write) when being polled. However, it can also end up being triggered for when we use poll triggered retry. For that case, we cannot safely use req->io, as that could be used by the request type itself. Add a second io_poll_iocb pointer in the structure we allocate for poll based retry, and ensure we use the right one from the two paths. Fixes: `18bceab101` ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-17 19:41:05 -06:00
Pavel Begunkov	681fda8d27	io_uring: fix recvmsg memory leak with buffer selection io_recvmsg() doesn't free memory allocated for struct io_buffer. This can causes a leak when used with automatic buffer selection. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-15 13:35:56 -06:00
Pavel Begunkov	16d598030a	io_uring: fix not initialised work->flags `59960b9deb` ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier. Fixes: `7cdaf587de` ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-12 09:40:50 -06:00
Pavel Begunkov	dd821e0c95	io_uring: fix missing msg_name assignment Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it. Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-12 09:40:25 -06:00
Jens Axboe	309fc03a32	io_uring: account user memory freed when exit has been queued We currently account the memory after the exit work has been run, but that leaves a gap where a process has closed its ring and until the memory has been accounted as freed. If the memlocked ulimit is borderline, then that can introduce spurious setup errors returning -ENOMEM because the free work hasn't been run yet. Account this as freed when we close the ring, as not to expose a tiny gap where setting up a new ring can fail. Fixes: `85faa7b834` ("io_uring: punt final io_ring_ctx wait-and-free to workqueue") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-10 09:18:35 -06:00
Yang Yingliang	667e57da35	io_uring: fix memleak in io_sqe_files_register() I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0x607eeac06e78 (size 8): comm "test", pid 295, jiffies 4294735835 (age 31.745s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0 [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0 [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480 [<00000000591b89a6>] do_syscall_64+0x56/0xa0 [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Call percpu_ref_exit() on error path to avoid refcount memleak. Fixes: `05f3fb3c53` ("io_uring: avoid ring quiesce for fixed file set unregister and update") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-10 07:50:21 -06:00
Jens Axboe	4349f30ecb	io_uring: remove dead 'ctx' argument and move forward declaration We don't use 'ctx' at all in io_sq_thread_drop_mm(), it just works on the mm of the current task. Drop the argument. Move io_file_put_work() to where we have the other forward declarations of functions. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-09 15:07:01 -06:00
Jens Axboe	2bc9930e78	io_uring: get rid of __req_need_defer() We just have one caller of this, req_need_defer(), just inline the code in there instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-09 09:43:27 -06:00
Yang Yingliang	f3bd9dae37	io_uring: fix memleak in __io_sqe_files_update() I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: `c3a31e6056` ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-08 20:16:19 -06:00
Xiaoguang Wang	6d5f904904	io_uring: export cq overflow status to userspace For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring ring) { struct io_uring_cqe cqe; struct io_uring_sqe sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-08 19:17:06 -06:00
Jens Axboe	5acbbc8ed3	io_uring: only call kfree() for a non-zero pointer It's safe to call kfree() with a NULL pointer, but it's also pointless. Most of the time we don't have any data to free, and at millions of requests per second, the redundant function call adds noticeable overhead (about 1.3% of the runtime). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-08 15:15:26 -06:00
Dan Carpenter	aa340845ae	io_uring: fix a use after free in io_async_task_func() The "apoll" variable is freed and then used on the next line. We need to move the free down a few lines. Fixes: `0be0b0e33b` ("io_uring: simplify io_async_task_func()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-08 13:15:04 -06:00
Pavel Begunkov	b2edc0a77f	io_uring: don't burn CPU for iopoll on exit First of all don't spin in io_ring_ctx_wait_and_kill() on iopoll. Requests won't complete faster because of that, but only lengthen io_uring_release(). The same goes for offloaded cleanup in io_ring_exit_work() -- it already has waiting loop, don't do blocking active spinning. For that, pass min=0 into io_iopoll_[try_]reap_events(), so it won't actively spin. Leave the function if io_do_iopoll() there can't complete a request to sleep in io_ring_exit_work(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-07 12:00:03 -06:00

1 2 3 4 5 ...

65269 Commits