OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Al Viro	b9dc6f65bc	fix a fencepost error in pipe_advance() The logics in pipe_advance() used to release all buffers past the new position failed in cases when the number of buffers to release was equal to pipe->buffers. If that happened, none of them had been released, leaving pipe full. Worse, it was trivial to trigger and we end up with pipe full of uninitialized pages. IOW, it's an infoleak. Cc: stable@vger.kernel.org # v4.9 Reported-by: "Alan J. Wylie" <alan@wylie.me.uk> Tested-by: "Alan J. Wylie" <alan@wylie.me.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-14 19:50:41 -05:00
Al Viro	33844e6651	[iov_iter] fix iterate_all_kinds() on empty iterators Problem similar to ones dealt with in "fold checks into iterate_and_advance()" and followups, except that in this case we really want to do nothing when asked for zero-length operation - unlike zero-length iterate_and_advance(), zero-length iterate_all_kinds() has no side effects, and callers are simpler that way. That got exposed when copy_from_iter_full() had been used by tipc, which builds an msghdr with zero payload and (now) feeds it to a primitive based on iterate_all_kinds() instead of iterate_and_advance(). Reported-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-22 23:00:22 -05:00
Linus Torvalds	9a19a6db37	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: - more ->d_init() stuff (work.dcache) - pathname resolution cleanups (work.namei) - a few missing iov_iter primitives - copy_from_iter_full() and friends. Either copy the full requested amount, advance the iterator and return true, or fail, return false and do _not_ advance the iterator. Quite a few open-coded callers converted (and became more readable and harder to fuck up that way) (work.iov_iter) - several assorted patches, the big one being logfs removal * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: logfs: remove from tree vfs: fix put_compat_statfs64() does not handle errors namei: fold should_follow_link() with the step into not-followed link namei: pass both WALK_GET and WALK_MORE to should_follow_link() namei: invert WALK_PUT logics namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link() namei: saner calling conventions for mountpoint_last() namei.c: get rid of user_path_parent() switch getfrag callbacks to ..._full() primitives make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success [iov_iter] new primitives - copy_from_iter_full() and friends don't open-code file_inode() ceph: switch to use of ->d_init() ceph: unify dentry_operations instances lustre: switch to use of ->d_init()	2016-12-16 10:24:44 -08:00
Linus Torvalds	36869cb93d	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...	2016-12-13 10:19:16 -08:00
Al Viro	cbbd26b8b1	[iov_iter] new primitives - copy_from_iter_full() and friends copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not. Convert some obvious users. NOTE - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-05 14:33:36 -05:00
Abhi Das	680bb946a1	fix iov_iter_advance() for ITER_PIPE iov_iter_advance() needs to decrement iter->count by the number of bytes we'd moved beyond. Normal flavours do that, but ITER_PIPE doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files ends up with a bogus fallback to page cache read, resulting in incorrect values for file offset and bytes read. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-11-17 00:00:48 -05:00
Christoph Hellwig	2f8b544477	block,fs: untangle fs.h and blk_types.h Nothing in fs.h should require blk_types.h to be included. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Vegard Nossum	ffecee4f24	iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector() Both import_iovec() and rw_copy_check_uvector() take an array (typically small and on-stack) which is used to hold an iovec array copy from userspace. This is to avoid an expensive memory allocation in the fast path (i.e. few iovec elements). The caller may have to check whether these functions actually used the provided buffer or allocated a new one -- but this differs between the too. Let's just add a kernel doc to clarify what the semantics are for each function. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-10-14 20:00:34 -04:00
Al Viro	1689c73a73	Fix off-by-one in __pipe_get_pages() it actually worked only when requested area ended on the page boundary... Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 10:40:01 -07:00
Linus Torvalds	abb5a14fa2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...	2016-10-10 13:04:49 -07:00
Miklos Szeredi	a779638cf6	pipe: add pipe_buf_release() helper Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-10-05 18:23:58 -04:00
Al Viro	241699cd72	new iov_iter flavour: pipe-backed iov_iter variant for passing data into pipe. copy_to_iter() copies data into page(s) it has allocated and stuffs them into the pipe; copy_page_to_iter() stuffs there a reference to the page given to it. Both will try to coalesce if possible. iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages() and friends will do as copy_to_iter() would have and return the pages where the data would've been copied. iov_iter_advance() will truncate everything past the spot it has advanced to. New primitive: iov_iter_pipe(), used for initializing those. pipe should be locked all along. Running out of space acts as fault would for iovec-backed ones; in other words, giving it to ->read_iter() may result in short read if the pipe overflows, or -EFAULT if it happens with nothing copied there. In other words, ->read_iter() on those acts pretty much like ->splice_read(). Moreover, all generic_file_splice_read() users, as well as many other ->splice_read() instances can be switched to that scheme - that'll happen in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-10-05 18:23:36 -04:00
Al Viro	4bce9f6ee8	get rid of separate multipage fault-in primitives * the only remaining callers of "short" fault-ins are just as happy with generic variants (both in lib/iov_iter.c); switch them to multipage variants, kill the "short" ones * rename the multipage variants to now available plain ones. * get rid of compat macro defining iov_iter_fault_in_multipage_readable by expanding it in its only user. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-27 18:12:24 -04:00
Al Viro	d4690f1e1c	fix iov_iter_fault_in_readable() ... by turning it into what used to be multipages counterpart Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-09-17 14:05:30 -07:00
Mikulas Patocka	3fa6c50731	mm: optimize copy_page_to/from_iter_iovec copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data to userspace or from userspace. These functions have a fast path where they map a page using kmap_atomic and a slow path where they use kmap. kmap is slower than kmap_atomic, so the fast path is preferred. However, on kernels without highmem support, kmap just calls page_address, so there is no need to avoid kmap. On kernels without highmem support, the fast path just increases code size (and cache footprint) and it doesn't improve copy performance in any way. This patch enables the fast path only if CONFIG_HIGHMEM is defined. Code size reduced by this patch: x86 (without highmem) 928 x86-64 960 sparc64 848 alpha 1136 pa-risc 1200 [akpm@linux-foundation.org: use IS_ENABLED(), per Andi] Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-28 16:07:41 -07:00
Ming Lei	1bdc76aea1	iov_iter: use bvec iterator to implement iterate_bvec() bvec has one native/mature iterator for long time, so not necessary to use the reinvented wheel for iterating bvecs in lib/iov_iter.c. Two ITER_BVEC test cases are run: - xfstest(-g auto) on loop dio/aio, no regression found - swap file works well under extreme stress(stress-ng --all 64 -t 800 -v), and lots of OOMs are triggerd, and the whole system still survives Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-09 10:02:47 -06:00
Linus Torvalds	0985b65d3b	Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs iov_iter regression fix from Al Viro: "Fix for braino in 'fold checks into iterate_and_advance()'" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do "fold checks into iterate_and_advance()" right	2016-05-25 15:59:09 -07:00
Al Viro	19f1845933	do "fold checks into iterate_and_advance()" right the only case when we should skip the iterate_and_advance() guts is when nothing's left in the iterator, _not_ just when requested amount is 0. Said guts will do nothing in the latter case anyway; the problem we tried to deal with in the aforementioned commit is that when there's nothing left and the amount requested is 0, we might end up deferencing one iovec too many; the value we fetch from there is discarded in that case, but theoretically it might oops if the iovec array ends exactly at the end of page with the next page not mapped. Bailing out on zero size requested had an unexpected side effect - zero-length segment in the beginning of iovec array ended up throwing do_loop_readv_writev() into infinite spin; we do not advance past the empty segment at all. Reproducer is trivial: echo '#include <sys/uio.h>' >a.c echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c cc a.c && ./a.out </proc/uptime which should end up with the process not hanging. Probably ought to go into LTP or xfstests... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-25 18:16:02 -04:00
Linus Torvalds	69370471d0	Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter cleanups from Al Viro. * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold checks into iterate_and_advance() rw_verify_area(): saner calling conventions aio: remove a pointless assignment	2016-05-18 11:46:23 -07:00
Al Viro	dd254f5a38	fold checks into iterate_and_advance() they are open-coded in all users except iov_iter_advance(), and there they wouldn't be a bad idea either - as it is, iov_iter_advance(i, 0) ends up dereferencing potentially past the end of iovec array. It doesn't do anything with the value it reads, and very unlikely to trigger an oops on dereference, but it is not impossible. Reported-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-09 14:04:29 -04:00
Al Viro	357f435d8a	fix the copy vs. map logics in blk_rq_map_user_iov() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-08 19:46:28 -04:00
Al Viro	e12675853d	iov_iter: export import_single_range() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 20:42:19 -05:00
Al Viro	36f7a8a4cd	iov_iter: constify {csum_and_,}copy_to_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 20:42:15 -05:00
Al Viro	36e9f6535f	Merge branch 'iov_iter' into for-next	2015-04-11 22:26:51 -04:00
Anton Altaparmakov	171a02032b	VFS: Add iov_iter_fault_in_multipages_readable() simillar to iov_iter_fault_in_readable() but differs in that it is not limited to faulting in the first iovec and instead faults in "bytes" bytes iterating over the iovecs as necessary. Also, instead of only faulting in the first and last page of the range, all pages are faulted in. This function is needed by NTFS when it does multi page file writes. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:24:32 -04:00
Al Viro	bc917be810	saner iov_iter initialization primitives iovec-backed iov_iter instances are assumed to satisfy several properties: * no more than UIO_MAXIOV elements in iovec array * total size of all ranges is no more than MAX_RW_COUNT * all ranges pass access_ok(). The problem is, invariants of data structures should be established in the primitives creating those data structures, not in the code using those primitives. And iov_iter_init() violates that principle. For a while we managed to get away with that, but once the use of iov_iter started to spread, it didn't take long for shit to hit the fan - missed check in sys_sendto() had introduced a roothole. We _do_ have primitives for importing and validating iovecs (both native and compat ones) and those primitives are almost always followed by shoving the resulting iovec into iov_iter. Life would be considerably simpler (and safer) if we combined those primitives with initializing iov_iter. That gives us two new primitives - import_iovec() and compat_import_iovec(). Calling conventions: iovec = iov_array; err = import_iovec(direction, uvec, nr_segs, ARRAY_SIZE(iov_array), &iovec, &iter); imports user vector into kernel space (into iov_array if it fits, allocated if it doesn't fit or if iovec was NULL), validates it and sets iter up to refer to it. On success 0 is returned and allocated kernel copy (or NULL if the array had fit into caller-supplied one) is returned via iovec. On failure all allocations are undone and -E... is returned. If the total size of ranges exceeds MAX_RW_COUNT, the excess is silently truncated. compat_import_iovec() expects uvec to be a pointer to user array of compat_iovec; otherwise it's identical to import_iovec(). Finally, import_single_range() sets iov_iter backed by single-element iovec covering a user-supplied range - err = import_single_range(direction, address, size, iovec, &iter); does validation and sets iter up. Again, size in excess of MAX_RW_COUNT gets silently truncated. Next commits will be switching the things up to use of those and reducing the amount of iov_iter_init() instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-30 11:08:16 -04:00
Al Viro	d879cb8341	move iov_iter.c from mm/ to lib/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-17 22:22:17 -05:00

27 Commits