OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ming Lei	81ba6abd2b	block: loop: mark bvec as ITER_BVEC_FLAG_NO_REF loop is one block device, for any bio submitted to this device, the upper layer does guarantee that pages added to loop's bio won't go away when the bio is in-flight. So mark loop's bvec as ITER_BVEC_FLAG_NO_REF then get_page/put_page can be saved for serving loop's IO. Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-01 12:11:48 -06:00
Holger Hoffstätte	56a85fd837	loop: properly observe rotational flag of underlying device The loop driver always declares the rotational flag of its device as rotational, even when the device of the mapped file is nonrotational, as is the case with SSDs or on tmpfs. This can confuse filesystem tools which are SSD-aware; in my case I frequently forget to tell mkfs.btrfs that my loop device on tmpfs is nonrotational, and that I really don't need any automatic metadata redundancy. The attached patch fixes this by introspecting the rotational flag of the mapped file's underlying block device, if it exists. If the mapped file's filesystem has no associated block device - as is the case on e.g. tmpfs - we assume nonrotational storage. If there is a better way to identify such non-devices I'd love to hear them. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: holger@applied-asynchrony.com Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Benjamin Gordon <bmgordon@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-04-01 08:15:40 -06:00
Dongli Zhang	f7c8a4120e	loop: access lo_backing_file only when the loop device is Lo_bound Commit `758a58d0bc` ("loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and "lo->lo_state = Lo_unbound" into different critical regions protected by loop_ctl_mutex. However, there is below race that the NULL lo->lo_backing_file would be accessed when the backend of a loop is another loop device, e.g., loop0's backend is a file, while loop1's backend is loop0. loop0's backend is file loop1's backend is loop0 __loop_clr_fd() mutex_lock(&loop_ctl_mutex); lo->lo_backing_file = NULL; --> set to NULL mutex_unlock(&loop_ctl_mutex); loop_set_fd() mutex_lock_killable(&loop_ctl_mutex); loop_validate_file() f = l->lo_backing_file; --> NULL access if loop0 is not Lo_unbound mutex_lock(&loop_ctl_mutex); lo->lo_state = Lo_unbound; mutex_unlock(&loop_ctl_mutex); lo->lo_backing_file should be accessed only when the loop device is Lo_bound. In fact, the problem has been introduced already in commit `7ccd0791d9` ("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which loop_validate_file() could see devices in Lo_rundown state with which it did not count. It was harmless at that point but still. Fixes: `7ccd0791d9` ("loop: Push loop_ctl_mutex down into loop_clr_fd()") Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-03-18 08:20:53 -06:00
Dongli Zhang	758a58d0bc	loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() Commit `0da03cab87` ("loop: Fix deadlock when calling blkdev_reread_part()") moves blkdev_reread_part() out of the loop_ctl_mutex. However, GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result, __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and will not rescan the loop device to delete all partitions. Below are steps to reproduce the issue: step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100 step2 # losetup -P /dev/loop0 tmp.raw step3 # parted /dev/loop0 mklabel gpt step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1 step5 # losetup -d /dev/loop0 Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and there is below kernel warning message: [ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22) This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part(). Fixes: `0da03cab87` ("loop: Fix deadlock when calling blkdev_reread_part()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-02-22 15:51:22 -07:00
Dongli Zhang	40853d6fc6	loop: do not print warn message if partition scan is successful Do not print warn message when the partition scan returns 0. Fixes: `d57f3374ba` ("loop: Move special partition reread handling in loop_clr_fd()") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-02-22 15:51:21 -07:00
Ming Lei	56d18f62f5	block: kill BLK_MQ_F_SG_MERGE QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-02-15 08:40:12 -07:00
Ming Lei	86af5952a8	block: loop: pass multi-page bvec to iov_iter iov_iter is implemented on bvec itererator helpers, so it is safe to pass multi-page bvec to it, and this way is much more efficient than passing one page in each bvec. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-02-15 08:40:11 -07:00
Jaegeuk Kim	5db470e229	loop: drop caches if offset or block_size are changed If we don't drop caches used in old offset or block_size, we can get old data from new offset/block_size, which gives unexpected data to user. For example, Martijn found a loopback bug in the below scenario. 1) LOOP_SET_FD loads first two pages on loop file 2) LOOP_SET_STATUS64 changes the offset on the loop file 3) mount is failed due to the cached pages having wrong superblock Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Reported-by: Martijn Coenen <maco@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-01-09 20:51:44 -07:00
Chengguang Xu	c41103691b	block: loop: remove redundant code Code cleanup for removing redundant break in switch case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-22 14:45:31 -07:00
Chengguang Xu	38a3499f6d	block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add() blk_mq_init_queue() will not return NULL pointer to its caller, so it's better to replace IS_ERR_OR_NULL using IS_ERR in loop_add(). If in the future things change to check NULL pointer inside loop_add(), we should return -ENOMEM as return code instead of PTR_ERR(NULL). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-16 09:01:38 -07:00
Dennis Zhou	db6638d7d1	blkcg: remove bio->bi_css and instead use bio->bi_blkg Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-07 22:26:37 -07:00
Tetsuo Handa	628bd85947	loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl() Commit `0a42e99b58` ("loop: Get rid of loop_index_mutex") forgot to remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when replacing loop_index_mutex with loop_ctl_mutex. Fixes: `0a42e99b58` ("loop: Get rid of loop_index_mutex") Reported-by: syzbot <syzbot+c0138741c2290fc5e63f@syzkaller.appspotmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-12 08:44:06 -07:00
Jan Kara	c28445fa06	loop: Get rid of 'nested' acquisition of loop_ctl_mutex The nested acquisition of loop_ctl_mutex (->lo_ctl_mutex back then) has been introduced by commit `f028f3b2f9` "loop: fix circular locking in loop_clr_fd()" to fix lockdep complains about bd_mutex being acquired after lo_ctl_mutex during partition rereading. Now that these are properly fixed, let's stop fooling lockdep. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:37 -07:00
Jan Kara	1dded9acf6	loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex Code in loop_change_fd() drops reference to the old file (and also the new file in a failure case) under loop_ctl_mutex. Similarly to a situation in loop_set_fd() this can create a circular locking dependency if this was the last reference holding the file open. Delay dropping of the file reference until we have released loop_ctl_mutex. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:36 -07:00
Jan Kara	0da03cab87	loop: Fix deadlock when calling blkdev_reread_part() Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to complain about circular lock dependency between bdev->bd_mutex and lo->lo_ctl_mutex. The problem is that on loop device open or close lo_open() and lo_release() get called with bdev->bd_mutex held and they need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is called with loop_ctl_mutex held, it will call blkdev_reread_part() which acquires bdev->bd_mutex. See syzbot report for details [1]. Move call to blkdev_reread_part() in __loop_clr_fd() from under loop_ctl_mutex to finish fixing of the lockdep warning and the possible deadlock. [1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588 Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:32 -07:00
Jan Kara	85b0a54a82	loop: Move loop_reread_partitions() out of loop_ctl_mutex Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to complain about circular lock dependency between bdev->bd_mutex and lo->lo_ctl_mutex. The problem is that on loop device open or close lo_open() and lo_release() get called with bdev->bd_mutex held and they need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is called with loop_ctl_mutex held, it will call blkdev_reread_part() which acquires bdev->bd_mutex. See syzbot report for details [1]. Move all calls of loop_rescan_partitions() out of loop_ctl_mutex to avoid lockdep warning and fix deadlock possibility. [1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588 Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:31 -07:00
Jan Kara	d57f3374ba	loop: Move special partition reread handling in loop_clr_fd() The call of __blkdev_reread_part() from loop_reread_partition() happens only when we need to invalidate partitions from loop_release(). Thus move a detection for this into loop_clr_fd() and simplify loop_reread_partition(). This makes loop_reread_partition() safe to use without loop_ctl_mutex because we use only lo->lo_number and lo->lo_file_name in case of error for reporting purposes (thus possibly reporting outdate information is not a big deal) and we are safe from 'lo' going away under us by elevated lo->lo_refcnt. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:30 -07:00
Jan Kara	c371077000	loop: Push loop_ctl_mutex down to loop_change_fd() Push loop_ctl_mutex down to loop_change_fd(). We will need this to be able to call loop_reread_partitions() without loop_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:28 -07:00
Jan Kara	757ecf40b7	loop: Push loop_ctl_mutex down to loop_set_fd() Push lo_ctl_mutex down to loop_set_fd(). We will need this to be able to call loop_reread_partitions() without lo_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:27 -07:00
Jan Kara	550df5fdac	loop: Push loop_ctl_mutex down to loop_set_status() Push loop_ctl_mutex down to loop_set_status(). We will need this to be able to call loop_reread_partitions() without loop_ctl_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:22 -07:00
Jan Kara	4a5ce9ba58	loop: Push loop_ctl_mutex down to loop_get_status() Push loop_ctl_mutex down to loop_get_status() to avoid the unusual convention that the function gets called with loop_ctl_mutex held and releases it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:20 -07:00
Jan Kara	7ccd0791d9	loop: Push loop_ctl_mutex down into loop_clr_fd() loop_clr_fd() has a weird locking convention that is expects loop_ctl_mutex held, releases it on success and keeps it on failure. Untangle the mess by moving locking of loop_ctl_mutex into loop_clr_fd(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:19 -07:00
Jan Kara	a2505b799a	loop: Split setting of lo_state from loop_clr_fd Move setting of lo_state to Lo_rundown out into the callers. That will allow us to unlock loop_ctl_mutex while the loop device is protected from other changes by its special state. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:18 -07:00
Jan Kara	a13165441d	loop: Push lo_ctl_mutex down into individual ioctls Push acquisition of lo_ctl_mutex down into individual ioctl handling branches. This is a preparatory step for pushing the lock down into individual ioctl handling functions so that they can release the lock as they need it. We also factor out some simple ioctl handlers that will not need any special handling to reduce unnecessary code duplication. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:16 -07:00
Jan Kara	0a42e99b58	loop: Get rid of loop_index_mutex Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as there is no good reason to keep these two separate and it just complicates the locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:15 -07:00
Jan Kara	967d1dc144	loop: Fold __loop_release into loop_release __loop_release() has a single call site. Fold it there. This is currently not a huge win but it will make following replacement of loop_index_mutex more obvious. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:14 -07:00
Tetsuo Handa	310ca162d7	block/loop: Use global lock for ioctl() operation. syzbot is reporting NULL pointer dereference [1] which is caused by race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other loop devices at loop_validate_file() without holding corresponding lo->lo_ctl_mutex locks. Since ioctl() request on loop devices is not frequent operation, we don't need fine grained locking. Let's use global lock in order to allow safe traversal at loop_validate_file(). Note that syzbot is also reporting circular locking dependency between bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling blkdev_reread_part() with lock held. This patch does not address it. [1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3 [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:11 -07:00
Tetsuo Handa	b1ab5fa309	block/loop: Don't grab "struct file" for vfs_getattr() operation. vfs_getattr() needs "struct path" rather than "struct file". Let's use path_get()/path_put() rather than get_file()/fput(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-08 06:30:09 -07:00
Linus Torvalds	5f21585384	for-linus-20181102 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlvchGgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpj/1D/4kEQx4ncnFoZk8QshHV1L++rH3BbcLjQDd Wbh9ZSIQdI/gHTzS6bE7x3YfcbpMWPMO3+jFawdfRiFTEjlF8vQ+mnJ+Btb3z4D6 mGEeFGVhHExlp2a0x/Ma8YWVNlMB7BE8Tq73bZEVMY+9lbpmDW/vp7Sfa87LBDKQ ZmY+My+VdHN7qLtQ7t3W/HtpbU+kcXMMd3ICjK4i+ofXy6mynk4+oQ2jwyXc5L86 UCJCsTsSRr3CgbnkW/uprHo0XHk8i7O/4C3oR+x4pAIxCCa9g+vmw0EO9fvi/2iQ qe8jKdm7Y09xu/TiPBa7iz45tdh0cNMJKo3OezmSF9Np+r69KL5C/U4GRPKN3Iwm keoqn14ScABkYMSe4ys1AdEgKD6bNUaW3r/lJxTH2oUR23mjnCLp7c4WD/G+MlbB CzoakQyCHTZmDFLr2Kc8bkjmpil2T2UFfmLIDAu30LWIYeSGpiIO/V+g1foJMF2f 06ERltNvgX1BJjoh4NSWySLEf1ZtkUU60NeATRol6gwhnIyLrHsgfm6OEhqlW/7x Xc1BWyzX7K6c3Dskk/u5aSRyXOyRC9KkMt3/2XexeDNHkte9yMH0IgSvopPBuER8 +iPvPjNp7ychTKZB3zpSnlqGgePTjbufIEBtO3OyUmDZKjUqxahtxkQfmPhoclu+ XdR4ArcqNg== =0zM4 -----END PGP SIGNATURE----- Merge tag 'for-linus-20181102' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "The biggest part of this pull request is the revert of the blkcg cleanup series. It had one fix earlier for a stacked device issue, but another one was reported. Rather than play whack-a-mole with this, revert the entire series and try again for the next kernel release. Apart from that, only small fixes/changes. Summary: - Indentation fixup for mtip32xx (Colin Ian King) - The blkcg cleanup series revert (Dennis Zhou) - Two NVMe fixes. One fixing a regression in the nvme request initialization in this merge window, causing nvme-fc to not work. The other is a suspend/resume p2p resource issue (James, Keith) - Fix sg discard merge, allowing us to merge in cases where we didn't before (Jianchao Wang) - Call rq_qos_exit() after the queue is frozen, preventing a hang (Ming) - Fix brd queue setup, fixing an oops if we fail setting up all devices (Ming)" * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block: nvme-pci: fix conflicting p2p resource adds nvme-fc: fix request private initialization blkcg: revert blkcg cleanups series block: brd: associate with queue until adding disk block: call rq_qos_exit() after queue is frozen mtip32xx: clean an indentation issue, remove extraneous tabs block: fix the DISCARD request merge	2018-11-02 11:25:48 -07:00
Linus Torvalds	9931a07d51	Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...	2018-11-01 19:58:52 -07:00
Dennis Zhou	b5f2954d30	blkcg: revert blkcg cleanups series This reverts a series committed earlier due to null pointer exception bug report in [1]. It seems there are edge case interactions that I did not consider and will need some time to understand what causes the adverse interactions. The original series can be found in [2] with a follow up series in [3]. [1] https://www.spinics.net/lists/cgroups/msg20719.html [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/ This reverts the following commits: `d459d853c2`, `b2c3fa5467`, `101246ec02`, `b3b9f24f5f`, `e2b0989954`, `f0fcb3ec89`, `c839e7a03f`, `bdc2491708`, `74b7c02a9b`, `5bf9a1f3b4`, `a7b39b4e96`, `07b05bcc32`, `49f4c2dc2b`, `27e6fa996c` Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-01 19:59:53 -06:00
David Howells	aa563d7bca	iov_iter: Separate type from direction and use accessor functions In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places. Convert a bunch of places to use switch-statements to access them rather then chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions. Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required. Signed-off-by: David Howells <dhowells@redhat.com>	2018-10-24 00:41:07 +01:00
Dennis Zhou (Facebook)	c839e7a03f	blkcg: remove bio->bi_css and instead use bio->bi_blkg Prior patches ensured that all bios are now associated with some blkg. This now makes bio->bi_css unnecessary as blkg maintains a reference to the blkcg already. This patch removes the field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-21 20:29:13 -06:00
Gustavo A. R. Silva	d893ff8603	block/loop: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Colin Ian King	f4354a94e2	loop: remove redundant pointer inode Pointer inode is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'inode' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:53 -06:00
Linus Torvalds	29119529d8	for-linus-20180706 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltAFSMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpleLD/4qr8zm9EvQbdg3KFyvePehlAuEKkEp1jrk wf6LQ9Uc+bNbnXCvxk+U3tXiPDmFOMakZBAyjDFLxL6ArfSEqymJMg1JxXUYwq3l tESQNNHjZBZaRpv5YuRLwDPpEivGlA80AE+2tkNTXDaGxRHahZnqV6/eUk1YAtbw N1ADOS+EquS/p2DH07QhLqKNWecz6SSr9R+oVU6if1ivC3T0uZbTmKMtUspZEwv/ 2AZr6my0DSk15FdFAGpDaDjPMO26VlWaOrQPXH30/63KysWR2NNuasKVGjcmGSjC 9/aJREC1BE/hcDSNmu8MhFaL2JFy2F3l7VJzdXlYFbpP0OC6hqdlXPl+CKP43w2L iO2N4g+2DbXOf6Si7NtArZMQfY/PQipbcj7Y8o9sq0P6VNO9+csGHzwa8tCQSEi9 CtinHYV8L7YyopZYRvgRaX1ND3S7ClFRFX9kkAmTA5iOHHJtaKUexKOHu+Pvn5zi 3/O2y6U7F34CkM5C92+nvTTTPx6bTgs8U9c+V3Eut9InroQAU/SwvH/NsT67fFsu FcTWKfQ0HGgcsAxOAN2/vEztgZvu6kwhu3KkkQmt2Qy7B9dtBf44pTdv0b+RsPPq Ow/PtFaHK7lwxsGlrqXLdmvqlmJs8DXcc2TQw/JX8lPIc04Hs3ZpxMyt6wSDJToL OCZ2cY/bRg== =pge/ -----END PGP SIGNATURE----- Merge tag 'for-linus-20180706' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Two minor fixes for this series: - add LOOP_SET_BLOCK_SIZE as compat ioctl (Evan Green) - drbd use-after-free fix (Lars Ellenberg)" * tag 'for-linus-20180706' of git://git.kernel.dk/linux-block: loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl drbd: fix access after free	2018-07-06 19:13:42 -07:00
Evan Green	9fea4b3952	loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl This change adds LOOP_SET_BLOCK_SIZE as one of the supported ioctls in lo_compat_ioctl. It only takes an unsigned long argument, and in practice a 32-bit value works fine. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Evan Green <evgreen@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-03 07:52:34 -06:00
Kees Cook	6da2ec5605	treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) \| kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) \| kmalloc(sizeof(TYPE) * C2, ...) \| kmalloc(C1 * C2 * C3, ...) \| kmalloc(C1 * C2, ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Linus Torvalds	a3818841bd	for-linus-20180608 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlsa4sQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqPNEADbZby01Q6i+dTZYIosz5+/gq8gSkCcpQ/T krK/f2MlwD7Rdog1BnGNNP5XOqK8pKIGdARL1FQKpViii6xGIoOc2F4VK+vO44yR LI+BeeOM6rWNOAoBO4CqeZz/Fv5IYi7KURWogYhZMqrxBqT2OeD9MMowm5NulBix YZ2ttFWiTJScJttJCDPE6cu9EjHDeK63Nr7+UU80k3atU4eUpUp1mRFGmtaYWulq l3KaENCwm00WCVqM4i/gVWr2AkgTZqAAyeCx7IrPsrQrCMEhxEpMnU52e2kXSxhM Qx6FLNEOjzARuBDurtfJE74usQcW2xDLzT8fh2UStnPpt6S/JX6f9GMBVk0G7I8B 8COF4DF+bzdbhhz2SiZaTFOmDML5H1iQ8t6lTTms0Bnq29mE3E4QFom8lO+2BxN3 g6PFhvYaOkhTVtV5BPXpXs9xZBLHrv5G/JopXsZh0RF1kpiova+nfA1K2uJPFpJ0 NcHuMZKmIG3uBqY3fj5Ul+zuVhZ/1v8B69zWoSWafLrk+VRdcEAniuY2E6SsQFP5 gV4GNja85S53DnlIVwEUXPYMiY6opiwP53yMNMvkB/FdzaQB5Ehdif2fhZu64QmE TtqbHtAuV0VZ3z4GrJ3XNbV6Np4wMOhYls4lTkZsnqNNO2sw/eoTYcmwxLDEYOQw uQ9rhZh4IQ== =N3BP -----END PGP SIGNATURE----- Merge tag 'for-linus-20180608' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few fixes for this merge window, where some of them should go in sooner rather than later, hence a new pull this week. This pull request contains: - Set of NVMe fixes, mostly follow up cleanups/fixes to the queue changes, but also teardown/removal and misc changes (Christop/Dan/ Johannes/Sagi/Steve). - Two lightnvm fixes for issues that showed up in this window (Colin/Wei). - Failfast/driver flags inheritance for flush requests (Hannes). - The md device put sanitization and fix (Kent). - dm bio_set inheritance fix (me). - nbd discard granularity fix (Josef). - nbd consistency in command printing (Kevin). - Loop recursion validation fix (Ted). - Partition overlap check (Wang)" [ .. and now my build is warning-free again thanks to the md fix - Linus ] * tag 'for-linus-20180608' of git://git.kernel.dk/linux-block: (22 commits) nvme: cleanup double shift issue nvme-pci: make CMB SQ mod-param read-only nvme-pci: unquiesce dead controller queues nvme-pci: remove HMB teardown on reset nvme-pci: queue creation fixes nvme-pci: remove unnecessary completion doorbell check nvme-pci: remove unnecessary nested locking nvmet: filter newlines from user input nvme-rdma: correctly check for target keyed sgl support nvme: don't hold nvmf_transports_rwsem for more than transport lookups nvmet: return all zeroed buffer when we can't find an active namespace md: Unify mddev destruction paths dm: use bioset_init_from_src() to copy bio_set block: add bioset_init_from_src() helper block: always set partition number to '0' in blk_partition_remap() block: pass failfast and driver-specific flags to flush requests nbd: set discard_alignment to the granularity nbd: Consistently use request pointer in debug messages. block: add verifier for cmdline partition lightnvm: pblk: fix resource leak of invalid_bitmap ...	2018-06-08 13:36:19 -07:00
Linus Torvalds	4a189982e2	Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio iopriority support from Al Viro: "The rest of aio stuff for this cycle - Adam's aio ioprio series" * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: aio ioprio use ioprio_check_cap ret val fs: aio ioprio add explicit block layer dependence fs: iomap dio set bio prio from kiocb prio fs: blkdev set bio prio from kiocb prio fs: Add aio iopriority support fs: Convert kiocb rw_hint from enum to u16 block: add ioprio_check_cap function	2018-06-08 10:00:20 -07:00
Theodore Ts'o	d2ac838e4c	loop: add recursion validation to LOOP_CHANGE_FD Refactor the validation code used in LOOP_SET_FD so it is also used in LOOP_CHANGE_FD. Otherwise it is possible to construct a set of loop devices that all refer to each other. This can lead to a infinite loop in starting with "while (is_loop_device(f)) .." in loop_set_fd(). Fix this by refactoring out the validation code and using it for LOOP_CHANGE_FD as well as LOOP_SET_FD. Reported-by: syzbot+4349872271ece473a7c91190b68b4bac7c5dbc87@syzkaller.appspotmail.com Reported-by: syzbot+40bd32c4d9a3cc12a339@syzkaller.appspotmail.com Reported-by: syzbot+769c54e66f994b041be7@syzkaller.appspotmail.com Reported-by: syzbot+0a89a9ce473936c57065@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-05 09:08:21 -06:00
Linus Torvalds	f459c34538	for-4.18/block-20180603 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJbFIrHAAoJEPfTWPspceCm2+kQAKo7o7HL30aRxJYu+gYafkuW PV47zr3e4vhMDEzDaMsh1+V7I7bm3uS+NZu6cFbcV+N9KXFpeb4V4Hvvm5cs+OC3 WCOBi4eC1h4qnDQ3ZyySrCMN+KHYJ16pZqddEjqw+fhVudx8i+F+jz3Y4ZMDDc3q pArKZvjKh2wEuYXUMFTjaXY46IgPt+er94OwvrhyHk+4AcA+Q/oqSfSdDahUC8jb BVR3FV4I3NOHUaru0RbrUko13sVZSboWPCIFrlTDz8xXcJOnVHzdVS1WLFDXLHnB O8q9cADCfa4K08kz68RxykcJiNxNvz5ChDaG0KloCFO+q1tzYRoXLsfaxyuUDg57 Zd93OFZC6hAzXdhclDFIuPET9OQIjDzwphodfKKmDsm3wtyOtydpA0o7JUEongp0 O1gQsEfYOXmQsXlo8Ot+Z7Ne/HvtGZ91JahUa/59edxQbcKaMrktoyQsQ/d1nOEL 4kXID18wPcFHWRQHYXyVuw6kbpRtQnh/U2m1eenSZ7tVQHwoe6mF3cfSf5MMseak k8nAnmsfEvOL4Ar9ftg61GOrImaQlidxOC2A8fmY5r0Sq/ZldvIFIZizsdTTCcni 8SOTxcQowyqPf5NvMNQ8cKqqCJap3ppj4m7anZNhbypDIF2TmOWsEcXcMDn4y9on fax14DPLo59gBRiPCn5f =nga/ -----END PGP SIGNATURE----- Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...	2018-06-04 07:58:06 -07:00
Adam Manzanares	d9a08a9e61	fs: Add aio iopriority support This is the per-I/O equivalent of the ioprio_set system call. When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field. This patch depends on block: add ioprio_check_cap function. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-31 10:50:55 -04:00
Joe Perches	5657a819a8	block drivers/block: Use octal not symbolic permissions Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-24 13:38:59 -06:00
Jeff Layton	eedffa28c9	loop: clear wb_err in bd_inode when detaching backing file When a loop block device encounters a writeback error, that error will get propagated to the bd_inode's wb_err field. If we then detach the backing file from it, attach another and fsync it, we'll get back the writeback error that we had from the previous backing file. This is a bit of a grey area as POSIX doesn't cover loop devices, but it is somewhat counterintuitive. If we detach a backing file from the loopdev while there are still unreported errors, take it as a sign that we're no longer interested in the previous file, and clear out the wb_err in the loop blockdev. Reported-and-Tested-by: Theodore Y. Ts'o <tytso@mit.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-21 12:36:03 -06:00
Tetsuo Handa	d3349b6b3c	loop: remember whether sysfs_create_group() was done syzbot is hitting WARN() triggered by memory allocation fault injection [1] because loop module is calling sysfs_remove_group() when sysfs_create_group() failed. Fix this by remembering whether sysfs_create_group() succeeded. [1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Renamed sysfs_ready -> sysfs_inited. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-07 15:26:36 -06:00
Jens Axboe	f9de14bc7e	loop: handle short DIO reads We ran into an issue with loop and btrfs, where btrfs would complain about checksum errors. It turns out that is because we don't handle short reads at all, we just zero fill the remainder. Worse than that, we don't handle the filling properly, which results in loop trying to advance a single bio by much more than its size, since it doesn't take chaining into account. Handle short reads appropriately, by simply retrying at the new correct offset. End the remainder of the request with EIO, if we get a 0 read. Fixes: `bc07c10a36` ("block: loop: support DIO & AIO") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-14 22:34:27 -06:00
Jens Axboe	1894e91654	loop: remove cmd->rq member We can always get at the request from the payload, no need to store a pointer to it. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-14 22:34:27 -06:00
Omar Sandoval	bdac616db9	loop: fix LOOP_GET_STATUS lock imbalance Commit `2d1d4c1e59` made loop_get_status() drop lo_ctx_mutex before returning, but the loop_get_status_old(), loop_get_status64(), and loop_get_status_compat() wrappers don't call loop_get_status() if the passed argument is NULL. The callers expect that the lock is dropped, so make sure we drop it in that case, too. Reported-by: syzbot+31e8daa8b3fc129e75f2@syzkaller.appspotmail.com Fixes: `2d1d4c1e59` ("loop: don't call into filesystem while holding lo_ctl_mutex") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-10 08:38:46 -06:00
Tetsuo Handa	1e047eaab3	block/loop: fix deadlock after loop_set_status syzbot is reporting deadlocks at __blkdev_get() [1]. ---------------------------------------- [ 92.493919] systemd-udevd D12696 525 1 0x00000000 [ 92.495891] Call Trace: [ 92.501560] schedule+0x23/0x80 [ 92.502923] schedule_preempt_disabled+0x5/0x10 [ 92.504645] __mutex_lock+0x416/0x9e0 [ 92.510760] __blkdev_get+0x73/0x4f0 [ 92.512220] blkdev_get+0x12e/0x390 [ 92.518151] do_dentry_open+0x1c3/0x2f0 [ 92.519815] path_openat+0x5d9/0xdc0 [ 92.521437] do_filp_open+0x7d/0xf0 [ 92.527365] do_sys_open+0x1b8/0x250 [ 92.528831] do_syscall_64+0x6e/0x270 [ 92.530341] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 92.931922] 1 lock held by systemd-udevd/525: [ 92.933642] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x73/0x4f0 ---------------------------------------- The reason of deadlock turned out that wait_event_interruptible() in blk_queue_enter() got stuck with bdev->bd_mutex held at __blkdev_put() due to q->mq_freeze_depth == 1. ---------------------------------------- [ 92.787172] a.out S12584 634 633 0x80000002 [ 92.789120] Call Trace: [ 92.796693] schedule+0x23/0x80 [ 92.797994] blk_queue_enter+0x3cb/0x540 [ 92.803272] generic_make_request+0xf0/0x3d0 [ 92.807970] submit_bio+0x67/0x130 [ 92.810928] submit_bh_wbc+0x15e/0x190 [ 92.812461] __block_write_full_page+0x218/0x460 [ 92.815792] __writepage+0x11/0x50 [ 92.817209] write_cache_pages+0x1ae/0x3d0 [ 92.825585] generic_writepages+0x5a/0x90 [ 92.831865] do_writepages+0x43/0xd0 [ 92.836972] __filemap_fdatawrite_range+0xc1/0x100 [ 92.838788] filemap_write_and_wait+0x24/0x70 [ 92.840491] __blkdev_put+0x69/0x1e0 [ 92.841949] blkdev_close+0x16/0x20 [ 92.843418] __fput+0xda/0x1f0 [ 92.844740] task_work_run+0x87/0xb0 [ 92.846215] do_exit+0x2f5/0xba0 [ 92.850528] do_group_exit+0x34/0xb0 [ 92.852018] SyS_exit_group+0xb/0x10 [ 92.853449] do_syscall_64+0x6e/0x270 [ 92.854944] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 92.943530] 1 lock held by a.out/634: [ 92.945105] #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x3c/0x1e0 ---------------------------------------- The reason of q->mq_freeze_depth == 1 turned out that loop_set_status() forgot to call blk_mq_unfreeze_queue() at error paths for info->lo_encrypt_type != NULL case. ---------------------------------------- [ 37.509497] CPU: 2 PID: 634 Comm: a.out Tainted: G W 4.16.0+ #457 [ 37.513608] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 37.518832] RIP: 0010:blk_freeze_queue_start+0x17/0x40 [ 37.521778] RSP: 0018:ffffb0c2013e7c60 EFLAGS: 00010246 [ 37.524078] RAX: 0000000000000000 RBX: ffff8b07b1519798 RCX: 0000000000000000 [ 37.527015] RDX: 0000000000000002 RSI: ffffb0c2013e7cc0 RDI: ffff8b07b1519798 [ 37.529934] RBP: ffffb0c2013e7cc0 R08: 0000000000000008 R09: 47a189966239b898 [ 37.532684] R10: dad78b99b278552f R11: 9332dca72259d5ef R12: ffff8b07acd73678 [ 37.535452] R13: 0000000000004c04 R14: 0000000000000000 R15: ffff8b07b841e940 [ 37.538186] FS: 00007fede33b9740(0000) GS:ffff8b07b8e80000(0000) knlGS:0000000000000000 [ 37.541168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.543590] CR2: 00000000206fdf18 CR3: 0000000130b30006 CR4: 00000000000606e0 [ 37.546410] Call Trace: [ 37.547902] blk_freeze_queue+0x9/0x30 [ 37.549968] loop_set_status+0x67/0x3c0 [loop] [ 37.549975] loop_set_status64+0x3b/0x70 [loop] [ 37.549986] lo_ioctl+0x223/0x810 [loop] [ 37.549995] blkdev_ioctl+0x572/0x980 [ 37.550003] block_ioctl+0x34/0x40 [ 37.550006] do_vfs_ioctl+0xa7/0x6d0 [ 37.550017] ksys_ioctl+0x6b/0x80 [ 37.573076] SyS_ioctl+0x5/0x10 [ 37.574831] do_syscall_64+0x6e/0x270 [ 37.576769] entry_SYSCALL_64_after_hwframe+0x42/0xb7 ---------------------------------------- [1] https://syzkaller.appspot.com/bug?id=cd662bc3f6022c0979d01a262c318fab2ee9b56f Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <bot+48594378e9851eab70bcd6f99327c7db58c5a28a@syzkaller.appspotmail.com> Fixes: `ecdd09597a` ("block/loop: fix race between I/O and set_status") Cc: Ming Lei <tom.leiming@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable <stable@vger.kernel.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-04-10 08:38:46 -06:00

1 2 3 4 5 ...

282 Commits