OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Josef Bacik	5e65a20341	blk-wbt: wake up all when we scale up, not down Tetsuo brought to my attention that I screwed up the scale_up/scale_down helpers when I factored out the rq-qos code. We need to wake up all the waiters when we add slots for requests to make, not when we shrink the slots. Otherwise we'll end up things waiting forever. This was a mistake and simply puts everything back the way it was. cc: stable@vger.kernel.org Fixes: `a79050434b` ("blk-rq-qos: refactor out common elements of blk-wbt") eported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-11 13:31:28 -06:00
Jens Axboe	133424a207	Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus Pull NVMe fix from Christoph. * 'nvme-4.19' of git://git.infradead.org/nvme: nvme: properly propagate errors in nvme_mpath_init	2018-09-28 09:41:40 -06:00
Juergen Gross	6c76786740	xen/blkfront: correct purging of persistent grants Commit `a46b53672b` ("xen/blkfront: cleanup stale persistent grants") introduced a regression as purged persistent grants were not pu into the list of free grants again. Correct that. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-28 09:40:39 -06:00
Jens Axboe	15c2068876	Revert "xen/blkfront: When purging persistent grants, keep them in the buffer" Fix didn't work for all cases, reverting to add a (hopefully) better fix. This reverts commit `f151ba989d`. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-28 09:40:17 -06:00
Ilya Dryomov	587562d0c7	blk-mq: I/O and timer unplugs are inverted in blktrace trace_block_unplug() takes true for explicit unplugs and false for implicit unplugs. schedule() unplugs are implicit and should be reported as timer unplugs. While correct in the legacy code, this has been inverted in blk-mq since 4.11. Cc: stable@vger.kernel.org Fixes: `bd166ef183` ("blk-mq-sched: add framework for MQ capable IO schedulers") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-27 13:12:44 -06:00
Guoju Fang	0f843e65d9	bcache: add separate workqueue for journal_write to avoid deadlock After write SSD completed, bcache schedules journal_write work to system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM flag. system_wq is also a bound wq, and there may be no idle kworker on current processor. Creating a new kworker may unfortunately need to reclaim memory first, by shrinking cache and slab used by vfs, which depends on bcache device. That's a deadlock. This patch create a new workqueue for journal_write with WQ_MEM_RECLAIM flag. It's rescuer thread will work to avoid the deadlock. Signed-off-by: Guoju Fang <fangguoju@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-27 09:47:01 -06:00
Boris Ostrovsky	f151ba989d	xen/blkfront: When purging persistent grants, keep them in the buffer Commit `a46b53672b` ("xen/blkfront: cleanup stale persistent grants") added support for purging persistent grants when they are not in use. As part of the purge, the grants were removed from the grant buffer, This eventually causes the buffer to become empty, with BUG_ON triggered in get_free_grant(). This can be observed even on an idle system, within 20-30 minutes. We should keep the grants in the buffer when purging, and only free the grant ref. Fixes: `a46b53672b` ("xen/blkfront: cleanup stale persistent grants") Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-27 08:26:38 -06:00
Damien Le Moal	854f31ccdd	block: fix deadline elevator drain for zoned block devices When the deadline scheduler is used with a zoned block device, writes to a zone will be dispatched one at a time. This causes the warning message: deadline: forced dispatching is broken (nr_sorted=X), please report this to be displayed when switching to another elevator with the legacy I/O path while write requests to a zone are being retained in the scheduler queue. Prevent this message from being displayed when executing elv_drain_elevator() for a zoned block device. __blk_drain_queue() will loop until all writes are dispatched and completed, resulting in the desired elevator queue drain without extensive modifications to the deadline code itself to handle forced-dispatch calls. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Fixes: `8dc8146f9c` ("deadline-iosched: Introduce zone locking support") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-26 19:57:24 -06:00
Keith Busch	530ca2c9bd	blk-mq: Allow blocking queue tag iter callbacks A recent commit runs tag iterator callbacks under the rcu read lock, but existing callbacks do not satisfy the non-blocking requirement. The commit intended to prevent an iterator from accessing a queue that's being modified. This patch fixes the original issue by taking a queue reference instead of reading it, which allows callbacks to make blocking calls. Fixes: `f5bbbbe4d6` ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter") Acked-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-25 20:17:59 -06:00
Susobhan Dey	bb830add19	nvme: properly propagate errors in nvme_mpath_init Signed-off-by: Susobhan Dey <susobhan.dey@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-09-25 16:21:40 -07:00
Omar Sandoval	b57e99b4b8	block: use nanosecond resolution for iostat Klaus Kusche reported that the I/O busy time in /proc/diskstats was not updating properly on 4.18. This is because we started using ktime to track elapsed time, and we convert nanoseconds to jiffies when we update the partition counter. However, this gets rounded down, so any I/Os that take less than a jiffy are not accounted for. Previously in this case, the value of jiffies would sometimes increment while we were doing I/O, so at least some I/Os were accounted for. Let's convert the stats to use nanoseconds internally. We still report milliseconds as before, now more accurately than ever. The value is still truncated to 32 bits for backwards compatibility. Fixes: `522a777566` ("block: consolidate struct request timestamp fields") Cc: stable@vger.kernel.org Reported-by: Klaus Kusche <klaus.kusche@computerix.info> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-21 20:26:59 -06:00
Jens Axboe	d611aaf336	Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus Pull NVMe fix from Christoph. * 'nvme-4.19' of git://git.infradead.org/nvme: nvme: count all ANA groups for ANA Log page	2018-09-20 09:10:38 -06:00
Andy Whitcroft	65eea8edc3	floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl The final field of a floppy_struct is the field "name", which is a pointer to a string in kernel memory. The kernel pointer should not be copied to user memory. The FDGETPRM ioctl copies a floppy_struct to user memory, including this "name" field. This pointer cannot be used by the user and it will leak a kernel address to user-space, which will reveal the location of kernel code and data and undermine KASLR protection. Model this code after the compat ioctl which copies the returned data to a previously cleared temporary structure on the stack (excluding the name pointer) and copy out to userspace from there. As we already have an inparam union with an appropriate member and that memory is already cleared even for read only calls make use of that as a temporary store. Based on an initial patch by Brian Belleville. CVE-2018-7755 Signed-off-by: Andy Whitcroft <apw@canonical.com> Broke up long line. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-20 09:09:48 -06:00
Jens Axboe	7ce5c8cd75	libata: mask swap internal and hardware tag hen we're comparing the hardware completion mask passed in from the driver with the internal tag pending mask, we need to account for the fact that the internal tag is different from the hardware tag. If not, then we can end up either prematurely completing the internal tag (since it's not set in the hw mask), or simply flag an error: ata2: illegal qc_active transition (100000000->00000001) If the internal tag is set, then swap that with the hardware tag in this case before comparing with what the hardware reports. Fixes: `28361c4036` ("libata: add extra internal command") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151 Cc: stable@vger.kernel.org Reported-by: Paul Sbarra <sbarra.paul@gmail.com> Tested-by: Paul Sbarra <sbarra.paul@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-20 08:30:55 -06:00
Hannes Reinecke	be1277f5eb	nvme: count all ANA groups for ANA Log page When issuing a short read on the ANA log page the number of groups should not change, even though the final returned data might contain less groups than that number. Signed-off-by: Hannes Reinecke <hare@suse.com> [switched to a for loop] Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-09-17 15:49:40 +02:00
Jens Axboe	b228ba1cb9	null_blk: fix zoned support for non-rq based operation The supported added for zones in null_blk seem to assume that only rq based operation is possible. But this depends on the queue_mode setting, if this is set to 0, then cmd->bio is what we need to be operating on. Right now any attempt to load null_blk with queue_mode=0 will insta-crash, since cmd->rq is NULL and null_handle_cmd() assumes it to always be set. Make the zoned code deal with bio's instead, or pass in the appropriate sector/nr_sectors instead. Fixes: `ca4b2a0119` ("null_blk: add zone support") Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-12 18:21:11 -06:00
Jens Axboe	01c5f85aeb	blk-cgroup: increase number of supported policies After merging the iolatency policy, we potentially now have 4 policies being registered, but only support 3. This causes one of them to fail loading. Takashi reports that BFQ no longer works for him, because it fails to load due to policy registration failure. Bump to 5 policies, and also add a warning for when we have exceeded the global amount. If we have to touch this again, we should switch to a dynamic scheme instead. Reported-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-11 10:59:53 -06:00
Jens Axboe	bf93585ee1	Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus Pull single NVMe fix from Christoph. * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet-rdma: fix possible bogus dereference under heavy load	2018-09-10 08:16:56 -06:00
Konstantin Khlebnikov	d5274b3cd6	block: bfq: swap puts in bfqg_and_blkg_put Fix trivial use-after-free. This could be last reference to bfqg. Fixes: `8f9bebc33d` ("block, bfq: access and cache blkg data only when safe") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-06 11:32:58 -06:00
Mikulas Patocka	8b2ded1c94	block: don't warn when doing fsync on read-only devices It is possible to call fsync on a read-only handle (for example, fsck.ext2 does it when doing read-only check), and this call results in kernel warning. The patch `b089cfd95d` ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and it doesn't (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: `721c7fc701` ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-05 16:14:36 -06:00
Sagi Grimberg	8407879c4e	nvmet-rdma: fix possible bogus dereference under heavy load Currently we always repost the recv buffer before we send a response capsule back to the host. Since ordering is not guaranteed for send and recv completions, it is posible that we will receive a new request from the host before we got a send completion for the response capsule. Today, we pre-allocate 2x rsps the length of the queue, but in reality, under heavy load there is nothing that is really preventing the gap to expand until we exhaust all our rsps. To fix this, if we don't have any pre-allocated rsps left, we dynamically allocate a rsp and make sure to free it when we are done. If under memory pressure we fail to allocate a rsp, we silently drop the command and wait for the host to retry. Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: dropped a superflous assignment] Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-09-05 12:18:01 -07:00
Jens Axboe	bc811f05d7	nbd: don't allow invalid blocksize settings syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl. We need proper validation of the input here. Not just if it's zero, but also if the value is a power-of-2 and in a valid range. Add that. Cc: stable@vger.kernel.org Reported-by: syzbot <syzbot+25dbecbec1e62c6b0dd4@syzkaller.appspotmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-04 11:54:58 -06:00
Dennis Zhou (Facebook)	3111885015	blkcg: use tryget logic when associating a blkg with a bio There is a very small change a bio gets caught up in a really unfortunate race between a task migration, cgroup exiting, and itself trying to associate with a blkg. This is due to css offlining being performed after the css->refcnt is killed which triggers removal of blkgs that reach their blkg->refcnt of 0. To avoid this, association with a blkg should use tryget and fallback to using the root_blkg. Fixes: `08e18eab0c` ("block: add bi_blkg to the bio for cgroups") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-31 14:48:58 -06:00
Dennis Zhou (Facebook)	59b57717ff	blkcg: delay blkg destruction until after writeback has finished Currently, blkcg destruction relies on a sequence of events: 1. Destruction starts. blkcg_css_offline() is called and blkgs release their reference to the blkcg. This immediately destroys the cgwbs (writeback). 2. With blkgs giving up their reference, the blkcg ref count should become zero and eventually call blkcg_css_free() which finally frees the blkcg. Jiufei Xue reported that there is a race between blkcg_bio_issue_check() and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent on the completion of all writeback associated with the blkcg. A count of the number of cgwbs is maintained and once that goes to zero, blkg destruction can follow. This should prevent premature blkg destruction related to writeback. The new process for blkcg cleanup is as follows: 1. Destruction starts. blkcg_css_offline() is called which offlines writeback. Blkg destruction is delayed on the cgwb_refcnt count to avoid punting potentially large amounts of outstanding writeback to root while maintaining any ongoing policies. Here, the base cgwb_refcnt is put back. 2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called and handles destruction of blkgs. This is where the css reference held by each blkg is released. 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called. This finally frees the blkg. It seems in the past blk-throttle didn't do the most understandable things with taking data from a blkg while associating with current. So, the simplification and unification of what blk-throttle is doing caused this. Fixes: `08e18eab0c` ("block: add bi_blkg to the bio for cgroups") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-31 14:48:56 -06:00
Dennis Zhou (Facebook)	6b06546206	Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" This reverts commit `4c6994806f`. Destroying blkgs is tricky because of the nature of the relationship. A blkg should go away when either a blkcg or a request_queue goes away. However, blkg's pin the blkcg to ensure they remain valid. To break this cycle, when a blkcg is offlined, blkgs put back their css ref. This eventually lets css_free() get called which frees the blkcg. The above commit (`4c6994806f`) breaks this order of events by trying to destroy blkgs in css_free(). As the blkgs still hold references to the blkcg, css_free() is never called. The race between blkcg_bio_issue_check() and cgroup_rmdir() will be addressed in the following patch by delaying destruction of a blkg until all writeback associated with the blkcg has been finished. Fixes: `4c6994806f` ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-31 14:48:54 -06:00
Jens Axboe	52bd456a66	Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-linus Pull NVMe fixes from Christoph. * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: free workqueue object if module init fails nvme-fcloop: Fix dropped LS's to removed target port nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event	2018-08-29 11:05:20 -06:00
Scott Bauer	8f3fafc9c2	cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()" There is another cast from unsigned long to int which causes a bounds check to fail with specially crafted input. The value is then used as an index in the slot array in cdrom_slot_status(). Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-29 08:09:20 -06:00
Chaitanya Kulkarni	04db0e5ec5	nvmet: free workqueue object if module init fails Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-08-28 08:40:44 +02:00
James Smart	afd299ca99	nvme-fcloop: Fix dropped LS's to removed target port When a targetport is removed from the config, fcloop will avoid calling the LS done() routine thinking the targetport is gone. This leaves the initiator reset/reconnect hanging as it waits for a status on the Create_Association LS for the reconnect. Change the filter in the LS callback path. If tport null (set when failed validation before "sending to remote port"), be sure to call done. This was the main bug. But, continue the logic that only calls done if tport was set but there is no remoteport (e.g. case where remoteport has been removed, thus host doesn't expect a completion). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-08-28 08:40:43 +02:00
Michal Wnukowski	f1ed3df20d	nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event In many architectures loads may be reordered with older stores to different locations. In the nvme driver the following two operations could be reordered: - Write shadow doorbell (dbbuf_db) into memory. - Read EventIdx (dbbuf_ei) from memory. This can result in a potential race condition between driver and VM host processing requests (if given virtual NVMe controller has a support for shadow doorbell). If that occurs, then the NVMe controller may decide to wait for MMIO doorbell from guest operating system, and guest driver may decide not to issue MMIO doorbell on any of subsequent commands. This issue is purely timing-dependent one, so there is no easy way to reproduce it. Currently the easiest known approach is to run "Oracle IO Numbers" (orion) that is shipped with Oracle DB: orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \ concat -write 40 -duration 120 -matrix row -testname nvme_test Where nvme_test is a .lun file that contains a list of NVMe block devices to run test against. Limiting number of vCPUs assigned to given VM instance seems to increase chances for this bug to occur. On test environment with VM that got 4 NVMe drives and 1 vCPU assigned the virtual NVMe controller hang could be observed within 10-20 minutes. That correspond to about 400-500k IO operations processed (or about 100GB of IO read/writes). Orion tool was used as a validation and set to run in a loop for 36 hours (equivalent of pushing 550M IO operations). No issues were observed. That suggest that the patch fixes the issue. Fixes: `f9f38e3338` ("nvme: improve performance for virtual NVMe devices") Signed-off-by: Michal Wnukowski <wnukowski@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: updated changelog and comment a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-08-28 08:40:42 +02:00
John Pittman	db193954ed	block: bsg: move atomic_t ref_count variable to refcount API Currently, variable ref_count within the bsg_device struct is of type atomic_t. For variables being used as reference counters, the refcount API should be used instead of atomic. The newer refcount API works to prevent counter overflows and use-after-free bugs. So, move this varable from the atomic API to refcount, potentially avoiding the issues mentioned. Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 19:17:02 -06:00
Chengguang Xu	62d2a19407	block: remove unnecessary condition check kmem_cache_destroy() can handle NULL pointer correctly, so there is no need to check e->icq_cache before calling kmem_cache_destroy(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 19:16:06 -06:00
Linus Walleij	46cb52ad41	ata: ftide010: Add a quirk for SQ201 The DMA is broken on this specific device for some unknown reason (probably badly designed or plain broken interface electronics) and will only work with PIO. Other users of the same hardware does not have this problem. Add a specific quirk so that this Gemini device gets DMA turned off. Also fix up some code around passing the port information around in probe while we're at it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 14:25:54 -06:00
Jens Axboe	b0a84beb2e	blk-wbt: remove dead code We already note and mark discard and swap IO from bio_to_wbt_flags(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 13:32:12 -06:00
Jens Axboe	057d3ccf93	Merge branch 'stable/for-jens-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Pull Xen block driver fixes from Konrad: "Fix for flushing out persistent pages at a deterministic rate" * 'stable/for-jens-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring xen/blkback: move persistent grants flags to bool xen/blkfront: reorder tests in xlblk_init() xen/blkfront: cleanup stale persistent grants xen/blkback: don't keep persistent grants too long	2018-08-27 11:27:32 -06:00
Jens Axboe	38cfb5a45e	blk-wbt: improve waking of tasks We have two potential issues: 1) After commit `2887e41b91`, we only wake one process at the time when we finish an IO. We really want to wake up as many tasks as can queue IO. Before this commit, we woke up everyone, which could cause a thundering herd issue. 2) A task can potentially consume two wakeups, causing us to (in practice) miss a wakeup. Fix both by providing our own wakeup function, which stops __wake_up_common() from waking up more tasks if we fail to get a queueing token. With the strict ordering we have on the wait list, this wakes the right tasks and the right amount of tasks. Based on a patch from Jianchao Wang <jianchao.w.wang@oracle.com>. Tested-by: Agarwal, Anchal <anchalag@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 11:27:26 -06:00
Jens Axboe	061a542753	blk-wbt: abstract out end IO completion handler Prep patch for calling the handler from a different context, no functional changes in this patch. Tested-by: Agarwal, Anchal <anchalag@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-27 11:27:24 -06:00
Juergen Gross	6f2f39ad1a	xen/blkback: remove unused pers_gnts_lock from struct xen_blkif_ring pers_gnts_lock isn't being used anywhere. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-08-27 12:12:04 -04:00
Juergen Gross	d77ff24e7f	xen/blkback: move persistent grants flags to bool The struct persistent_gnt flags member is meant to be a bitfield of different flags. There is only PERSISTENT_GNT_ACTIVE flag left, so convert it to a bool named "active". Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-08-27 12:12:04 -04:00
Juergen Gross	4bcddbae01	xen/blkfront: reorder tests in xlblk_init() In case we don't want pv block devices we should not test parameters for sanity and eventually print out error messages. So test precluding conditions before checking parameters. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-08-27 12:12:03 -04:00
Juergen Gross	a46b53672b	xen/blkfront: cleanup stale persistent grants Add a periodic cleanup function to remove old persistent grants which are no longer in use on the backend side. This avoids starvation in case there are lots of persistent grants for a device which no longer is involved in I/O business. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-08-27 12:12:03 -04:00
Juergen Gross	973e5405f2	xen/blkback: don't keep persistent grants too long Persistent grants are allocated until a threshold per ring is being reached. Those grants won't be freed until the ring is being destroyed meaning there will be resources kept busy which might no longer be used. Instead of freeing only persistent grants until the threshold is reached add a timestamp and remove all persistent grants not having been in use for a minute. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-08-27 12:12:03 -04:00
Linus Torvalds	b8dcdab36f	for-linus-20180825 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAluBbOMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqIBD/0e2/kyEE5UbIPeFK2bmrjDnbmo7Iq7Scb3 MQwYjYL1lQK2y0yu810Ews5acWiVxJtb0D5Hko/eBEiOE81qB9j+DpTB804ewYFR IQwB9pKYhUVuwULHQDyEQTlUUHZba/cgqeHtF+20AmLhdv8NqpL7NfQ4d0XScHti AAsxKcLJ5gWAafmLQB2ZVbjlmLib5rSTLdEjVSLAdKIsED9+g12/WwIzZsnK0s9Q ozGasr3jPegPBPwpR0pepBBlwInwvdsCFlmzIkL1B8Nn0EWdfLY1qsgS47xJynOf Wz/u9ROrHaKClfJuQ08ccqnl3JNUoHBKi06CGpZP+w1S8BmZpH6WAyMEkiLz/l0r 56US10YAC6hyq4Ectfa5aR3Ch/YnR+bVOsv6GVg4xqoXetr8sEHBM4i+iB05tsfL fAoHE7RPB8UwMdbJnnh+OTKsykKZG+IARurWjKBrM0j+zSNcEiRebs+5JI7tgCee MDRH+XOUzREbwcI5pggnz9QPavwf6j07clMfbbT6/C6CNjZJ4rAPwTvYEDnQVpos bXh2ifonOKVp+pq6LHW2vxBCmE6+Xh3Do+q5GBQ5png4BIM9A+hJsJT/ACqzjAip KYN0wdFPJu5eI2w8/X3fxgq2mrN7jUe/9x9HI+SnQ7cIkoQ7ZM7YcBrLtl6ignTN wH2uJIhQMQ== =KQSS -----END PGP SIGNATURE----- Merge tag 'for-linus-20180825' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few small fixes for this merge window: - Locking imbalance fix for bcache (Shan Hai) - A few small fixes for wbt. One is a cleanup/prep, one is a fix for an existing issue, and the last two are fixes for changes that went into this merge window (me)" * tag 'for-linus-20180825' of git://git.kernel.dk/linux-block: blk-wbt: don't maintain inflight counts if disabled blk-wbt: fix has-sleeper queueing check blk-wbt: use wq_has_sleeper() for wq active check blk-wbt: move disable check into get_limit() bcache: release dc->writeback_lock properly in bch_writeback_thread()	2018-08-25 13:30:23 -07:00
Linus Torvalds	db84abf5f8	This pull request contains a single fix for UBIFS: - Remove an empty file from UBIFS source -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAluBE84WHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wc73EACHsV8cp3Atkg18yZYQq3t8uOZG lZApUWTDr2SpWW88/OJ+RAFKGflKQqSiS1y1HYd2s8is4yYMA/XBto5pWvGqg03P GsFGzjMqRztZtNwg7n7JUD0Sq9WLoB1BimuMAG1b8vjTH2uuCxJZVyVi7ZF5W5Oc 065ebVblFhziI7jddRh/Gpwa2dvx3SlKZa0VImiJd6Mw9AnjO4grU+rOofswXTjX AKVjeRvNp0DuxJkxaqO9mH7mug87Owi4co4DBLCJu7pPEq2kvFRHQQtn6QKCdtRk gyYBD/ZGkhNYz3wPoQxlHM0bGVAToKtjcT1/T61lrlXKjGJu8S8IHC/raEyCB8yb Cz3XxMCH7LXyS3eaov4sZAiogENsoP7i20E2AKwlGil65gx4DVR814MTAi5m0AwN Kpv7GHRG9oMEar378Rrv8xptRJhj6nwCxAYuBwmPFlK7d95qMQ9hLjsvYp3TLuzn 7ihUzT8m/GZG1qrlkkk24Bjm834dCVRKPBJJ4beh3YG8/mj/h6OlQGs0NPRIF6O1 JknbQqXOz6UZ1UsvgtDK7p8GuHztKA4K+3Qv4BWwIRFisX+gTWKB8IveGPpouUaa K8NMOkbw8RsatxzM4DNZBPl2YNjvHuuTGudNqMn3pytsi4Y8y5YKdkpFVgZH1V25 QmQZbhicHSlfwWR0FQ== =QoL7 -----END PGP SIGNATURE----- Merge tag 'upstream-4.19-rc1-fix' of git://git.infradead.org/linux-ubifs Pull UBIFS fix from Richard Weinberger: "Remove an empty file from UBIFS source" * tag 'upstream-4.19-rc1-fix' of git://git.infradead.org/linux-ubifs: ubifs: Remove empty file.h	2018-08-25 13:27:35 -07:00
Linus Torvalds	04faac10fc	three small SMB3 fixes, one for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAluAdzkACgkQiiy9cAdy T1GjlAv/SOsNm2sj9Bcq/Z/CpPoFRFoJBFsLeReF78QdbF/+eUFuQJvq0aIK8BmV 0NRmvlQk9oCGQfWN0pLWeRn7a7xUqMQ7HYKSS1fzW1O+kJGlyA1cFjCbiZe4py6m AFSlpraPTtL7isX/ZMOyZ1D7YKMj4Fq5wndcHPSnMQXI2GlaUAip5k/zamXUbMmo dFaGDkXc67un6Y/04v18LsKJtOHgbVIAES2OgO0sjqiwp0cnGATsZl/OGzvsTo31 brstBum/0Ig2Mpr+5IXa4QFoP+naNXDyhv+D69huETwsMSImnjGL6L/GAgUUO5/t sDN6bpQdM9wqpckNuFcV9hKbBHQ1nZhzr0gUjRnmN8CJFHOgI2Ndo4J4N9slnWo9 wEXJV7RN5/VQh3Ozb7m+DpAYo4K4r3Oexxa3MG7+IFY8t89O39PBc0dEQ6O8ztaJ NCcubQVpSFxC7fwVjY56IHHofyznS7JLKMAcCe3+ssOHwJt0PZr/iqdBNHC9W4ZE 7fcNZtct =jMLB -----END PGP SIGNATURE----- Merge tag '4.19-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three small SMB3 fixes, one for stable" * tag '4.19-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number for cifs.ko to 2.12 cifs: check kmalloc before use cifs: check if SMB2 PDU size has been padded and suppress the warning cifs: create a define for how many iovs we need for an SMB2_open()	2018-08-25 13:17:53 -07:00
Linus Torvalds	1b2de5d039	mm/cow: don't bother write protecting already write-protected pages This is not normally noticeable, but repeated forks are unnecessarily expensive because they repeatedly dirty the parent page tables during the page table copy operation. It's trivial to just avoid write protecting the page table entry if it was already not writable. This patch was inspired by https://bugzilla.kernel.org/show_bug.cgi?id=200447 which points to an ancient "waste time re-doing fork" issue in the presence of lots of signals. That bug was fixed by Eric Biederman's signal handling series culminating in commit `c3ad2c3b02` ("signal: Don't restart fork when signals come in"), but the unnecessary work for repeated forks is still work just fixing, particularly since the fix is trivial. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-25 13:15:03 -07:00
Colin Ian King	e0fcfe1f1a	hpfs: remove unnecessary checks on the value of r when assigning error code At the point where r is being checked for different values, r is always going to be equal to 2 as the previous if statements jump to end or end1 if r is not 2. Hence the assignment to err can be simplified to just err an assignment without any checks on the value or r. Detected by CoverityScan, CID#1226737 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-25 12:42:33 -07:00
Jens Axboe	7634ccd2da	libata: maintainership update Tejun Heo wrote: > > I asked Jens whether he could take care of the libata tree and he > thankfully agreed, so, from now on, Jens will be the libata > maintainer. > > Thanks a lot! Thanks for your work in this area. I still remember the first linux storage summit we did in Vancouver 2001, Tejun was invited to talk about his libata error handling work. Before that, it was basically a crap shoot if we recovered properly or not... A lot of water has flown under the bridge since then! Here's an "official" patch. Linus, can you apply it? Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-25 12:35:45 -07:00
Linus Torvalds	0519359784	Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Nothing too interesting. Mostly ahci and ahci_platform changes, many around power management" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits) ata: ahci_platform: enable to get and control reset ata: libahci_platform: add reset control support ata: add an extra argument to ahci_platform_get_resources() ata: sata_rcar: Add r8a77965 support ata: sata_rcar: exclude setting of PHY registers in Gen3 ata: sata_rcar: really mask all interrupts on Gen2 and later Revert "ata: ahci_platform: allow disabling of hotplug to save power" ata: libahci: Allow reconfigure of DEVSLP register ata: libahci: Correct setting of DEVSLP register ata: ahci: Enable DEVSLP by default on x86 with SLP_S0 ata: ahci: Support state with min power but Partial low power state Revert "ata: ahci_platform: convert kcalloc to devm_kcalloc" ata: sata_rcar: Add rudimentary Runtime PM support ata: sata_rcar: Provide a short-hand for &pdev->dev ata: Only output sg element mapped number in verbose debug ata: Guard ata_scsi_dump_cdb() by ATA_VERBOSE_DEBUG ata: ahci_platform: convert kcalloc to devm_kcalloc ata: ahci_platform: convert kzallloc to kcalloc ata: ahci_platform: correct parameter documentation for ahci_platform_shutdown libata: remove ata_sff_data_xfer_noirq() ...	2018-08-24 13:20:33 -07:00
Linus Torvalds	596766102a	Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Just one commit from Steven to take out spin lock from trace event handlers" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/tracing: Move taking of spin lock out of trace event handlers	2018-08-24 13:19:27 -07:00

1 2 3 4 5 ...

781454 Commits All Branches Search

781454 Commits

All Branches